
Whenever I work with large datasets, I test against a small subset of the data while I build the pipeline. This avoids long run times and allows for quick iteration while I get things set up to run against the full dataset.
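
As a rough illustration of the subset approach, here is a minimal Python sketch using only the standard library. The names `head_subset`, `sample_subset`, and `run_pipeline` are hypothetical and exist only for this example; `run_pipeline` is a stand-in for whatever the real pipeline does. It assumes the data can be read as an iterable of rows.

```python
import itertools
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def head_subset(rows: Iterable[T], limit: int) -> Iterator[T]:
    """Yield only the first `limit` rows; the rest of the input is never read."""
    return itertools.islice(rows, limit)


def sample_subset(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Pick k rows uniformly at random in one pass (reservoir sampling).

    A fixed seed keeps the subset identical between test runs.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


def run_pipeline(rows: Iterable[int]) -> int:
    """Hypothetical stand-in for the real pipeline: doubles and sums the rows."""
    return sum(r * 2 for r in rows)


if __name__ == "__main__":
    full_dataset = range(10_000_000)  # placeholder for the large dataset

    # Fast iteration while building: run on a small subset only.
    print(run_pipeline(head_subset(full_dataset, 1_000)))
    print(run_pipeline(sample_subset(range(100_000), 1_000)))
```

Taking the first rows is the cheapest option but can miss variety if the data is sorted; the random sample costs one full pass over its input but is more likely to hit edge cases. Once the pipeline works on the subset, the same `run_pipeline` call can be pointed at the full dataset.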

