
Are scrapers written on a per-website basis? Are there techniques to separate content from menus / ads / filler / additional information, etc? How do people deal with design changes - is it by rewriting the scraper whenever this happens? Thanks!


Yeah. I managed to abstract the structure a bit, but in the end websites change.


Yeah, it's often gonna be a per-site, lots-of-XPath-queries, email-me-when-it-breaks kind of endeavor.
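
For what it's worth, here is a rough sketch of what that per-site setup can look like. It is only an illustration, not anyone's actual scraper: the `SITE_RULES` table, the `example.com` entry, the XPath expressions, and the `scrape` helper are all made up. It uses the standard library's `xml.etree.ElementTree`, which only understands a small XPath subset and only parses well-formed markup, so real-world HTML would likely need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site rules: field name -> XPath (ElementTree's limited subset).
SITE_RULES = {
    "example.com": {
        "title": ".//h1[@class='headline']",
        "body": ".//div[@id='article']/p",
    },
}


def scrape(site, markup):
    """Return (fields, broken); broken lists the rules that matched nothing."""
    root = ET.fromstring(markup)
    fields, broken = {}, []
    for name, xpath in SITE_RULES[site].items():
        # Collect the text of every matching element, dropping empty results.
        texts = ["".join(el.itertext()).strip() for el in root.findall(xpath)]
        texts = [t for t in texts if t]
        if texts:
            fields[name] = "\n".join(texts)
        else:
            # A rule that finds nothing usually means the site was redesigned.
            broken.append(name)
    return fields, broken


# Made-up page: the menu is ignored simply because no rule targets it.
PAGE = """<html><body>
<div class="menu">Home | About</div>
<h1 class="headline">Hello</h1>
<div id="article"><p>First.</p><p>Second.</p></div>
</body></html>"""

fields, broken = scrape("example.com", PAGE)
```

The `broken` list is the "email me when it breaks" part: if it comes back non-empty, some rule stopped matching and the queries for that site probably need rewriting. How you get notified (email, log, whatever) is up to you.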



