Interesting approach. The scraper-vs-site-owner arms race is real.
On the flip side of this discussion - if you're building a scraper yourself, there are ways to be less annoying:
1. Run locally instead of from cloud servers. Most aggressive blocking targets VPS IPs. A desktop app using the user's home IP looks like normal browsing.
2. Respect rate limits and add delays. Obvious but often ignored.
3. Use RSS feeds when available - many sites leave them open even when blocking scrapers.
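For what it's worth, points 2 and 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: `PoliteThrottle` and `rss_items` are hypothetical helper names, the 2-second default gap is an arbitrary choice rather than any site's published limit, and the feed parser only handles plain RSS 2.0 without namespaces.

```python
import time
import xml.etree.ElementTree as ET


class PoliteThrottle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval_s=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s  # assumed default, tune per site
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous request; None before the first

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def rss_items(xml_text):
    """Return (title, link) pairs from an RSS 2.0 document (no namespace handling)."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]
```

Calling `throttle.wait()` before each request is enough to space them out, and parsing the feed with `rss_items` avoids fetching and scraping the HTML pages at all when a feed exists.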
I built a Reddit data tool (search "reddit wappkit" if curious) and the "local IP" approach basically eliminated all blocking issues. Reddit is pretty aggressive against server IPs but doesn't bother home connections.
The porn-link solution is creative though. Fight absurdity with absurdity I guess.
I think "scraper vs. site owner" is a false dichotomy. Scrapers will always need to exist as long as we want search engines and archival services. We need small versions of these services to keep popping up every now and then to keep the big guys on their toes, and the smaller guys need advice on scraping politely.
That's fair - though are we in an isolated bout of "every now and then" or has AI created a new normal of abuse (e.g. of robots.txt)? Hopefully we're at a local maximum and some of the scrapers perpetrating harmful behaviours will soon pull their heads in.
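On the robots.txt point, honoring it is cheap to do. A rough sketch using Python's standard `urllib.robotparser` follows; the rules in `ROBOTS_TXT`, the `example-bot` user agent, and the `example.com` URLs are made up for illustration, and `build_parser` is a hypothetical helper. A real scraper would fetch the site's actual robots.txt instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


rp = build_parser(ROBOTS_TXT)

# Check each URL before requesting it, and respect Crawl-delay if one is set.
allowed_private = rp.can_fetch("example-bot", "https://example.com/private/page")
allowed_public = rp.can_fetch("example-bot", "https://example.com/public")
delay = rp.crawl_delay("example-bot")  # None when the file sets no Crawl-delay
```

With the rules above, the `/private/` URL is disallowed, the public one is allowed, and the crawl delay comes back as 5 seconds.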