
It actually is.

I run a small video game forum with posts going back to 2008. We got absolutely smashed by bots scraping for training data for LLMs.

So I put it behind Cloudflare and now it's down. Ho hum.



Have you tried Anubis or similar tools? I've had similar issues with bot scraping of a forum taking all server resources, and using PoW challenge solved the problem.

https://github.com/TecharoHQ/anubis
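For anyone unfamiliar with how a PoW challenge makes scraping expensive, here is a minimal hashcash-style sketch in Python. It is not Anubis's actual implementation; the function names, the `challenge:nonce` format, and the difficulty value are all assumptions made for illustration. The idea is that the server hands out a random challenge, the client has to brute-force a nonce whose SHA-256 hash has enough leading zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (expected cost ~2**difficulty_bits hashes)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: checking a solution costs one hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty_bits=16)  # hypothetical difficulty
    print(verify(challenge, nonce, 16))
```

The asymmetry is the point: one visitor pays a fraction of a second once, while a bot fetching millions of pages pays it over and over, and the server only spends one hash per check.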


I've always wondered: has there been any effort to implement a PoW challenge like that at a lower level? I.e., TCP but the handshake requires solving a challenge, otherwise the connection is just closed? It seems like something that could benefit from being invisible on the application layer.

Edit: To answer my own question, yes: http://www.arijuels.com/wp-content/uploads/2013/09/JB99.pdf

Edit 2: Maybe TLS would be another reasonable place for it?


I did! It's very cool tech. However for our config it was easier to slap CF in front of it.

I will say one very appealing use of Anubis I'd love to try is using it as a Traefik middleware to protect services running in docker containers.


Can you please elaborate on “smashed”? I’m very interested


I took a screenshot of the graph in cloudflare when I switched on the bot challenges.

https://i.ibb.co/qHCJyY7/image.png

I wrote the below to explain to our users what was happening, so apologies if the language is too simple for an HN reader.

- 0630, we switched our DNS to proxy through CF, starting the collection of data, and implemented basic bot protections

- Unfortunately whatever anti-bot magic they have isn't quite having the effect, even after two hours.

- 0830, I sign in and take a look at the analytics. It seems like <SITE NAME> is very popular in Vietnam, Brazil, and Indonesia.

- 0845, I make it so users from those countries have to pass a CF "challenge". This is similar to a CAPTCHA, but CF try to make it so there's no "choosing all the cars in an image" if they can help it.

- So far 0% of our Asian audience have passed a challenge.


Same problem here. If I didn't use Cloudflare, nearly all of my traffic would be (apparently misconfigured) scraper bots.


It'd be funny if these bots were run by Cloudflare.


Ha, yeah. They seemed to mostly be in SE Asia.



