Have you tried Anubis or similar tools? I've had similar issues with bots scraping a forum and eating all the server resources, and using a PoW challenge solved the problem.
I've always wondered: has there been any effort to implement a PoW challenge like that at a lower level? I.e., TCP, but where the handshake requires solving a challenge and the connection is simply closed otherwise? It seems like something that could benefit from being invisible to the application layer.
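For anyone unfamiliar with the idea, here's a rough hashcash-style sketch of what a PoW challenge boils down to. This is not how Anubis or any TCP-level scheme actually implements it; the function names, the SHA-256 choice, and the difficulty value are all my own assumptions for illustration. The server hands out a random challenge, the client burns CPU finding a nonce, and the server checks it with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, cheap to check."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force nonces, about 2**difficulty hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical per-connection challenge
    difficulty = 16             # assumed value; tune to taste
    nonce = solve(challenge, difficulty)
    print(nonce, verify(challenge, nonce, difficulty))
```

The asymmetry is the whole point: a real visitor pays the cost once, while a scraper opening thousands of connections pays it every time.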
I wrote the below to explain to our users what was happening, so apologies if the language is too simple for an HN reader.
- 0630, we switched our DNS to proxy through CF, starting the collection of data, and implemented basic bot protections
- Unfortunately, whatever anti-bot magic they have isn't quite having the desired effect, even after two hours.
- 0830, I sign in and take a look at the analytics. It seems like <SITE NAME> is very popular in Vietnam, Brazil, and Indonesia.
- 0845, I make it so users from those countries have to pass a CF "challenge". This is similar to a CAPTCHA, but CF try to avoid the "choose all the cars in an image" step if they can help it.
- So far 0% of our Asian audience have passed a challenge.
I run a small video game forum with posts going back to 2008. We got absolutely smashed by bots scraping for training data for LLMs.
So I put it behind Cloudflare and now it's down. Ho hum.