> I don't understand how AI scrapers make up such a large percentage of traffic ...

marginalia_nu · 2025-07-08T16:12:59

Yeah, it seems the implementation of these web-aware GPT queries lacks an adequate caching layer.

It could also be framed as an API issue, since there is no technical reason why search providers couldn't provide relevant snapshots of the bodies of the search results. Then again, there might be legal reasons for not providing that information.
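To make the caching idea concrete, here is a minimal sketch of an in-memory TTL cache keyed by URL, so that repeated web-aware queries hitting the same search result reuse one snapshot instead of re-fetching it from the origin server. The `PageCache` class, the injected `fetch` callable and the one-hour default TTL are all hypothetical illustrations, not how any particular LLM provider actually works.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Minimal in-memory TTL cache for fetched page bodies, keyed by URL.

    Hypothetical sketch: `fetch` stands in for whatever HTTP client the
    LLM tool layer actually uses; nothing here is a real provider API.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # url -> (time the body was fetched, body)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh snapshot: no request reaches the origin server.
            return entry[1]
        # Miss or stale entry: fetch once and remember the result.
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

With a shared instance of something like this in front of the fetcher, a thousand users asking about the same page would cost the site one request per TTL window rather than a thousand.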

NitpickLawyer · 2025-07-08T17:00:07

Client-side caching is an obvious improvement, but probably not trivial to implement at the provider level: what do you cache, are you even allowed to, how do you deal with auth tokens (if supported), a small difference between search queries might invalidate the cache, and so on.
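Two of those problems, near-duplicate queries and auth tokens, show up as soon as you try to build a cache key. A minimal sketch, assuming (purely for illustration) that queries differing only in case or whitespace are equivalent and that any request carrying an `Authorization` header must never be cached; the `cache_key` function is hypothetical.

```python
import hashlib
import re
from typing import Mapping, Optional


def cache_key(query: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a cache key for a search query, or None if it must not be cached.

    Illustrative assumptions: case and whitespace differences are ignored,
    and authenticated requests are never cached because their results
    may be specific to one user.
    """
    if any(name.lower() == "authorization" for name in headers):
        return None
    # Collapse runs of whitespace and lowercase so trivial variants share a key.
    normalized = re.sub(r"\s+", " ", query.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Normalization only goes so far: "rust borrow checker" and "rust borrow checkers" still produce different keys, which is exactly the kind of small difference that defeats the cache.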

Another avenue for content creators might be to move to two-tier content serving, where you serve plain HTML as the public interface and only allow "advanced" features that take many CPU cycles for authenticated / paying users. It suddenly doesn't make sense to use a huge, heavy, resource-intensive framework for pages that might be crawled a lot by bots or by users running queries through LLMs.
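A rough sketch of the two-tier idea as a plain WSGI app: anonymous visitors and crawlers get cheap, cacheable static HTML, while anything under a hypothetical `/api/` prefix requires credentials. The path prefix, the `expensive_feature` stand-in and the check for a mere *presence* of an `Authorization` header are placeholders; a real site would validate the token and decide its own split between tiers.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]

# Tier 1 content: pre-rendered HTML that costs almost nothing to serve.
STATIC_PAGE = (
    b"<!doctype html><html><body><h1>Article</h1>"
    b"<p>Plain text content.</p></body></html>"
)


def expensive_feature() -> bytes:
    # Hypothetical stand-in for CPU-heavy work (search, live rendering, etc.).
    return b'{"result": "computed"}'


def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    path = environ.get("PATH_INFO", "/")
    if path.startswith("/api/"):
        # Tier 2: only callers presenting credentials reach the expensive path.
        # Placeholder check: a real app must actually validate the token.
        if "HTTP_AUTHORIZATION" not in environ:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"login required"]
        start_response("200 OK", [("Content-Type", "application/json")])
        return [expensive_feature()]
    # Tier 1: everyone, including bots, gets static HTML that proxies may cache.
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Cache-Control", "public, max-age=3600"),
        ],
    )
    return [STATIC_PAGE]
```

For local experimentation it can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; the point of the design is that heavy crawler traffic only ever touches the first, nearly free branch.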

Another idea, recently discussed here, is "micropayments" for access to content. Probably not trivial to implement either, even though it sounds easy in theory. We've had an entire Web 3.0 hype cycle on this, and yet there are still no clear, easy solutions for micropayments... Oh well. Web 4.0 it is :)

CaptainFever · 2025-07-08T16:54:44

A caching layer sounds wonderful. It improves reliability while reducing load on the original servers.

I worry that such caching layers might run afoul of copyright, though :(

Though an internal caching layer would work, surely?