I am with you in that this rhetoric is getting exhausting.
In this particular case, though, I don't think "evil" is a moral claim; it's more shorthand for cost-externalizing behavior. Hammering expensive dynamic endpoints with millions of unique requests isn't neutral automation, it's degrading a shared public resource. Call it evil, antisocial, or extractive, the outcome is the same.
Sounds like you have zero empathy for the real costs AI is driving and the feelings this creates for website owners. How about you pony up and pay for your scraping?
> To archive Metabrainz there is no way but to browse the pages slowly page-by-page. There's no machine-communicable way that suggests an alternative.
Why does there have to be a "machine-communicable way"? If these developers cared about such things, they would spend 20 seconds looking at this page. It's literally one of the first links when you Google "metabrainz".
You expect the developers of a crawler to look at every site they crawl and develop a specialized crawler for them? That’s fine if you’re only crawling a handful of sites, but absolutely insane if you’re crawling the entire web.
I'm not entirely sure why people think more standards are the way forward. The scrapers apparently don't listen to the already-established standards. What makes one think they would suddenly start if we add another one or two?
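For what it's worth, honoring the main existing standard (robots.txt) takes only a few lines; here's a minimal sketch using Python's standard library (the user agent string and URLs are just illustrative):

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt once per host.
rp = robotparser.RobotFileParser()
rp.set_url("https://musicbrainz.org/robots.txt")
rp.read()

# Check whether our crawler may fetch a given URL, and honour any
# declared crawl-delay before requesting it.
url = "https://musicbrainz.org/artist/some-artist"
if rp.can_fetch("MyCrawler/1.0", url):
    delay_seconds = rp.crawl_delay("MyCrawler/1.0") or 1
    print(f"Allowed; wait {delay_seconds}s between requests.")
else:
    print("Disallowed by robots.txt; skip this URL.")
```

So the problem isn't that compliance is hard; it's that the scrapers choose not to bother.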
There is no standard, well-known way for a website to advertise, "hey, here's a cached data dump for bulk download, please use that instead of bulk scraping". If there were, I'd expect the major AI companies and other users[0] to use that method for gathering training data[1]. They have compelling reasons to: it's cheaper for them, and it cultivates goodwill instead of burning it.
This also means that right now, it could be much easier to push through such a standard than ever before: there are big players who would actually be receptive to it, so even a few not-entirely-selfish actors agreeing on it might just do the trick.
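To make this concrete, one minimal shape for such a standard could be a well-known JSON manifest that a crawler checks before falling back to page-by-page scraping. This is purely a hypothetical sketch; the /.well-known/data-dumps.json path and its fields are not an existing spec:

```python
import json
import urllib.request

# Hypothetical well-known location where a site could advertise bulk dumps.
# Neither this path nor the manifest fields are an established standard.
MANIFEST_PATH = "/.well-known/data-dumps.json"

def find_bulk_dump(base_url: str):
    """Return an advertised dump URL if the site publishes one, else None."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + MANIFEST_PATH, timeout=10) as resp:
            manifest = json.load(resp)
    except Exception:
        return None  # no manifest -> caller falls back to polite crawling
    dumps = manifest.get("dumps", [])
    return dumps[0]["url"] if dumps else None

if __name__ == "__main__":
    dump = find_bulk_dump("https://metabrainz.org")
    if dump:
        print("Download the dump instead of scraping:", dump)
    else:
        print("No machine-readable dump advertised; crawl slowly instead.")
```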
--
[0] - Plenty of them exist. Scraping wasn't popularized by AI companies; it's standard practice for online businesses in competitive markets. It's the digital equivalent of sending your employees to competing stores undercover.
[1] - Not to be confused with having an LLM scrape a specific page because a user requested it. That IMO is a totally legitimate and unfairly penalized/vilified use case, because the LLM is acting for the user - i.e. it becomes a literal user agent, in the same sense that a web browser is (this is the meaning behind the name of the "User-Agent" header).
You do realize that these AI scrapers are most likely written by people who have no idea what they're doing, right? Or who just don't care? If they did know or care, pretty much none of the problems these things have caused would exist. Even if we did standardize such a thing, I doubt they would follow it. After all, they think they and everyone else have infinite resources, so they can just hammer websites forever.
I realise you are making assertions for which you have no evidence. Until a standard exists we can't just assume nobody will use it, particularly when it makes the very task they are scraping for simpler and more efficient.
> The /metadata/lookup API endpoints (GET and POST versions) now require the caller to send an Authorization token in order for this endpoint to work.
> The ListenBrainz Labs API endpoints for mbid-mapping, mbid-mapping-release and mbid-mapping-explain have been removed. Those were always intended for debugging purposes and will also soon be replaced with new endpoints for our upcoming improved mapper.
> LB Radio will now require users to be logged in to use it (and API endpoint users will need to send the Authorization header). The error message for logged in users is a bit clunky at the moment; we’ll fix this once we’ve finished the work for this year’s Year in Music.
Seems reasonable and no big deal at all. I'm not entirely sure what "nice things" we can't have because of this. Unauthenticated APIs?
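For reference, "require the caller to send an Authorization token" amounts to one extra header per request. A minimal sketch; the exact endpoint path, query parameters, and "Token ..." scheme are my assumptions based on the quoted announcement, so check the ListenBrainz API docs before relying on them:

```python
import urllib.parse
import urllib.request

# Assumed values: endpoint path and token scheme are inferred from the
# quoted announcement, not verified against the current docs.
API_ROOT = "https://api.listenbrainz.org/1/metadata/lookup/"
USER_TOKEN = "your-listenbrainz-user-token"

params = urllib.parse.urlencode({
    "artist_name": "Portishead",
    "recording_name": "Glory Box",
})
req = urllib.request.Request(
    API_ROOT + "?" + params,
    headers={"Authorization": f"Token {USER_TOKEN}"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
```

If that's the extent of the change, it really is just "get a free account, send a header", which hardly counts as losing a nice thing.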
A: PawSense detects the paws of even deaf cats. Even if a cat is deaf, PawSense blocks cat typing once detected. This makes it harder for the cat to mess up your programs, data files, and operating system.
However, PawSense does not include a miracle cure for deafness.
Pumping even more CO2 into the air, hoping the magic box spits out a solution to remove the CO2 from the air, doesn't seem like a sustainable recipe either.
You might have got some at a rehab centre, or someone might live with a non-addict friend or partner. Community outreach workers (in cities that have embraced this stuff) might carry some around to administer.
I would be surprised if widespread availability of Narcan didn't decrease ODs.
And here I was thinking that GregTech's "Ludicrous Voltage" sounded out of place...