
I've still never understood the complaint here. Robots are part of the web, the whole point of HTML is that robots can read it.


The whole point of publication is so that humans can read it. Robots not so much, especially if they're not paying customers. This is the distinction between how the web works technically and how it works socioeconomically.

This is the next iteration of things like the news snippet case. Publishers are not happy that Google crawls their content (at their expense) and then republishes it on their own site, while serving ads around it and getting user data, without cutting in the publisher who originally made it. And, for what little it's worth, owns the copyright.


Web 3.0 is all about machine-readability: https://en.m.wikipedia.org/wiki/Semantic_Web
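For illustration, schema.org JSON-LD embedded in a page is one Semantic Web-adjacent way to make content machine-readable; every value below is a placeholder, not taken from any real page:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example headline (placeholder)",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2024-01-01"
}
```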

(Not to be confused with Web3.)


Robots don't exist for their own sake, at the end of the day they are user agents for some group of humans.

Again it sounds like the people who are upset by this really want to publish images rather than web pages.


> Again it sounds like the people who are upset by this really want to publish images rather than web pages.

More like people don't want to lose money because a 3rd party stole all of their content, and then repurposed it to show people before they visit their website.


It's a cost thing. It costs more to render a website than it does to consume it. When you have some bot traffic mixed in with human traffic, that is fine.

When you have egregious bot traffic, say 10k requests per minute sustained load, it becomes a real problem for webmasters.


Having Perplexity or other AI bots go haywire and send tens of thousands of requests per minute to your website (despite your robots.txt blocking them) is a giant pain in the ass. Not only do your server costs go up, but your analytics and attribution reports start to look messed up because of all the bot traffic.
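For context, this is what such a robots.txt block looks like; the user-agent tokens below are the ones the vendors publish for their crawlers, though the whole complaint here is that some bots ignore the file anyway:

```
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```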


Yeah obviously if they're being abusive that's a problem but that's not what the article seems to be talking about.


Well besides them being abusive, the other issue is that AI overviews and answer boxes cannibalize traffic to websites, leading to less conversions for the original content producers. This is pretty well established across industries at this point:

https://ahrefs.com/blog/ai-overviews-reduce-clicks/


That's how people want to browse the web. If you block it you won't even get links from those. That's like blocking the search crawler.


...that's literally the entire point of this article. People don't want their websites de-listed from the monopoly that controls organic traffic. At the same time, they would like some control over stopping companies (in this case, the same company that controls the organic search monopoly) from scraping and repurposing their content, so that the traffic to their website doesn't decrease.

Why is it such an issue that publishers and website owners want to maintain the traffic to their website so that they can continue operating as usual? Or should we all just accept every Google decision, even when those decisions result in more engagement on google.com, but 20-35% decreases in traffic to the original websites?

Also I'm going to need a citation that the vast majority of people want and get value out of AI overviews. Because that is certainly not the case from my experience.


Google absolutely is not the only company doing this and if they didn't do it I'd feed the results into my local models to get the same thing.

This isn't a "google decision" people are changing the way they use the web.


Google has clearly decided to keep users on their platform longer, hoping that this will lead to more ad clicks. There is a clear reason why AI overviews very seldom link to outside websites, and why website links are much more hidden on Google Maps/Business Profiles. More time spent on the Google platform means it's more likely that someone will eventually click an ad.

Also - I noticed a pretty huge outcry when AI overviews were introduced to search. Can you show me all the people who enjoy the experience of using them more than not?


The AI tools I use generate numerous links, usually 5-10. I'd be annoyed if they didn't.

> Can you show me all the people who enjoy the experience of using them more than not?

Yes. Most people here have commented that they prefer AI responses to raw search results partly because they don't have to deal with poorly written web sites. Most people I know IRL do too.


> Most people here have commented that they prefer AI responses to raw search results

Strangely enough, I've seen the exact opposite response on here. Especially since the AI overviews are often plain wrong and/or misleading. Many others like myself also prefer to get information directly from 1st parties, rather than whatever sausage has been produced through the black box information meat grinder we call AI.


Because you're stuck at an extremely shallow level of analysis. The fact that robots are part of the web is not the relevant issue; the relevant issue is how some of those robots behave, and what the consequences of that behavior are.


Have you ever been responsible for the performance and security of a publicly accessible web server? I'll accept robots indexing my content if they play nicely. Unfortunately most do not, even from major vendors.


Not a web server, but yeah, we dealt with it by blacklisting patterns (IPs, requests, etc.) from misbehaving domains.
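A minimal sketch of that kind of pattern blacklist; the patterns and IP below are illustrative placeholders, not the ones actually used:

```python
import re

# Illustrative deny-list: regexes matched against the User-Agent header,
# plus a set of individually blocked client IPs.
BLOCKED_UA_PATTERNS = [re.compile(p, re.I) for p in (r"badbot", r"scrapy")]
BLOCKED_IPS = {"203.0.113.7"}  # TEST-NET-3 address, placeholder only

def is_blocked(ip: str, user_agent: str) -> bool:
    """Return True if the request should be rejected outright."""
    if ip in BLOCKED_IPS:
        return True
    return any(p.search(user_agent or "") for p in BLOCKED_UA_PATTERNS)
```

In practice you'd do this at the edge (firewall or reverse proxy) rather than in application code, so blocked requests never touch the backend at all.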

We never distinguished automations from people though, that makes no sense on the internet.


> We never distinguished automations from people though, that makes no sense on the internet.

LOL I see you've never sold anything on the internet, ran a website that is supposed to generate leads, or had to gauge the effectiveness of an ad campaign. There is a huge part of the internet that relies on real humans doing things on websites. And ignoring that is insane.


Maybe I should clarify: It makes no sense from a system architecture perspective. Obviously if you're doing analytics you want to know the difference.


A systems architecture perspective should be very very inclusive of the business perspective.


In theory.

In practice business people have such a poor intuition for systems design you end up with borderline unusable software when you do that.


that's why systems architects are supposed to get paid well, because there are a lot of different stakeholders to consider


HTML was not made for robots to read (as in a semantic web or an internet of data); it just so happens that crawlers try to index things in meaningful ways. It's an unordered blob of unstructured data.


That's why we need the Captchaweb. It's like the web, but everything is in captcha text.


Machine-readable does not mean centralized.


A lot of people don't want AI slop and don't want the companies pushing it to crawl their websites.



