ChatGPT as a search engine sounds amazing but also really quite problematic. It's an extension of the issue with Google's "instant answers" (or whatever they're called): right now the creators of the content Google/ChatGPT scrapes are usually paid via the advertisements on their pages. When no-one clicks through any more, no-one gets paid.
I know, I know, the ad banner-funded web is a mess and I wouldn't mourn its demise either. But it worries me that it's an entirely open ended question for what actually replaces it.
It builds a paragraph answering your query but it has a lot of footnotes that link directly to websites.
Ex:
"geopolitical reason for palm oil being banned and why it's bad for health"
The EU has banned palm oil in biofuels due to its negative impacts on health[1] and its geopolitical implications, such as favoring alternative crops grown in Europe[2]. Palm plantations are also a major factor of deforestation[3], leading to the loss of habitat for endangered species[4]. Indonesia's President Joko Widodo recently announced a ban on the export of palm oil, which could backfire due to its importance in the global market[5].
Just wondering aloud, but say something like this eats Google's lunch and search goes away. These chatbots provide answers, and footnotes. In a world like this, where there's less need to actually go to a website, what's the incentive for maintaining a website that no one visits? After a couple rounds of "adversarial training data evolution," that is, people optimizing their websites to get mentions by chatbots, will the bots really be better than Google? After a couple rounds of revenue improvements, will the chatbot responses remain good? And if folks quit maintaining websites, do the chatbots get a bit stuck in time from when their training data was last good?
I feel like there is something here, but I wonder if second order effects make it trickier than it appears atm
It's the dying out of the "middle" of the value curve. You end up with a market that's dominated by cheap "good enough" products for casuals and "products built to last" for power users.
So if you have a low effort website that's factual and text based, you're going to get your lunch eaten by GPT; if you have a higher effort website (subscription gated with lots of multimedia content and user engagement) you'll be fine.
Think of all the blogspam recipe sites that are going to run into trouble when ChatGPT learns to cook well. Lots of text, little additional value, no community. There still will be America's Test Kitchen because people on the upper end of the value curve don't just want a recipe, they want pictures + video of that recipe being made and a place where they can ask questions and get answers.
A link is useless if you can't verify it. Just linking to the "nih.gov" homepage is meaningless, I have no ability to click through and verify that "ah yes, this page does demonstrate the claim being made." The system could just link to arbitrary homepages of respected institutions and claim that they back up whatever is being said.
The displayed text is the home site, but the link itself is to a relevant specific page. Also clicking "view list" expands and shows excerpts from each page.
Apologies, I had a hard time copy/pasting the footnotes, as on the website they are CSS-styled boxes with a thumbnail. This is only the label; the link itself brings you to the relevant pages.
I only meant the use of footnote links. The AIs themselves are quite bad but can also be useful if you learn not to take everything at face value. One example of it failing is when I search for my own name: it mixes up a bunch of people together, so I end up being a dead murderer from the '70s, a house rental person, a developer and also a pianist.
Footnote links still don't solve the financial model, primarily because nobody clicks on them.
There was congressional testimony by the founder/owner of "Celebrity Net Worth" about how Google made it impossible for them to stay in business. Whenever somebody would search "How much is <celebrity X> worth?", the answer would just show up directly on the Google results page. There was still an attribution link to Celebrity Net Worth, but nobody ever clicked on it anymore, so the result was Celebrity Net Worth had to shut down.
You can certainly argue fairly whether sites like CNW deserve to exist in the first place, but it's not hard to see how there is still a huge financial problem when ALL the ad revenue goes to the search engines and they don't even leave any of the slim scraps to the publisher sites.
The same reason there are sources in the footnotes on Wikipedia: so you can double check or read further.
I like to think of ChatGPT and the like as an on-demand personalized Wikipedia: a good starting point, comes with strings attached, not always correct (but some are fine with it).
ChatGPT doesn't link sources yet but I saw that the beta test context search from Kagi had them.
In general, generative models are taking publicly available content from creators and monetizing it for big tech.
Same with Stable Diffusion and AI art. And it'll be the same with LLMs.
Eventually the whole internet would be flooded with cheap AI generated content and clearly AIs need human generated content to train on so it'll be the snake eats itself.
>Eventually the whole internet would be flooded with cheap AI generated content and clearly AIs need human generated content to train on so it'll be the snake eats itself.
Genuine human-generated content will retreat to account-gated networks and private group chats where everyone knows one another. The rest of the internet will just be incestuous AI-generated chum.
In a way, it'd be an improvement. Genuine connection doesn't scale, so let's be honest about keeping it away from random parasites online.
The Web never really replaced books as information sources for important things. The way to access most of the best stuff available via the Web is to... download ebooks, thus leaving the Web, or else just to hit the "buy" button for real, physical books.
The single most intellectually valuable website on the entire Web is very likely Library Genesis, where the only Web content is a catalog of books you can pirate by clicking a link, and it's, like, a lot more valuable than any other site (even Wikipedia). It may well be more valuable, in those terms, than the entire rest of the Web combined.
If serious book publishers survive a while longer and if non-fiction books aren't overrun with dubiously-accurate AI bullshit, things won't actually change all that much, I think.
As far as written content goes, the (public) Web is most useful for opinions or product discovery, and even those can already hardly be trusted because of all the marketing astroturfing. AI garbage barely changes that already-toxic dynamic.
Video's another matter—some video content on the Web is great and has ~no at-least-as-good replacement anywhere else, in any other medium. But it's also all but completely monopolized by Youtube and hardly participates in or factors into the broader Web.
That’s not so different from the current state of the world in which many search results are full of cheap SEO content. Now instead of appending Reddit to my search I just go to ChatGPT. Probably similar levels of accuracy between Reddit and GPT anyway while the SEO content is just garbage filler.
> clearly AIs need human generated content to train on so it'll be the snake eats itself.
For the time being but there is no reason AI couldn't produce genuine original content. Real life human artists also use previous content for inspiration.
Human art isn't only previous content though. It's filtered through the experiences of the artist themselves. Eric Clapton surely took inspiration from prior music, but "Tears in Heaven" is about a uniquely personal and human experience of losing a child. If a computer had written it, it wouldn't have the same artistic or emotional weight.
There are parallels in manufacturing and product. Some sellers compete on high quality control and better methods, and everything else gets cheaper, more common and lower ("good enough") quality.
The question for what actually replaces the WWW as we currently know it isn't really open-ended. Many of the people who post on HN and similar sites remember what the web was like before it became an ad-funded mess.
I would enthusiastically welcome a web that isn't based on firehose-advertising and outright deception/lies.
> I would enthusiastically welcome a web that isn't based on firehose-advertising and outright deception/lies.
I would too but I don't see how it happens. The awesome web that used to be was built on the backs of unpaid volunteers. Maybe that could return but even if it did all that wonderful volunteer work would get funneled through Google or OpenAI so investors can make a fat profit from it. Feels fundamentally wrong to me.
Nadella was asked about this. He basically said that they have a responsibility to drive users to sites with click through or else sites won't have an incentive to be crawled.
> He basically said that they have a responsibility to drive users to sites with click through or else sites won't have an incentive to be crawled.
This is partially a BS answer. As long as websites are running Google Ads, they will have an incentive to be crawled. Fewer clicks > no clicks (which is what would happen if the site was set to 'noindex').
Google also pays news publishers to license their content; $1bn alone just for Google News Showcase [0]
At least the Prometheus model used by Bing is aware of its sources and displays them as links below the result. That could help drive traffic and monetisation. This is something that Google Bard seems to lack for now.
Google does not pay site owners and I thought the big complaint was that users rarely if ever click through. The instant answers keep users and their potential page impressions in Google-land.
Further, site owners were really only paid if they put Google's ads up on their website to monetize it, or if they had a paywall, which meant that their information was much less likely to show up in Google. So Google really only paid people for a short time, back when ads were actually relatively lucrative. Or, if you were essentially selling something, you could monetize instantly through the sales that Google pointed people to. But for people just producing content, it's never been that great of a way to reward them in return for Google aggregating their information and selling it to other people.