Hacker News

> We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments

You're forgetting the millions of additional comments that will be written by humans to trick the AI into promoting their content.

Even worse: currently, if you ask ChatGPT to write you some code, it will make up an API endpoint that doesn't exist and then make up a URL that doesn't exist where you can register for an API key. People are already registering these domains and parking fake sites on them to scam people. ChatGPT is creating a huge market for fake companies to match the fake information it's generating.

The biggest risk may not be people using AI-generated comments to promote their own repos, but rather registering new repos to match the fake ones that the AI is already promoting.



> ChatGPT is creating a huge market for creating fake companies to match the fake information it's generating.

Does ChatGPT consistently generate the same fake data though?


There was one company that had to put up a “our API can’t get location data from a phone number so stop asking, GPT lied” page.


I have noticed that ChatGPT will give me a consistent output when the input is identical, but I haven’t done extensive research on this.
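Whether identical input yields identical output depends mostly on the decoding strategy, not the model itself. A toy sketch (the token probabilities are made up for illustration and have nothing to do with ChatGPT's actual internals): greedy decoding is deterministic, while temperature sampling is not.

```python
import random

# Hypothetical next-token distribution; purely illustrative
probs = {"endpoint": 0.5, "url": 0.3, "token": 0.2}

def greedy_pick(dist):
    # Greedy decoding (temperature -> 0): always take the most likely
    # token, so identical input gives identical output every time
    return max(dist, key=dist.get)

def sampled_pick(dist, rng):
    # Temperature sampling: draw from the distribution, so repeated
    # runs on the same input can differ
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(greedy_pick(probs))                    # always "endpoint"
print(sampled_pick(probs, random.Random()))  # can vary run to run
```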


I'm constantly curious whether anyone working in the AI space is cognizant of the Tower of Babel myth.

I don't think an arms race for convincing looking bullshit is going to turn out well for our species.


I feel like you're overstating this as a long-term issue. Sure, it's a problem now, but realistically, how long before code hallucinations are patched out?


The black-box nature of the model means this isn't something you can really 'patch out'. It's a byproduct of the way the system processes data; hallucinations will get less frequent with targeted fine-tuning and more capable models, but there's no easy fix.


This is clearly untrue. It's an input, a black box, then an output. OpenAI has 100% control over the output. They may not be able to directly control what comes out of the black box, but (a) they can tune the model, and they undoubtedly will, and (b) they can control what happens after the black box. They can, for example, simply block URLs.
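A crude sketch of what such an output-side filter could look like (the allowlist and the function are hypothetical, not anything OpenAI is known to actually do):

```python
import re

# Matches a URL and captures its domain
URL_RE = re.compile(r"https?://([\w.-]+)(?:/\S*)?")

# Hypothetical allowlist of domains verified to exist; illustrative only
KNOWN_DOMAINS = {"api.github.com", "docs.python.org"}

def redact_unknown_urls(text):
    # Runs after the black box: any URL whose domain isn't on the
    # allowlist gets replaced with a placeholder
    def repl(match):
        if match.group(1) in KNOWN_DOMAINS:
            return match.group(0)
        return "[unverified URL removed]"
    return URL_RE.sub(repl, text)

print(redact_unknown_urls(
    "Get a key at https://fake-api.example/signup or see https://docs.python.org/3/"))
# Get a key at [unverified URL removed] or see https://docs.python.org/3/
```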


This is true, but detecting and omitting code hallucinations is (functionally) as hard as just not hallucinating in the first place.


They don’t have control over the output. They created something that creates something else. They can only tweak what they created, not whatever was created by what they created.

E.g., if I create a great paintbrush which creates amazing spatter designs on the wall when it is used just so, then, beyond a point, I have no way to control the spatter designs - I can only influence the designs to some extent.


did you read what I said?


That assumes hallucinations are a bug to be patched out, rather than the core behavior of a system that works by essentially sampling a probability distribution for the most likely following word.


Evidently, they can hard-code exceptions into it. This idea that it's entirely a black box they have no control over is strange and incorrect; it feels to me like little more than contrarianism to my comment.


An aside: what do people mean when they say “hallucinations” generally? Is it something more refined than just “wrong”?

As far as I can tell most people just use it as a shorthand for “wow that was weird” but there’s no difference as far as the model is concerned?


Wrong is saying 2+2 is five.

Wrong is saying that the sun rises in the west.

By hallucinating they’re trying to imply that it didn’t just get something wrong but instead dreamed up an alternate world where what you want existed, and then described that.

Or another way to look at it, it gave an answer that looks right enough that you can’t immediately tell it is wrong.


This isn't a good explanation. These LLMs are essentially statistical models. When they "hallucinate", they're not "imagining" or "dreaming"; they're simply producing a string of results that your prompt, combined with the training corpus, implies to be likely.
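To make that concrete, here's a toy bigram model (vastly simpler than a real LLM, with a made-up corpus): it emits whatever tends to follow the previous word in its training text, with no notion of whether the result is true.

```python
from collections import Counter, defaultdict
import random

# Made-up training corpus; the model only learns which word follows which
corpus = ("the api returns json . the api requires a key . "
          "sign up at example.com for a key .").split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def generate(word, n, rng):
    # Repeatedly sample a likely next word; plausibility, not truth,
    # is the only criterion
    out = [word]
    for _ in range(n):
        counts = follows[out[-1]]
        if not counts:
            break
        tokens, weights = zip(*counts.items())
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return " ".join(out)

print(generate("the", 4, random.Random(0)))
```

Every continuation it produces is "likely" given the corpus; whether the API it describes actually exists never enters into it.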


Most people don’t understand the technology and maths at play in these systems. That’s normal, as is using familiar words that make that feel less awful. If you have a genuine interest in understanding how and why errant generated content emerges, it will take some study. There isn’t (in my opinion) a quick helpful answer.


I genuinely want to understand whether there’s a meaningful difference between non-hallucinatory and hallucinatory content generation other than “real world correctness”.


I’m far from an expert, but as I understand it, the reference point isn’t so much the “real world” as the training data: a hallucination is when the model generates a strongly weighted association that isn’t in the data and perhaps shouldn’t exist at all. I’d prefer a word like “superstition”; it seems more relatable.


Nobody knows.


undoubtedly not long


Folks, doesn't it seem a little harsh to pile downvotes onto this comment? It's an interesting objection stimulating meaningful conversation for us all to learn from.

If you disagree or have proof of the opposite, just say so instead of downvoting. There's no reason to get so emotional that we hide a comment from the community by burying it in downvotes.


to be fair, it’s only one net downvote



