Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Already happened: "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions" [1].

[1] https://www.axios.com/2025/05/23/anthropic-ai-deception-risk



Note that all these things are in the training data. That's all that is.

I'm trying to remember which movie it was where a man left notes to himself because he had memory loss, as I never saw that movie. That's the sort of thing where an AI could easily tell me with very little back-and-forth and be correct, because it's broadly popular information that's in the training data and just I don't remember it.

By the same token you needn't think there's a person there when that meme pops up in the output. Those things are all in the training data over and over.


I think you mean the movie "Memento"




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: