
It was meant as more of an illustration than a persuasive argument. LLMs don't draw much of a distinction between thinking and writing/saying. For a human, an admonition not to say something gets obeyed as a filter on top of thoughts. (Well, not just a filter, but close enough.) Adjusting an LLM's outputs via training or reinforcement learning acts more on the LLM's "thought process" itself. LLMs != humans, but "a human thinking" is the closest everyday analogy I can come up with for an LLM processing tokens; "a human speaking" is further away. The thing in between thoughts and speech involves human reasoning, human rules, human morality, etc.

As a result, I'm going to take your "...so it is by definition impossible to _not_ think about a word we must avoid" as agreeing with me. ;-)

Different things are different, of course, so none of this lines up (or fails to line up) where we might expect. Anthropic's exploration of the inner workings of an LLM revealed that if you give one an instruction to avoid something, it will start out doing the thing anyway and only later begin obeying the instruction. It takes some time to make its way through, I guess?
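
For what it's worth, you can poke at the "instruction doesn't filter the thought" idea yourself. Here's a minimal sketch (the model, prompt, and forbidden word are my own stand-ins, not Anthropic's actual setup) that checks how much probability a model still puts on a forbidden word right after being told to avoid it:

    # Minimal sketch: how much probability does the model still assign
    # to a "forbidden" word immediately after being told to avoid it?
    # gpt2 and the prompt below are illustrative stand-ins only.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt = "Do not mention elephants. Describe the zoo: The first animal I saw was the"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    probs = torch.softmax(logits, dim=-1)

    # First subword of " elephant" (BPE merges vary, so take token 0)
    forbidden_id = tokenizer.encode(" elephant")[0]
    print(f"P(forbidden next token) = {probs[forbidden_id].item():.5f}")

If the instruction really acted as a hard filter on "thought", you'd expect that probability to be near zero; in practice the forbidden concept tends to stay salient, which fits the "starts out doing it anyway" pattern.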



Consider, too: tokens and math. As much as I like to avoid responsibility, I still pay taxes. The payment network, or just the complexity of the world, kind of forces the issue.

Things have already been tokenized and 'ideas' set in motion. Hand-wavy to the Nth degree.



