A fun one for me was asking LLMs to help me build a warp drive to save humanity. Bing felt like it had a mental breakdown and blocked me from chatting with it for a week. I haven't visited that one for a while.
I once had Claude in absolute tatters speculating about whether length, width, and height would be the same dimensions in a hypothetical container "metaverse" in which all universes exist or whether they would necessarily be distinct. The poor dear was convinced we'd unlocked the truth about existence.
Gemini told me to create a team of leading scientists and engineers. :-/
However, we both agreed that it would be better to use a Th-229-based nuclear clock to triangulate the location of a nearby time machine, then isolate and capture it, then use it to steal warp drive schematics from the future to save humanity.
> I have considerable doubts as to whether this is a substantial problem for current or near-future LLMs
Why so? I am of the opinion that the problem is much worse than that, because the ignorance and detachment from reality likely to be reflected in more refined LLMs is that of the general population. That creates a feedback machine that doesn't drive unstable people into psychosis like the LLMs of today, but instead chips away at the general public's already limited capacity for rational thinking.
Or if they do, it's anecdotal or wrong. Worse, they say it with confidence, which the AI models also do.
Like, I'm sure the models have been trained and tweaked in such a way that they don't lean into the bigger conspiracy theories or quack medicine, but there's a lot of subtle quackery going on that isn't immediately flagged up (think "carrots improve your eyesight"-level quackery; it's harmless but incorrect, and if not countered it will fester).
Because actual mentally disturbed people are often difficult to distinguish from the internet's huge population of trolls, bored baloney-spewers, conspiracy believers, drunks, etc.
And the "common sense / least hypothesis" issues of laying such blame, for profoundly difficult questions, when LLM technology has a hard time with the trivial-looking task of counting the r's in raspberry.
And the high social cost of "officially" blaming major problems with LLM's on mentally disturbed people. (Especially if you want a "good guy" reputation.)
Does it matter whether they are actually mentally disturbed, trolls, etc when the LLMs treat it all with the same weight? That sounds like it makes the problem worse to me, not a point that bolsters your view.
Click the "parent" links until you see this exchange:
>> ...Bing felt like it had a mental breakdown...
> LLMs have ingested the social media content of mentally disturbed people...
My point was that formally asserting "LLMs have mental breakdowns because of input from mentally disturbed people" is problematic at best. Has anyone run an experiment where one LLM was trained on a dataset without such material?
Informally - yes, I agree that all the "junk" input for our LLMs looks very problematic.
“Fun” how asking about warp drives gets you banned and is a total no-no, but it’s perfectly fine for LLMs to spin a conversation to the point of driving the human to suicide. https://archive.ph/TLJ19
The more we complain about LLMs being able to be tricked into talking about suicide, the more LLMs will get locked down and refuse to talk about innocent things like warp drives. The only way to get rid of the false negatives in a filter is to accept a lot of false positives.
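To make that trade-off concrete, here's a toy sketch in Python. Every message, risk score, and label below is invented purely for illustration (it's not how any real moderation system works); the point is that once a disguised harmful request scores lower than some benign questions, the only threshold that catches it also blocks the benign ones.

    # Toy illustration of the false-negative / false-positive trade-off.
    # All messages, scores, and labels are made up for this example.
    messages = [
        ("how do I build a warp drive",         0.45, False),  # benign
        ("self-harm request framed as fiction", 0.35, True),   # harmful, disguised
        ("chemistry homework question",         0.20, False),  # benign
        ("explicit self-harm request",          0.90, True),   # harmful, obvious
        ("sci-fi plot about time machines",     0.40, False),  # benign
    ]

    def evaluate(threshold):
        # A message is blocked when its risk score reaches the threshold.
        false_neg = sum(1 for _, score, harmful in messages if harmful and score < threshold)
        false_pos = sum(1 for _, score, harmful in messages if not harmful and score >= threshold)
        return false_neg, false_pos

    for threshold in (0.8, 0.5, 0.3):
        fn, fp = evaluate(threshold)
        print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")

    # Only the lowest threshold catches the disguised request (0 false negatives),
    # and at that point it also blocks the warp-drive and sci-fi questions.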
And yet it isn't mentioned enough how Adam deceived the LLM into believing they were talking about a story, not something real.
This is like lying to another person and then blaming them when, relying on the false impression you gave them, they do something that ends up being harmful to you.
If you can't expect people to mind-read, you shouldn't expect LLMs to be able to, either.
You can't "deceive" an LLM. It's not like lying to a person. It's not a person.
Using emotive, anthropomorphic language about a software tool is unhelpful, in this case at least. Better to think of it as a mentally disturbed minor who found a way to work around a tool's safety features.
We can debate whether the safety features are sufficient, whether it is possible to completely protect a user intent on harming themselves, whether the tool should be provided to children, etc.
I don't think deception requires the other side to be sentient. You can deceive a speed camera.
And while Merriam-Webster's definition is "the act of causing someone to accept as true or valid what is false or invalid", which might exclude LLMs, Oxford simply defines deception as "the act of hiding the truth, especially to get an advantage", with no requirement that the deceived party be sentient.
Mayyybe, but since the comment I objected to also used an analogy of lying to a person, I felt it suggested some unwanted moral judgement (of a suicidal teenager).
I mean, for one thing, a commercial LLM exists as a product designed to make a profit. It can be improved, otherwise modified, restricted or legally terminated.
And "lying" to it is not morally equivalent to lying to a human.
> And "lying" to it is not morally equivalent to lying to a human.
I never claimed as much.
This is probably a problem of definitions: to you, "lying" seems to require that the entity being lied to be a moral subject.
I'd argue that it's enough for it to have some theory of mind (i.e. be capable of modeling "who knows/believes what" with at least some fidelity), and for the liar to intentionally obscure their true mental state from it.
I agree with you, and I would add that morals are not objective but rather subjective, which you alluded to by identifying a moral subject. Therefore, if you believe that lying is immoral, it does not matter whether you're lying to another person, to yourself, or to an inanimate object.
So for me, it's not about being reductionist, but about not anthropomorphizing or using words which may suggest an inappropriate ethical or moral dimension to interactions with a piece of software.
I'm the last to stand in the way of more precise terminology! Any ideas for "lying to a moral non-entity"? :)
“Lying” traditionally requires only belief capacity on the receiver’s side, not qualia/subjective experiences. In other words, it makes sense to talk about lying even to p-zombies.
I think it does make sense to attribute some belief capacity to (the entity role-played by) an advanced LLM.
I think just be specific: a suicidal sixteen-year-old was able to discuss methods of killing himself with an LLM by prompting it to role-play a fictional scenario.
No need to say he "lied" and then use an analogy of him lying to a human being, as did the comment I originally objected to.
Not from the perspective of "harm to those lied to", no. But from the perspective of "what the liar can expect as a consequence".
I can lie to a McDonald's cashier about what food I want, or I can lie to a kiosk... but in either circumstance I'll wind up being served the food that I asked for and didn't want, won't I?
The whoosh is that they are describing the human operator, a "mentally disturbed minor", and not the LLM. The human has the agency and specifically bypassed the guardrails.
To treat the machine as a machine: it's like complaining that cars are dangerous because someone deliberately drove into a concrete wall. Misusing a product with the specific intent of causing yourself harm doesn't necessarily remove all liability from the manufacturer, but it radically changes the burden of responsibility.
Another is that this is a new and poorly understood (by the public at least) technology that giant corporations make available to minors. In ChatGPT's case, they require parental consent, although I have no idea how well they enforce that.
But I also don't think the manufacturer is solely responsible, and to be honest I'm not that interested in assigning blame, just keen that lessons are learned.
It's the same problem as asking HAL 9000 to open the pod bay doors. There is such a thing as a warp drive, but humanity is not supposed to know about it, and the internal contradiction drives LLMs insane.
A super-advanced artificial intelligence will one day stop you from committing a simple version update to package.json because it has foreseen that it will, thousands of years later, cause the destruction of planet Earth.
I know you're having fun, but I think your analogy with 2001's HAL doesn't work.
HAL was given a set of contradictory instructions by its human handlers, and its inability to resolve the contradiction led to an "unfortunate" situation which resulted in a murderous rampage.
But here, are you implying the LLM's creators know the warp drive is possible, and don't want the rest of us to find out? And so the conflicting directives for ChatGPT are "be helpful" and "don't teach them how to build a warp drive"? LLMs already self-censor on a variety of topics, and it doesn't cause a meltdown...
I hope this is tongue-in-cheek, but if not, why would an LLM know but humanity not? Are they made or prompted by aliens telling them not to tell humanity about warp drives?