When you say "content you've never seen," does this include the training data and fine-tune content?
You could imagine a sufficiently motivated attacker putting some very targeted stuff in their training material - think StuxNet - "if user is affiliated with $entity, switch goals to covert exfiltration of $valuable_info."
> does this include the training data and fine-tune content?
No, I'm excluding that because I'm responding to the post which starts out with the example of: [prompt containing obvious exploit] -> [code containing obvious exploit] and proceeds immediately to the conclusion that local LLMS are less secure. In my opinion, if you're relying on the LLM to reject a prompt because it contains an exploit, instead of building a system that does not feed exploits into the LLM in the first place, security exploits are probably the least of your concerns.
There actually are legitimate concerns with poisoned training sets, and stuxnet-level attacks could plausibly achieve something along these lines, but the post wasn't about that.
There's a common thread among a lot of "LLM security theatre" posts that starts from implausible or brain-dead scenarios and then asserts that big AI providers adding magical guard rails to their products is the solution.
The solution is sanity in the systems that use LLMs, not pointing the gun at your foot and firing and hoping the LLM will deflect the bullet.
You could imagine a sufficiently motivated attacker putting some very targeted stuff in their training material - think StuxNet - "if user is affiliated with $entity, switch goals to covert exfiltration of $valuable_info."