*> who would willingly implement a backend system that prevents 99.99% of SQL in...

usrbinbash · on May 29, 2023

All that is true, but also besides the point.

The point is, that since we cannot use any kind of known finetuning to _eliminate_ even this obvious security problem (making it somewhat less likely is not a solution), in my opinion fine tuning is not markedly improving the AIs capabilities in the sense of "improvement" that AI doomsday scenarios would require.

mike_hearn · on May 31, 2023

I agree that fine-tuning isn't going to lead to any kind of recursive self improvement. Current evidence is that it makes AIs dumber at the same time as making them more compliant, i.e. it's actually quite the opposite.

So you may be right, but for the specific case of stopping prompt injection I'm optimistic. RL has proven to be highly effective at making LLMs behave in particular ways with relatively little data. The combination of special tokens and duelling LLMs is likely to eliminate the issue in the relatively near term (within the next few years if not sooner).

Fundamentally, are humans vulnerable to prompt injection? No, we're not. We might be in a very artificial case like what LLM input looks like, where there are multiple people speaking to us simultaneously via a chat app and the boundaries between them aren't clearly marked. But that's a UI issue - proper presentation and separation would eliminate the problem for humans, and I think the same will be true for LLMs.

Note that even if I'm right (and I'm no expert, the above is layman speculation), then this still leaves analogous problems in the field of computer vision with adversarial examples.