If a company's data includes emails, and someone emails a person at the company, the email will exist inside the company's data lake. Lake means a set of documents or data that is considered "company IP" and if that data is indexed, and it has instructions inside it that cause it to inference differently than prompted, it could become a problem.
Obviously, for the data to be accessible by the "bot", the data needs to have been indexed. And if a rouge email is in that data, and it gets returned from a search (vector search for example) then that email, and the instructions, will show up in the prompt.
If, during inference, the instructions in the email override the instructions in the prompt wrapper, then you might have a simple question in the UI returning data different than intended. Whether or not someone clicks on something is beside the point, it's that the LLM might return a malicious link that is the critical part here...
I didn't know that LLMs work like that, and frankly I'm still respectfully skeptical. If I train a model with enough prompt-wise malicious inputs in its dataset, this model may generate them back, that's obvious. But. Are you saying that these generations can somehow (how?) feed back into its own prompt and then it malfunctions? Or did I get it wrong?
One scenario I can imagine myself is that a model could generate <...bad...> into a chat, and then when the user notices it and responds with something like "that's not what I asked for, <reiterates the question>", then there's now malicious text in the chat history (context window) that could affect future generations on <the question> and put dangerous data into a seemingly innocent link. Is this what you meant?
Obviously, for the data to be accessible by the "bot", the data needs to have been indexed. And if a rouge email is in that data, and it gets returned from a search (vector search for example) then that email, and the instructions, will show up in the prompt.
If, during inference, the instructions in the email override the instructions in the prompt wrapper, then you might have a simple question in the UI returning data different than intended. Whether or not someone clicks on something is beside the point, it's that the LLM might return a malicious link that is the critical part here...