This is poorly worded. Detecting "hallucinations" in the common sense of the term, a model inventing answers not actually in its source text, or answers that are simply untrue, is fundamentally impossible. Verifying the truth of a statement requires empirical investigation; it isn't a feature of language itself. This is just the basic analytic/synthetic distinction Kant identified centuries ago. It's why we have science in the first place and don't generate new knowledge by reading and learning to make convincing-sounding arguments.
Your far more scaled-down claim, however, that you can detect answers that don't address a prompt at all, or summaries that make claims not actually present in the original text, is definitely doable, but it raises a maybe naive or stupid question. If you can do this, why not sell an LLM that simply doesn't make these mistakes in the first place? Or why don't the people currently selling LLMs automatically detect obvious errors and avoid making them? Doesn't your business as constituted depend on LLM vendors never figuring out how to do this themselves?
And what's the false positive rate? It's all well and good that you catch most answers that are hallucinations, but do you also flag a significant percentage of answers that are not really hallucinations? For instance, if a summary doesn't use any sentences, or even words, from the original text, that doesn't necessarily mean it's a hallucination. It could simply be a fully paraphrased summary.
Could you not detect likely hallucinations by running the same prompt through multiple different models and measuring the divergence between the output vectors? Kind of like a vote among, say, GPT, Llama, and other models: if the outputs diverge, that's a signal the answer is likely a hallucination.
It's not 100%, but enough to basically say to the human: "hey, look at this".
You can do it, and it's a good way of doing it; in our experiments it can catch most errors. You don't even need different models: re-running the same workflow with the same model (I don't mean asking "are you sure?") gives good results too. The only problem is that it's super expensive to run on all your traces, so I wouldn't recommend it as a monitoring tool.
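The cross-run agreement idea above can be sketched in a few lines. This is a minimal illustration, not anyone's actual product: it uses token-set Jaccard similarity as a cheap stand-in for the embedding-based "vector divergence" the comment suggests, and the threshold of 0.5 is an arbitrary assumption you would tune on real data.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    # Token-set Jaccard similarity; a cheap stand-in for cosine
    # similarity between embedding vectors of two answers.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def flag_for_review(answers: list[str], threshold: float = 0.5) -> bool:
    # Flag when average pairwise agreement across re-runs (same
    # model, or several different models) falls below the threshold.
    sims = [jaccard(a, b) for a, b in combinations(answers, 2)]
    return sum(sims) / len(sims) < threshold

# Three consistent runs agree, so nothing is flagged; one wildly
# divergent run drags the average agreement down and gets flagged.
consistent = ["paris is the capital of france"] * 3
divergent = ["the cat sat on the mat",
             "the cat sat on a mat",
             "unrelated nonsense entirely"]
```

Note the cost the commenter mentions: each check multiplies your inference bill by the number of re-runs, which is why this works better as a spot check than as always-on monitoring.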