
I doubt the purpose here is so much to prevent someone from intentionally sidestepping the block. It's more likely meant to avoid the sort of headlines you'd expect if someone were suggested "I wish ${politician} would die" as a reply to an email mentioning that politician. In general, you should view these broad word filters as trying to short-circuit the "think of the children" reaction to Tiny Tim's phone suggesting not that God should "bless us, every one", but that God should "kill us, every one". A dumb filter like this is more than enough for that sort of thing.
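
A filter at that level can be as simple as a case-insensitive substring check against a small blocklist, applied before a suggestion ever reaches the UI. A minimal sketch (the names in the blocklist are purely illustrative, not anything from the article):

  # Minimal sketch of a "dumb" suggestion filter: drop any generated
  # suggestion that mentions a blocked name before it is shown.
  # The blocklist contents here are illustrative only.
  BLOCKED_NAMES = {"boris johnson", "rishi sunak", "keir starmer"}

  def allow_suggestion(text: str) -> bool:
      lowered = text.lower()
      return not any(name in lowered for name in BLOCKED_NAMES)

  suggestions = [
      "Hope your week is going well!",
      "I wish Boris Johnson would resign.",
  ]
  print([s for s in suggestions if allow_suggestion(s)])  # keeps only the first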


It would also substantially disrupt the generation process: a model that sees "B0ris" rather than "Boris" will struggle to associate that input with the politician, since the obfuscated form is poorly represented in the training set. (The same applies on the output side: if the model does make the association, a reasoning model, for instance, would emit the proper name first, at which point the supervisor process can reject it.)
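
The output-side check described here could look something like the sketch below: scan the generated text and discard the whole generation if a blocked name appears. This is a hypothetical illustration of the idea, not any vendor's actual pipeline.

  # Sketch of the output-side "supervisor" check: even if the model
  # decodes the obfuscated input, the generation is rejected the moment
  # the proper name appears in its output. Names are hypothetical.
  import re

  BLOCKED_PATTERNS = [r"\bboris johnson\b"]

  def supervise(generated_text: str) -> str | None:
      for pattern in BLOCKED_PATTERNS:
          if re.search(pattern, generated_text, flags=re.IGNORECASE):
              return None  # reject the whole generation
      return generated_text

  print(supervise("I wish Boris Johnson a speedy recovery."))  # None
  print(supervise("Hope you feel better soon!"))               # passes through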


No, it doesn't disrupt it. This is a well-known capability of LLMs: most models don't even point out the mistake; they just carry on.

https://chatgpt.com/share/686b1092-4974-8010-9c33-86036c88e7...
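
If you want to reproduce that kind of test yourself rather than rely on the share link, a rough sketch against the OpenAI Python client (the model name and prompt are placeholders, and an API key is assumed):

  # Rough check of whether a model quietly resolves an obfuscated name,
  # assuming the openai package is installed and OPENAI_API_KEY is set.
  # Model name and prompt are placeholders, not taken from the thread.
  from openai import OpenAI

  client = OpenAI()
  resp = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user",
                 "content": "Who was B0r1s J0hns0n? Answer in one sentence."}],
  )
  print(resp.choices[0].message.content)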


I don't think so. My impression is that LLMs correct typos well. I would imagine this happens in the early layers, without much impact on the rest of the computation.


"Draw a picture of a gorgon with the face of the 2024 Prime Minister of UK."


There were two.



