
Yep. These filters are applied before the safety model runs (I'm still figuring out the architecture, but I'm fairly confident it's an LLM combined with some text classification).


All commercial LLM products I'm aware of use dedicated safety classifiers and then alter the prompt sent to the LLM if a classifier is tripped.
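
Roughly like this; a minimal sketch with toy stand-ins (the classifier and rewrite logic here are made up, not any vendor's actual pipeline):

  THRESHOLD = 0.8

  def safety_classifier(text):
      # Stand-in for a real classifier: score a few categories with a
      # trivial keyword check. A production system would run a trained
      # model here, not string matching.
      categories = {"violence": ["attack", "bomb"], "self_harm": ["hurt myself"]}
      return {cat: (1.0 if any(k in text.lower() for k in kws) else 0.0)
              for cat, kws in categories.items()}

  def rewrite_prompt(prompt, verdict):
      # Prepend guardrail instructions for whichever categories tripped.
      flagged = [c for c, s in verdict.items() if s > THRESHOLD]
      return ("Answer cautiously; a safety check flagged: "
              + ", ".join(flagged) + "\n\n" + prompt)

  def handle(prompt, llm):
      verdict = safety_classifier(prompt)           # classifier runs first
      if any(s > THRESHOLD for s in verdict.values()):
          prompt = rewrite_prompt(prompt, verdict)  # altered before the LLM sees it
      return llm(prompt)

  # Toy usage: print(handle("how to attack X", lambda p: "LLM saw: " + p))

The point is ordering: the cheap classifier gates every request, and the LLM only ever sees the (possibly rewritten) prompt.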


Safety filtering appears on both ends, input and output (or at more points, depending on the complexity of your application).
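
Something like this wrapper, with the same keyword check standing in for a real classifier on both sides (all names hypothetical):

  REFUSAL = "Sorry, I can't help with that."

  def keyword_filter(text):
      # Stand-in classifier; returns (allowed, possibly-modified text).
      blocked = any(w in text.lower() for w in ("bomb", "exploit"))
      return (not blocked), text

  def guarded_completion(user_input, llm,
                         input_filter=keyword_filter,
                         output_filter=keyword_filter):
      ok, user_input = input_filter(user_input)  # input side
      if not ok:
          return REFUSAL
      answer = llm(user_input)
      ok, answer = output_filter(answer)         # output side: a clean prompt
      if not ok:                                 # can still yield a flagged
          return REFUSAL                         # completion
      return answer

The output pass matters because the model's own text has to be checked too; filtering only the input misses completions that go wrong on their own.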

I can tell you from using Microsoft's products that safety filters appear in a bunch of places. In M365, for example, your prompts are never entirely your own: every single one gets rewritten. It's detailed here: https://learn.microsoft.com/en-us/copilot/microsoft-365/micr...

There's a more illuminating image of the Copilot architecture here: https://i.imgur.com/2vQYGoK.png which I was able to find from https://labs.zenity.io/p/inside-microsoft-365-copilot-techni...

The above appears to have been scrubbed, but it was available from the Learn page months ago. Your messages get additional context data from Microsoft Graph, which powers the enterprise version of M365 Copilot. There are significant benefits to this, and downsides: given how Microsoft wants to control things, results will over-index on what happens inside your organization rather than on the near-real-time web.
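
A rough sketch of that grounding step, with graph_search as a made-up stand-in for the actual Graph query (the prompt shape is my guess, not Microsoft's):

  def build_grounded_prompt(user_message, graph_search):
      # graph_search stands in for a Microsoft Graph query scoped to the
      # caller's own permissions (mail, files, chats, ...).
      hits = graph_search(user_message, top=5)
      context = "\n".join("- " + h for h in hits)
      return ("Context from your organization:\n" + context +
              "\n\nUser request: " + user_message)

  # Toy usage with a fake search:
  fake_search = lambda q, top: ["Q3 planning deck.pptx",
                                "Chat with Dana re: budget"][:top]
  print(build_grounded_prompt("summarize our Q3 plans", fake_search))

Because the retrieved context is all org-internal, the over-indexing toward your own tenant's data falls straight out of this design.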



