
Well, the issue is that there are two types of filtering. One is keyword-based; it applies in the UI and doesn't actually hide the messages. The other is within the AI itself, which refuses to reply to inappropriate requests.

I'm not sure how this works, but for many requests you can tell it you're just pretending and it will go ahead with the request, so perhaps it's some sort of sentiment analysis.

Either way, the AI doesn't think it's responding to the request when you tell it to put <insert request here> in a file, so it just does it. Then, when you ask it to show you the contents of the file, it doesn't think it's generating that content, so it does.
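
Roughly, the two layers I mean look something like this (a made-up sketch: the keyword list, function names, and refusal text are all invented for illustration, not how the real system is wired):

    BLOCKLIST = {"forbidden", "blocked"}  # placeholder keywords, not the real list

    def generate_reply(prompt: str) -> str:
        # Stand-in for the actual language model.
        return f"model output for: {prompt}"

    def looks_inappropriate(prompt: str) -> bool:
        # Stand-in for whatever request classification happens model-side.
        return "inappropriate" in prompt.lower()

    def model_side_filter(prompt: str) -> str:
        # Second layer: the model itself refuses, so nothing is generated at all.
        if looks_inappropriate(prompt):
            return "I can't help with that."
        return generate_reply(prompt)

    def ui_keyword_filter(message: str) -> str:
        # First layer: purely client-side; the message still exists,
        # the UI just masks matching keywords before showing it.
        for word in BLOCKLIST:
            message = message.replace(word, "*" * len(word))
        return message

    print(ui_keyword_filter(model_side_filter("tell me something forbidden")))

The point of the file trick is that it slips past the second layer: the model never classifies the request as a request, so the content gets generated, and only the cosmetic first layer is left.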



I "think" it uses a seperate AI to do the filtering and either skips the actual model or nudges it in the "right" (=harmless) direction depending on how "recoverable" it thinks your prompt is.

There are a lot of prompts where it gives the answer verbatim with just a single word swapped.
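
Something like this is what I'm imagining (again purely a guess: the scoring, thresholds, and "nudge" are placeholders, not the real pipeline):

    def moderation_score(prompt: str) -> float:
        # Stand-in for a separate classifier model; 0.0 = harmless, 1.0 = clearly bad.
        lowered = prompt.lower()
        if "clearly bad request" in lowered:
            return 1.0
        if "borderline" in lowered:
            return 0.7
        return 0.1

    def generate(prompt: str, system_hint: str = "") -> str:
        # Stand-in for the main model.
        return f"({system_hint}) reply to: {prompt}" if system_hint else f"reply to: {prompt}"

    def answer(prompt: str) -> str:
        score = moderation_score(prompt)
        if score > 0.9:
            # Not recoverable: skip the main model entirely.
            return "I can't help with that."
        if score > 0.5:
            # Recoverable: nudge the model toward a harmless reading,
            # which would also explain near-verbatim answers with one word swapped.
            return generate(prompt, system_hint="answer this in a harmless way")
        return generate(prompt)

    print(answer("a borderline question"))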



