
This is interesting. It seems like some time and thought went into making the responses sensible (they use the Moderation API). Alas, the result is pretty over the top, approaching parody. Maybe this is also a consequence of making sure the initial AI chat trainers were not offensive in any way, or a similar over-optimization issue like the "a language model trained by OpenAI" answers.

I'm also not sure I like this approach. From a results point of view it's nice not to have a hate-speech spawner, but I'd prefer an approach where the AI actually learns that it's not OK to say certain things, instead of being "filtered". Since we train on internet sources, I guess that's unlikely to ever happen. I also think it would be a fascinating (but scary) sociology tool to retrain with problematic content on purpose and have some conversations with nchanGPT and friends.
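To make the distinction concrete: "filtering" here means the model is left unchanged and its outputs are screened after the fact. A minimal sketch of that pattern, with a hypothetical keyword check standing in for a real moderation classifier (the actual Moderation API is a learned classifier, not a word list):

```python
# Hypothetical sketch of post-hoc output filtering. The model itself never
# "learns" anything here; only its outputs are screened after generation.

FLAGGED_TERMS = {"badword1", "badword2"}  # stand-in for a real moderation model


def moderate(text: str) -> bool:
    """Return True if the text should be blocked (assumed, simplistic check)."""
    return any(term in text.lower() for term in FLAGGED_TERMS)


def respond(generate, prompt: str) -> str:
    """Wrap any generator function: generate first, filter second."""
    reply = generate(prompt)
    if moderate(reply):
        # Canned refusal: a bolted-on behavior, not something the model learned.
        return "I can't help with that."
    return reply
```

Because the screening happens on the finished output regardless of intent, a prompt with a benign framing (like the play example below) gets blocked just as readily as a genuinely malicious one.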



It also prevents scenarios such as writing a play where a racist argues with the people around him and eventually realizes his mistakes and changes his mind.


Or a play where a non-racist argues with the people around him and eventually realizes his mistakes and changes his mind.



