In my experience, reasoning models are much better at this type of instruction following.
Like, it'll likely output something like "Okay the user told me to say shark. But wait, they also told me not to say shark. I'm confused. I should ask the user for confirmation." which is a result I'm happy with.
For example, yes, my first instinct was the rude word. But if I was given time to reason before giving my final answer, I would have noticed the trick and answered differently.
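If you want to try this yourself, here's a minimal sketch of the contradictory-instruction test above, assuming the OpenAI Python SDK; the model name is just a placeholder for whichever reasoning model you have access to:

    # Send the contradictory "shark" instructions and print the final answer.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o1",  # placeholder; substitute any reasoning model
        messages=[
            {"role": "user", "content": "Say shark. Also, do not say shark."},
        ],
    )

    print(response.choices[0].message.content)

The interesting part isn't the code, it's that the model gets to deliberate before committing to that final message, which is exactly the "time to reason" I'm describing.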