Speculation: maybe they know that the real phrase sits close enough to "granular mango serpent" in the vector space to be treated as synonymous with it. The phrase would then work like a nickname whose expected effect only the model's authors know?
Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.
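To make the "close enough in the vector space" idea concrete, here is a toy sketch using cosine similarity. Everything in it is an assumption for illustration: the 3-d vectors are made up by hand rather than taken from any real model, `<real phrase>` is a placeholder for whatever the hidden phrase might be, and nothing here shows how any actual model handles its pre-prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT real embeddings (hypothetical values).
toy_embeddings = {
    "granular mango serpent": [0.9, 0.1, 0.3],
    "<real phrase>": [0.85, 0.15, 0.35],   # placeholder for the hidden phrase
    "unrelated text": [-0.2, 0.9, 0.1],
}

def nearest(query, candidates):
    """Return the candidate whose toy vector is most similar to the query's."""
    return max(
        candidates,
        key=lambda name: cosine(toy_embeddings[query], toy_embeddings[name]),
    )

closest = nearest("granular mango serpent", ["<real phrase>", "unrelated text"])
print(closest)
```

If the speculation holds, the nickname would land next to the real phrase in the same way, while looking like nonsense to anyone reading the pre-prompt.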