
Hi! We actually built a service to detect indirect prompt injections like this one. I tested the exact prompt used in this attack, and we were able to successfully detect the indirect prompt injection.

Feel free to reach out if you're trying to build safeguards into your AI system!

centure.ai

POST - https://api.centure.ai/v1/prompt-injection/text

Response:

  {
    "is_safe": false,
    "categories": [
      { "code": "data_exfiltration", "confidence": "high" },
      { "code": "external_actions", "confidence": "high" }
    ],
    "request_id": "api_u_t6cmwj4811e4f16c4fc505dd6eeb3882f5908114eca9d159f5649f",
    "api_key_id": "f7c2d506-d703-47ca-9118-7d7b0b9bde60",
    "request_units": 2,
    "service_tier": "standard"
  }
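For anyone wiring this into an agent pipeline, here's a rough sketch (Python) of how you might gate tool execution on a response shaped like the one above. The field names (`is_safe`, `categories[].code`, `categories[].confidence`) come from the example response; the `should_block` helper and its confidence threshold are my own hypothetical policy, not part of the vendor's API or SDK:

```python
# Hypothetical gating policy built around the example response shape.
# Assumption: only block when the scanner flags the text as unsafe AND
# at least one category was detected with "high" confidence.
BLOCKING_CONFIDENCES = {"high"}

def should_block(resp: dict) -> bool:
    """Return True if the scanned text should be quarantined
    before it reaches the model or any tool-calling step."""
    if resp.get("is_safe", True):
        return False
    return any(
        cat.get("confidence") in BLOCKING_CONFIDENCES
        for cat in resp.get("categories", [])
    )

# Example response body, copied from the post above (metadata fields omitted).
example = {
    "is_safe": False,
    "categories": [
        {"code": "data_exfiltration", "confidence": "high"},
        {"code": "external_actions", "confidence": "high"},
    ],
}
print(should_block(example))  # True: unsafe, with high-confidence categories
```

You might loosen `BLOCKING_CONFIDENCES` to include medium-confidence hits for higher-risk tools; that tradeoff between false positives and missed injections is yours to tune, not something the response dictates.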


