> Maybe next time don't use a language without memory safety to parse untrusted input.
Untrusted input is safely parsed by programs written in languages without memory safety all the time. In fact, most language runtimes with memory safety are implemented in languages _without_ memory safety.
What's to criticize here is parsing untrusted input in the same memory space as sensitive information.
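For concreteness, here's a contrived C sketch of the bug class in question (not anyone's actual code): a bounds check that tests for equality instead of ordering, so malformed input can step the read pointer *past* the end marker and the copy loop keeps going into neighboring memory.

```c
#include <stddef.h>

/* Copy an attribute value until the closing quote.
   BUG: `p != end` should be `p < end`; the two-byte escape skip
   can hop over `end`, after which the check never fires. */
static size_t copy_value(const char *p, const char *end,
                         char *out, size_t cap) {
    size_t i = 0;
    while (p != end && i < cap) {
        if (*p == '\\') { p += 2; continue; }  /* skip escape pair */
        if (*p == '"') break;
        out[i++] = *p++;
    }
    return i;
}
```

An over-read like this doesn't crash; it quietly copies out whatever sits next to the buffer on the heap, which is exactly why sharing that heap with other requests' data is the real problem.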
How would you rewrite websites to optimize them without parsing untrusted input in the same memory space as sensitive information? The thing you're trying to change (the HTML) can have PII or be otherwise sensitive.
(I used to work on Google's PageSpeed Service, and if it had had the same bug I think we would have been in the same situation as CF is now.)
> The thing you're trying to change (the HTML) can have PII or be otherwise sensitive.
Sure it can. But leaking PII from the thing you're parsing and leaking PII from any other random request aren't the same thing.
I understand the performance implications (and the added effort) of sandboxing the parser, but I'm arguing for it anyway. The mere presence of a man-in-the-middle decrypting HTTPS and pooling cleartext data from many disparate services in a single memory space is already questionable (something for Cloudflare customers, not Cloudflare itself, to think about), but adding risky features into the mix shouldn't be done without as much isolation as possible.
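As a rough illustration of what that isolation could look like, here's a minimal sketch assuming Linux: each request forks a short-lived child that locks itself into seccomp strict mode (only `read()`, `write()`, and raw `exit()` remain allowed) before touching the untrusted HTML, so a parser bug can expose at most that one request's data. `parse_html()` and `rewrite_sandboxed()` are hypothetical names, not any real API.

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <linux/seccomp.h>

/* Stand-in for the real HTML rewriter. */
static void parse_html(const char *in, size_t n, int out_fd) {
    write(out_fd, in, n);               /* placeholder: echo the input */
}

/* Raw exit(2): glibc's _exit() calls exit_group(), which strict
   seccomp forbids (the process would be SIGKILLed instead). */
static void sandbox_exit(int code) {
    syscall(SYS_exit, code);
    for (;;) {}
}

int rewrite_sandboxed(const char *html, size_t len, int out_fd) {
    int in[2];
    if (pipe(in) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(in[0]); close(in[1]); return -1; }

    if (pid == 0) {                     /* child: the per-request sandbox */
        close(in[1]);
        char *buf = malloc(len);        /* allocate before locking down */
        if (!buf || prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(1);                   /* not yet sandboxed: _exit is fine */
        size_t got = 0;
        while (got < len) {             /* read() is still permitted */
            ssize_t r = read(in[0], buf + got, len - got);
            if (r <= 0) sandbox_exit(1);
            got += (size_t)r;
        }
        parse_html(buf, got, out_fd);   /* write() is still permitted */
        sandbox_exit(0);
    }

    close(in[0]);                       /* parent: feed input, await exit */
    write(in[1], html, len);
    close(in[1]);
    int status;
    waitpid(pid, &status, 0);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Strict mode is the bluntest tool available; a real deployment would presumably use a seccomp-bpf filter plus a pool of pre-forked workers to amortize the per-request cost, which is exactly the performance tension mentioned above.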
Let's face it: parsers are about the most likely place for this sort of leakage to happen...
Actually, thinking more, we designed PSS to run in a sandbox specifically because we were parsing untrusted input. But leaking content from one site into responses for other sites would still have been possible, because I think we didn't reinitialize the sandbox on every request (way too expensive) and each server process handled many sites. Fix that, and then there's still the risk of leaking things between sites via the cache.
It's definitely possible to fix this (new sandbox per request, cache fragmented by something outside of sandbox control), but I'm not sure the service would make sense economically.
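The cache half of that fix is cheap to sketch, at least: derive every cache key from a tenant/zone id assigned by the trusted front end, outside the sandbox's control, so even a fully compromised parser can neither read nor poison another site's entries. The names here are illustrative, not PSS's actual code.

```c
#include <stdio.h>

struct request_ctx {
    const char *zone_id;  /* set by the trusted routing layer */
    const char *url;      /* untrusted, may be attacker-influenced */
};

/* The "|" separator assumes zone ids never contain '|'. */
static int make_cache_key(const struct request_ctx *ctx,
                          char *out, size_t cap) {
    int n = snprintf(out, cap, "%s|%s", ctx->zone_id, ctx->url);
    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}
```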
The obvious solution that occurs to me would be to isolate the "email protection" feature to a second process, thus limiting the scope of the catastrophe to only the sites using that feature rather than everyone.
But beyond that, they didn't even do the most basic testing. I mean, they wrote an HTML parser and didn't test it with malformed input?! Malformed HTML is all over the Internet.
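Malformed input is also exactly what fuzzers are for. A minimal libFuzzer harness looks like the sketch below (`html_rewrite()` is a hypothetical stand-in for the parser entry point); built with AddressSanitizer, a silent out-of-bounds read becomes an immediate, reproducible crash.

```c
/* Build: clang -g -fsanitize=fuzzer,address harness.c rewriter.c */
#include <stddef.h>
#include <stdint.h>

int html_rewrite(const uint8_t *data, size_t size);  /* parser under test */

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    html_rewrite(data, size);   /* ASan flags any over-read instantly */
    return 0;
}
```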