I think you misunderstand -- I'm referring to the actual systems-level implementation of the browser engine itself. I'm not talking about the implementation of web apps.
Consider a pattern like this: a page calls document.createElement(), appends a large text node (say, the collected works of Shakespeare in text form) to it, calls window.getComputedStyle() on that element, then throws the element away. That series of DOM manipulations must go through the layout engine, because getComputedStyle() needs an up-to-date DOM to perform CSS selector matching. If the layout engine knows only UTF-8, it has to convert that entire text from UTF-16 to UTF-8 for no reason. There is no reason to do that when it could just use UTF-16 internally and save itself the trouble.
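The transcoding cost itself is easy to make concrete. This is a rough page-level illustration (plain JavaScript using the standard TextEncoder/TextDecoder APIs, not engine internals): it performs the same UTF-16 → UTF-8 → UTF-16 round trip a UTF-8-only layout engine would impose on every large text node.

```javascript
// Rough illustration of the transcoding round trip -- not engine internals.
// TextEncoder/TextDecoder are standard (WHATWG Encoding) and available in
// browsers and Node.js.
const text = 'To be, or not to be'.repeat(100000); // stand-in for a large text node

// JS strings are semantically UTF-16; encoding produces a UTF-8 copy.
const utf8 = new TextEncoder().encode(text);  // UTF-16 -> UTF-8 (allocate + scan)
const back = new TextDecoder().decode(utf8);  // UTF-8 -> UTF-16 (allocate + scan)

console.log(utf8.byteLength === text.length); // true here: the text is pure ASCII
console.log(back === text);                   // true: the round trip is lossless
```

Both calls allocate and scan the full text; that is the work the comment argues a UTF-16-native layout engine would simply skip.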
What part of that process requires UTF-16? JavaScript doesn't require UTF-16; it just requires Unicode. You could use UTF-8 in your JavaScript implementation as well.
> What part of that process requires UTF-16? JavaScript doesn't require UTF-16; it just requires Unicode. You could use UTF-8 in your JavaScript implementation as well.
Actually, JavaScript _does_ require UTF-16. From the ES5.1 spec:
> A conforming implementation of this Standard shall interpret characters in conformance with the Unicode Standard, Version 3.0 or later and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not otherwise specified, it is presumed to be the BMP subset, collection 300. If the adopted encoding form is not otherwise specified, it is presumed to be the UTF-16 encoding form.
JS does require UTF-16, because the two surrogate code units of a non-BMP character are separately addressable in JS strings:
'<non-BMP character>'.length == 2
'<non-BMP character>'[0] == the first (high) surrogate code unit
'<non-BMP character>'[1] == the second (low) surrogate code unit
Any JS implementation using UTF-8 would have to convert to UTF-16 for proper answers to .length and array indexing on strings.
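Concretely, with U+1F600 (a non-BMP character), those guarantees look like this in standard JavaScript, nothing beyond the spec behavior quoted above:

```javascript
// U+1F600 is outside the BMP, so JS exposes it as a UTF-16 surrogate pair.
const s = '\u{1F600}';

console.log(s.length);                     // 2 -- UTF-16 code units, not characters
console.log(s.charCodeAt(0).toString(16)); // 'd83d' -- high surrogate
console.log(s.charCodeAt(1).toString(16)); // 'de00' -- low surrogate
console.log(s[0] + s[1] === s);            // true -- the halves reassemble the character
```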
Still doesn't mean that it has to store the strings in UTF-16. And a bit of common-case optimizing would allow ignoring that case until a string actually ends up with a non-BMP character in it.
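One shape such a common-case optimization could take, sketched purely for illustration (Utf8String and its fields are invented here; no real engine is being described): store the bytes as UTF-8, and precompute the UTF-16 length plus a "BMP-only" flag in a single scan, so UTF-16 data never has to be materialized just to answer .length.

```javascript
// Illustrative sketch only -- not how any real engine works. Stores UTF-8
// bytes but can still answer the UTF-16 questions JS semantics require.
class Utf8String {
  constructor(str) {
    this.bytes = new TextEncoder().encode(str); // stored as UTF-8
    // One scan over the bytes: a 4-byte UTF-8 sequence (lead byte >= 0xF0)
    // encodes a non-BMP code point, which costs two UTF-16 code units.
    let units = 0;
    let bmpOnly = true;
    for (const b of this.bytes) {
      if ((b & 0xC0) === 0x80) continue;              // continuation byte
      if (b >= 0xF0) { units += 2; bmpOnly = false; } // non-BMP: surrogate pair
      else units += 1;                                // BMP: one code unit
    }
    this.utf16Length = units;
    this.bmpOnly = bmpOnly;   // common case: fast paths can skip surrogate logic
  }
  get length() { return this.utf16Length; } // matches JS String#length
}
```

When bmpOnly is true (the overwhelmingly common case for real pages), indexing and length need no surrogate accounting at all; only strings that actually contain non-BMP characters pay for the harder path.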
Alternatively, if a JavaScript implementation chose to completely ignore that particular requirement, I'd guess that approximately zero pages (outside of test cases) would break, and a few currently broken pages (that assumed sane Unicode handling) would start working.
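For what "sane Unicode handling" would mean here: code-point rather than code-unit semantics. Later ECMAScript editions did add code-point-aware APIs alongside the legacy ones, and the contrast shows the difference:

```javascript
// Code-unit view (what the quoted spec text requires of .length and indexing)
// vs. code-point view (what "sane Unicode handling" would look like).
const s = '\u{1F600}'; // a single non-BMP character

console.log(s.length);                     // 2 -- UTF-16 code units
console.log([...s].length);                // 1 -- string iteration walks code points
console.log(s.codePointAt(0) === 0x1F600); // true -- the real code point
```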