What part of that process requires UTF-16? JavaScript doesn't require UTF-16; it just requires Unicode. You could use UTF-8 in your JavaScript implementation as well.
> What part of that process requires UTF-16? JavaScript doesn't require UTF-16; it just requires Unicode. You could use UTF-8 in your JavaScript implementation as well.
Actually, JavaScript _does_ require UTF-16. From the ES5.1 spec:
> A conforming implementation of this Standard shall interpret characters in conformance with the Unicode
> Standard, Version 3.0 or later and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding
> form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not otherwise specified, it is presumed
> to be the BMP subset, collection 300. If the adopted encoding form is not otherwise specified, it is presumed to
> be the UTF-16 encoding form.
JS does require UTF-16, because the surrogate pairs of non-BMP characters are separable in JS strings.
```
'<non-BMP character>'.length == 2
'<non-BMP character>'[0] == first code unit of the surrogate pair
'<non-BMP character>'[1] == second code unit of the surrogate pair
```
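That separability is easy to demonstrate with a concrete non-BMP character; here U+1F600 (😀) stands in for the placeholder above:

```javascript
// U+1F600 is outside the BMP, so JS represents it as a
// surrogate pair: two separately addressable UTF-16 code units.
const s = '\u{1F600}';

s.length;                    // 2, counted in code units, not characters
s.charCodeAt(0) === 0xD83D;  // true: high (lead) surrogate
s.charCodeAt(1) === 0xDE00;  // true: low (trail) surrogate
s[0] + s[1] === s;           // true: indexing splits the pair
```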
Any JS implementation using UTF-8 internally would still have to convert to UTF-16 semantics to give correct answers for `.length` and string indexing.
Still doesn't mean that it has to store the strings in UTF-16. And a bit of common-case optimizing would allow ignoring that case until a string actually ends up with a non-BMP character in it.
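As a sketch of why that optimization works: the UTF-16 length of a UTF-8 buffer is computable in a single pass, and only four-byte UTF-8 sequences (non-BMP characters) contribute a second code unit, so a BMP-only string needs no conversion at all. A hypothetical helper (assuming well-formed UTF-8 input; `utf16Length` is not a real API, just an illustration):

```javascript
// Hypothetical sketch: compute a string's UTF-16 length directly
// from its UTF-8 bytes, without materializing a UTF-16 copy.
// Assumes the input is well-formed UTF-8.
function utf16Length(utf8Bytes) {
  let len = 0;
  for (let i = 0; i < utf8Bytes.length; ) {
    const b = utf8Bytes[i];
    if (b < 0x80)      { i += 1; len += 1; } // ASCII
    else if (b < 0xE0) { i += 2; len += 1; } // 2-byte sequence, still BMP
    else if (b < 0xF0) { i += 3; len += 1; } // 3-byte sequence, still BMP
    else               { i += 4; len += 2; } // 4-byte sequence: surrogate pair
  }
  return len;
}

utf16Length(new TextEncoder().encode('abc'));       // 3
utf16Length(new TextEncoder().encode('\u{1F600}')); // 2
```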
Alternatively, if a JavaScript implementation chose to completely ignore that particular requirement, I'd guess that approximately zero pages (outside of test cases) would break, and a few currently broken pages (that assumed sane Unicode handling) would start working.