I think you misunderstand -- I'm referring to the actual systems-level implementation of the browser engine itself. I'm not talking about the implementation of web apps.
Consider a pattern like this: a page calls document.createElement(), appends a large text node (say, the collected works of Shakespeare in text form) to it, calls window.getComputedStyle() on that element, then throws the element away. That series of DOM manipulations must go through the layout engine, because getComputedStyle() needs an up-to-date DOM to perform CSS selector matching. If the layout engine knows only UTF-8, it has to convert that entire text from UTF-16 to UTF-8 for no reason. There is no reason to do that when it could just use UTF-16 internally and save itself the trouble.
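The transcoding cost itself is easy to make concrete. This is a rough page-level illustration (plain JavaScript using the standard TextEncoder/TextDecoder APIs, not engine internals): it performs the same UTF-16 → UTF-8 → UTF-16 round trip a UTF-8-only layout engine would impose on every large text node.

```javascript
// Rough illustration of the transcoding round trip -- not engine internals.
// TextEncoder/TextDecoder are standard (WHATWG Encoding) and available in
// browsers and Node.js.
const text = 'To be, or not to be'.repeat(100000); // stand-in for a large text node

// JS strings are semantically UTF-16; encoding produces a UTF-8 copy.
const utf8 = new TextEncoder().encode(text);  // UTF-16 -> UTF-8 (allocate + scan)
const back = new TextDecoder().decode(utf8);  // UTF-8 -> UTF-16 (allocate + scan)

console.log(utf8.byteLength === text.length); // true here: the text is pure ASCII
console.log(back === text);                   // true: the round trip is lossless
```

Both calls allocate and scan the full text; that is the work the comment argues a UTF-16-native layout engine would simply skip.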
What part of that process requires UTF-16? JavaScript doesn't require UTF-16; it just requires Unicode. You could use UTF-8 in your JavaScript implementation as well.
> What part of that process requires UTF-16? JavaScript doesn't require UTF-16; it just requires Unicode. You could use UTF-8 in your JavaScript implementation as well.
Actually, JavaScript _does_ require UTF-16. From the ES5.1 spec:
> A conforming implementation of this Standard shall interpret characters in conformance with the Unicode Standard, Version 3.0 or later and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not otherwise specified, it is presumed to be the BMP subset, collection 300. If the adopted encoding form is not otherwise specified, it is presumed to be the UTF-16 encoding form.
JS does require UTF-16, because the two surrogate code units of a non-BMP character are separately addressable in JS strings:
'<non-BMP character>'.length == 2
'<non-BMP character>'[0] == the first (high) surrogate code unit
'<non-BMP character>'[1] == the second (low) surrogate code unit
Any JS implementation using UTF-8 would have to convert to UTF-16 for proper answers to .length and array indexing on strings.
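Concretely, with U+1F600 (a non-BMP character), those guarantees look like this in standard JavaScript, nothing beyond the spec behavior quoted above:

```javascript
// U+1F600 is outside the BMP, so JS exposes it as a UTF-16 surrogate pair.
const s = '\u{1F600}';

console.log(s.length);                     // 2 -- UTF-16 code units, not characters
console.log(s.charCodeAt(0).toString(16)); // 'd83d' -- high surrogate
console.log(s.charCodeAt(1).toString(16)); // 'de00' -- low surrogate
console.log(s[0] + s[1] === s);            // true -- the halves reassemble the character
```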
Still doesn't mean that it has to store the strings in UTF-16. And a bit of common-case optimizing would allow ignoring that case until a string actually ends up with a non-BMP character in it.
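One shape such a common-case optimization could take, sketched purely for illustration (Utf8String and its fields are invented here; no real engine is being described): store the bytes as UTF-8, and precompute the UTF-16 length plus a "BMP-only" flag in a single scan, so UTF-16 data never has to be materialized just to answer .length.

```javascript
// Illustrative sketch only -- not how any real engine works. Stores UTF-8
// bytes but can still answer the UTF-16 questions JS semantics require.
class Utf8String {
  constructor(str) {
    this.bytes = new TextEncoder().encode(str); // stored as UTF-8
    // One scan over the bytes: a 4-byte UTF-8 sequence (lead byte >= 0xF0)
    // encodes a non-BMP code point, which costs two UTF-16 code units.
    let units = 0;
    let bmpOnly = true;
    for (const b of this.bytes) {
      if ((b & 0xC0) === 0x80) continue;              // continuation byte
      if (b >= 0xF0) { units += 2; bmpOnly = false; } // non-BMP: surrogate pair
      else units += 1;                                // BMP: one code unit
    }
    this.utf16Length = units;
    this.bmpOnly = bmpOnly;   // common case: fast paths can skip surrogate logic
  }
  get length() { return this.utf16Length; } // matches JS String#length
}
```

When bmpOnly is true (the overwhelmingly common case for real pages), indexing and length need no surrogate accounting at all; only strings that actually contain non-BMP characters pay for the harder path.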
Alternatively, if a JavaScript implementation chose to completely ignore that particular requirement, I'd guess that approximately zero pages (outside of test cases) would break, and a few currently broken pages (that assumed sane Unicode handling) would start working.
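For what "sane Unicode handling" would mean here: code-point rather than code-unit semantics. Later ECMAScript editions did add code-point-aware APIs alongside the legacy ones, and the contrast shows the difference:

```javascript
// Code-unit view (what the quoted spec text requires of .length and indexing)
// vs. code-point view (what "sane Unicode handling" would look like).
const s = '\u{1F600}'; // a single non-BMP character

console.log(s.length);                     // 2 -- UTF-16 code units
console.log([...s].length);                // 1 -- string iteration walks code points
console.log(s.codePointAt(0) === 0x1F600); // true -- the real code point
```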