Doesn't ARM code emulated on x86 (and vice versa) perform even worse than WebAssembly? Isn't that essentially what you would get with a lower-level "optimized compiled result of LLVM"?
Does the fact that ARM is a bad choice disprove every option other than WASM? I'm not sure that's a good argument here. There are plenty of lower-level bytecodes besides ARM.
I imagine it's more difficult to translate efficiently between two different low-level instruction sets (such as ARM, MIPS, x86, PowerPC, etc.) than to translate something slightly higher-level to the various low-level target instruction sets. Emulating ARM on x86 is usually slow (see the Android emulator), as is the reverse (see Windows 10 on ARM), and PowerPC on x86 (Apple's Rosetta) didn't seem particularly fast either.
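To make the "slightly higher-level" point concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the stack bytecode, the `lower` function, and the two target styles are toy assumptions, not WebAssembly, ARM, or x86. The idea is only that a bytecode with no fixed registers or instruction encodings leaves the translator free to pick whatever shape each target prefers.

```python
# Toy stack bytecode (hypothetical, for illustration only).
# It computes (2 + 3) * 4 without naming any registers.
PROGRAM = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]


def lower(program, three_operand):
    """Lower the toy bytecode to text for one of two made-up register machines.

    three_operand=True  -> "op dst, lhs, rhs" style (loosely RISC-like)
    three_operand=False -> "op dst, src" style, where dst is also the left input
    """
    out = []        # emitted instructions
    stack = []      # compile-time stack holding register names, not values
    next_reg = 0    # naive allocator: a fresh register whenever one is needed

    for op, *args in program:
        if op == "push":
            reg = f"r{next_reg}"
            next_reg += 1
            out.append(f"mov {reg}, {args[0]}")
            stack.append(reg)
        elif op in ("add", "mul"):
            rhs = stack.pop()
            lhs = stack.pop()
            if three_operand:
                dst = f"r{next_reg}"
                next_reg += 1
                out.append(f"{op} {dst}, {lhs}, {rhs}")
            else:
                dst = lhs  # two-operand form overwrites the left input
                out.append(f"{op} {dst}, {rhs}")
            stack.append(dst)
        else:
            raise ValueError(f"unknown op: {op}")
    return out


if __name__ == "__main__":
    for style in (False, True):
        print("three-operand" if style else "two-operand")
        for line in lower(PROGRAM, three_operand=style):
            print("   ", line)
```

Going the other way, from one of those two outputs to the other, the translator first has to work out which registers are still live and which moves were only artifacts of the source machine. That recovery step is, I'd guess, a large part of what makes ISA-to-ISA translation harder to do well.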
Do you have an example in mind of a lower-level instruction set that can be efficiently translated to different real-world ISAs?