ARM doesn't "have a problem" with the ordering. Rather, those CPUs take advantage of the reordering that the specified ordering constraints allow. The behaviour is entirely expected. The default for C++ atomic operations is the strongest constraint (memory_order_seq_cst); a programmer who specifies a more relaxed constraint had better have a good reason for it.
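To make that concrete, here is a minimal sketch of the usual "publish a value, then set a flag" pattern, assuming one producer and one consumer thread. The function name run_message_passing and the variables are made up for illustration. The explicit release/acquire pair is the weakest ordering that still makes the payload visible; leaving the arguments off would give you memory_order_seq_cst.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration: returns the value the consumer saw.
int run_message_passing() {
    int payload = 0;                 // plain data, published via the flag
    std::atomic<bool> ready{false};  // the synchronization point
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot be reordered after this store.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: once this load sees true, the payload write is visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    producer.join();
    consumer.join();
    assert(observed == 42);  // guaranteed by the release/acquire pair
    return observed;
}
```

If both operations were changed to memory_order_relaxed, the standard would no longer order the payload write before the flag, and the read of payload would become a data race. A weakly ordered CPU such as ARM is then free to show the consumer a stale payload, while x86 would most likely keep appearing to work.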
Usually the problem is that it exposes latent software bugs hidden by x86's strong guarantees on memory access ordering and cache validity.
It's not that someone misused atomics, but rather that they did not use them at all and "it works".
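A rough sketch of what that kind of latent bug tends to look like, and the smallest fix. The struct and function names here (PlainMailbox, AtomicMailbox, send_and_receive) are invented for the example. The plain version is only declared, never exercised, because running it would be a data race and therefore undefined behaviour in C++.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// The "it works" pattern: a plain flag shared between threads.
// Polling `ready` from another thread is a data race; x86's strong
// ordering often hides it, weaker hardware or an optimizer may not.
struct PlainMailbox {
    int  payload = 0;
    bool ready   = false;  // not atomic: no ordering, no visibility guarantee
};

// Minimal fix: make the flag atomic and keep the default ordering.
struct AtomicMailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Hypothetical helper for illustration: returns what the consumer read.
int send_and_receive() {
    AtomicMailbox box;
    int observed = -1;

    std::thread consumer([&] {
        // load() with no argument uses memory_order_seq_cst.
        while (!box.ready.load()) {
            std::this_thread::yield();
        }
        observed = box.payload;
    });

    box.payload = 7;
    box.ready.store(true);  // store() also defaults to memory_order_seq_cst
    consumer.join();

    assert(observed == 7);
    return observed;
}
```

Sticking with the defaults costs a little on some architectures, but it is correct everywhere, which is usually the right trade until profiling says otherwise.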