I didn’t work on the 386, and bugs can be caused by anything. But based on bugs that I’ve seen on processors that I’ve worked on, my guess would be that there was some forwarding logic designed to speed up consecutive 16-bit operations and consecutive 32-bit operations, along with some logic to detect when that forwarding can safely be applied.
If the detection logic is wrong, you could easily end up forwarding 16 good bits + 16 bits of random garbage into one input of a 32-bit operation. That would explain Raymond’s "if all the stars line up exactly right" line, since the hole in the forwarding logic must have been really small (or it would have been caught in testing).
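To make that guess concrete, here's a toy software model of the failure mode. It is purely illustrative and not a description of the real 386: the function names, the `detection_checks_width` flag, and the register and bus values are all made up for the example. It assumes a 16-bit result sits on a bypass bus whose upper half holds stale data, and compares a detector that checks operand widths against one that doesn't.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def write16(reg32, value16):
    """Architectural effect of a 16-bit write: the upper 16 bits are preserved."""
    return (reg32 & 0xFFFF0000) | (value16 & MASK16)


def read_operand(reg_file_value, bypass_bus, prev_width, cur_width,
                 detection_checks_width):
    """Pick the operand for the current op (hypothetical model).

    A correct detector only forwards when the previous and current
    operations have the same width. The buggy detector (the assumed
    "hole") forwards regardless of width.
    """
    if detection_checks_width:
        use_bypass = prev_width == cur_width
    else:
        use_bypass = True
    return (bypass_bus if use_bypass else reg_file_value) & MASK32


def demo():
    eax_before = 0x12340000   # made-up register contents
    result16 = 0xBEEF         # result of the previous 16-bit op
    stale_upper = 0xDEAD      # garbage left on the upper half of the bus

    eax_after = write16(eax_before, result16)
    bypass_bus = (stale_upper << 16) | result16

    # A 32-bit op immediately follows the 16-bit op.
    good = read_operand(eax_after, bypass_bus, 16, 32, True)
    bad = read_operand(eax_after, bypass_bus, 16, 32, False)
    return good, bad


if __name__ == "__main__":
    good, bad = demo()
    print(f"correct operand: {good:#010x}")
    print(f"buggy operand:   {bad:#010x}")
```

In this model the low 16 bits of the buggy operand match the correct one and only the upper half differs, which is the "16 good bits + 16 bits of garbage" case. The bug also only shows up when a 32-bit read directly follows a 16-bit write to the same register, which is the kind of narrow window that could slip past testing.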