
I don't get how the fact that the compiler can remove or modify code was ever thought to be a good idea. I get removing unused functions, but not removing conditions or changing the flow of the code. If there is unreachable code, it's best to issue a warning and let the programmer fix it. The compiler should optimize without changing the semantics of the code, even if it contains undefined/unspecified behavior.

Because of this, it's impossible to write C without using a ton of non-standard attributes and compiler options just to make it do the correct thing.



> The compiler should optimize without changing the semantics of the code, even if it contains undefined/unspecified behavior.

That is what it does. "Undefined behavior" is a lack of semantics, so it is preserving semantics when it leaves those paths out. You can make a "defined C" with `-fsanitize-trap=undefined`, but C was never a high-level assembler, and performance is critical for C users too.
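
A rough sketch of what "leaving those paths out" can look like (a made-up example, not from any real codebase):

    int read_field(int *p) {
        int v = *p;        /* if p were NULL, this dereference would be UB  */
        if (p == NULL)     /* so the compiler may assume p != NULL here...  */
            return -1;     /* ...and drop this branch as unreachable        */
        return v;
    }

The check disappears not out of malice, but because the only executions in which it would matter have no defined meaning.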


C was always a high-level assembler. UB was meant to be "this cannot be defined portably, refer to your CPU's documentation". What do you gain by making C unusable by default? Just make it 5% slower by default but possible to reason about, and give people the option to shoot themselves in the foot. I don't know what more you need than the creator of the language telling you "this is not possible to implement sanely".


> UB was meant to be "this cannot be defined portably, refer to your CPU's documentation"

You're probably thinking of implementation-defined behaviour, which is the term for behaviour that, although not fully specified by the standard, is a valid behaviour for the program.

Undefined behaviour is different. From the article: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose."

Many people don't notice the distinction; their intuition about what compilers should do with their code matches implementation-defined behaviour, so they complain when, for UB, the compiler does not behave that way.

There is a very strong argument for reclassifying many specific UBs as implementation-defined behaviours, but that's a rather different conversation from "the compiler should always do something sensible when encountering UB".
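
To make the distinction concrete (my own condensed example):

    short narrow(unsigned u) {
        /* Out-of-range conversion to a signed type: implementation-defined.
           The implementation must document which value (or signal) you get. */
        return (short)u;
    }

    int bump(int x) {
        /* Signed overflow when x == INT_MAX: undefined behaviour. The
           standard imposes no requirements, so the optimiser is free to
           assume it never happens. */
        return x + 1;
    }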


Even assuming the 5% number were correct (depending on how expansive your definition of UB is, it may not be), asking everyone who doesn't adjust their compiler flags to accept a 5% slowdown for some theoretical benefits is at odds with economic reality.


Maybe you should tell the Linux developers that they are making a mistake.


Even assuming the 5% number were correct (depending on how expansive your optimisations with UB assumptions are, it may not be), asking everyone who doesn't adjust their compiler flags to accept their programs being silently miscompiled for some theoretical benefits is at odds with economic reality.


Times like this I wish I were allowed to say exactly how much money a 1% fleet-wide loss in performance costs a big tech company.


a) There is no evidence such performance differences - especially in net terms - are due to unsafe optimizations; b) those companies don't need to shift their costs onto other users.


> "Undefined behavior" is a lack of semantics, so it is preserving semantics when it leaves those paths out.

It's not preserving semantics; it's inferring a valid program from an invalid one. Since the programmer wrote an invalid program, unilaterally deciding which valid program was intended seems dubious at best. These should really be errors, or warnings at the very least.
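
For example (a sketch of my own):

    int f(int x) {
        /* Only meaningful if x + 1 does not overflow; under that assumption
           the expression is always true, so the compiler "infers" the valid
           program that simply returns 1. */
        return x + 1 > x;
    }

On hardware with wrapping arithmetic, f(INT_MAX) would plausibly have returned 0; after that inference it silently returns 1, and nothing tells you the decision was made.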


Very often, an inlined function can be determined, at the place where it is expanded, to have substantial sections of dead code, based on knowledge of the values of the arguments passed to it there. Expanded in a different place, different parts are dead. Warnings about the dead parts would just lead you to turn those warnings off, so they are never turned on in the first place.

This gets more complicated when an inlined function calls another inlined function. The whole inner function may be in the middle of dead code, and thus everything it calls, too, and so all be elided. This sort of thing happens all the time. These elisions don't typically make code obviously much faster, cycle-wise, but they do make it smaller, with a smaller cache footprint. Cache footprint has a huge effect on overall performance.
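
A toy illustration of that effect (example mine):

    static inline int clamp(int x, int lo, int hi) {
        if (x < lo) return lo;
        if (x > hi) return hi;
        return x;
    }

    int scale(unsigned char c) {
        /* After inlining, the compiler knows 0 <= c <= 255, so both range
           checks in clamp() are dead here and can be elided. At another
           call site with unconstrained arguments, they are not. */
        return clamp(c, 0, 1000) * 3;
    }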

In principle, the compiler could arrange to put the dead code in some other page, with a conditional jump there that never happens, consuming just another branch prediction slot. But branch prediction slots are cache, and the branch, though never taken, nonetheless consumes a slot. A processor ISA could, in principle, have a conditional branch instruction that is assumed by branch prediction always to go one way, and so not need to consume any branch prediction cache. But I don't know of any. RISC-V does not seem to be entertaining plans to specify such a branch.

Several extant chips do allow "hint" prefixes on branches, but I think current chips ignore them, based on experience showing the hints were generally wrong. This is unfortunate, as sometimes the most frequently taken direction is not the one you want to be fastest. E.g., when spinning on an atomic flag, you want the looping branch to be predicted not taken, to minimize latency once the flag is clear, even though it is the branch most frequently taken in recent history. (High-frequency trading code often has to resort to trickery to get the desired behavior.)
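
About the closest portable approximation today is biasing the compiler's code layout, e.g. with __builtin_expect (GCC/Clang); that shapes which path falls through, but it doesn't reach the hardware predictor:

    #include <stdatomic.h>

    void wait_for(atomic_int *flag) {
        /* Mark "still spinning" as the unexpected outcome, so the code after
           the loop is laid out as the fall-through fast path once the flag
           is set -- even though the spin branch is taken most often. */
        while (__builtin_expect(atomic_load_explicit(flag, memory_order_acquire) == 0, 0))
            ;   /* spin */
    }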

(There is a famous story about engineers at Digital Equipment Corporation, whence we got PDP-11s and Vaxen, and thus, indirectly, Unix. A comprehensive cycle count showed that a huge proportion of the instructions executed in their OS kernel were just one instruction. They moved heaven and earth to optimize this one instruction, but the kernel with the new instruction was exactly 0% faster. It turned out the instruction was used only in the idle loop that ran while waiting until something useful could be done. This really happened, and the same mistake is made again and again, to this day: e.g., people will sincerely swear that "rotl" and "popcnt" instructions are not important, on about the same basis.)

Intel recently implemented an atomic test-and-wait instruction that just stalls execution until an identified cache line is touched, thus avoiding the branch prediction stall on the way out of the loop. I have not heard of anybody using it.


RISC-V, incidentally, suffers under exactly the noted delusion, so all existing chips lack all of the supposedly unimportant instructions. I have seen a claim that "RVA22", its general-purpose computing platform, will have them, but cannot find any evidence for it online.



