> Compilers generally do not abuse UB (outside of compiler bugs); it's just that UB is a very misunderstood subject.
Perhaps it is. It wasn't always so.
In K&R the term "behavior is undefined" occurs often. Everyone understood its meaning. It meant "you get what the hardware gives you", meaning the compiler will output the same instructions it always does, but K&R didn't say what those instructions would do (typically because it couldn't). Of course what the hardware did on any given arch was perfectly well defined.
The definition had its upsides and downsides. On the upside, programmers took advantage of their knowledge of the hardware they were targeting to write efficient code. Embedded programmers tend to do that sort of thing fairly aggressively (for example, there is often something useful stored in location 0). The downside is that if they did that, their code wasn't portable.
The definition gets ugly if the programmer is trying to write portable code, because it means they don't get warned if the code they wrote wouldn't port easily. As a consequence writing non-portable code was, and remains, an easy mistake to make. The sane solution would have been an --error-if-not-portable compiler option.
But that's not what we got, is it? Instead the meaning of "the behavior is undefined" morphed from "hardware defined" to "implementations are allowed to assume that the respective runtime condition does not ever occur". From what I can tell compiler writers turned that definition into "the behavior is compiler writer defined" so they could gain some edge in the "who has the best optimiser" games they love to play. Consequently the definition the compiler writer uses is almost always "delete the code".
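A minimal sketch of that clause in action, assuming a typical gcc or clang: signed overflow is UB, so the optimiser is entitled to fold the first test below to "always true", while the second asks the same question in well-defined terms (both function names are mine, purely for illustration):

```c
#include <limits.h>

/* UB version: "x + 1 > x" can only be false if x + 1 overflows.
   Signed overflow is UB, so the optimiser may assume it never happens
   and fold this whole function to "return 1", even when x == INT_MAX. */
int will_not_overflow_ub(int x)
{
    return x + 1 > x;
}

/* Well-defined version: ask the question before doing the addition,
   so no overflow can ever occur and no clause applies. */
int will_not_overflow(int x)
{
    return x < INT_MAX;
}
```

Calling the well-defined version never invokes UB; the UB version is only safe for arguments below INT_MAX, which is exactly the assumption the optimiser bakes in for you.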
But doing that made it harder to write correct code. Whereas before the code always had the same meaning on a given piece of hardware, it now changes its meaning on the same hardware depending on whether you supply -O0 or -O2. And it does so without warning, because we never got the --error-if-not-portable option. The result has been numerous bugs. For example take:
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>  /* ntohl() */

int read_from_internet(uint8_t *buf, size_t size);  /* defined elsewhere */

int parse_packet(void)
{
    uint8_t buffer[1500];
    int len = read_from_internet(buffer, sizeof(buffer));
    uint32_t field_size = ntohl(*(uint32_t *)buffer);
    if (buffer + field_size >= &buffer[len] || buffer + field_size < buffer)
        return -1; /* error return */
    /* continue parsing the packet. */
    return 0; /* success */
}
In the K&R world it's clear what the programmer intends, and on every arch I know a straightforward compilation would produce the behaviour they intended; indeed "gcc -O0" produces what they expected. But "buffer + field_size < buffer" could only be true if field_size is so large the pointer arithmetic wraps to before buffer. Forming such an out-of-bounds pointer is UB, so it triggers the "implementations are allowed to assume that the respective runtime condition does not ever occur" clause. Consequently gcc -O2 deletes that test. This really happened, and the result was a CVE.
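For completeness, a sketch of a check with the same intent that stays inside defined behaviour: compare lengths as integers instead of forming an out-of-bounds pointer, so there is nothing for the optimiser to delete. (The name field_in_bounds is mine, not from the original code, and it assumes the same buffer layout as the example above.)

```c
#include <stdint.h>

/* Well-defined bounds check: compare sizes, not pointers, so no
   out-of-range pointer is ever formed and no UB clause applies.
   Same intent as the pointer test in the example above: the field
   must end before the data received does. */
static int field_in_bounds(uint32_t field_size, int len)
{
    if (len < 0)
        return 0;                       /* read failed: nothing is in bounds */
    return field_size < (uint32_t)len;  /* mirrors buffer+field_size < buffer+len */
}
```

The integer comparison cannot wrap for any input, so -O0 and -O2 agree on what it means.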
There were two reasonable outcomes for this code. One is the K&R approach. The other was the --error-if-not-portable approach, which means refusing to compile the code. I think most compiler users (as opposed to people playing word games in order to win some optimisation game) would call what actually happened "compilers abusing UB". That's because no one wins from that particular "optimisation", except the compiler writer running some micro benchmark. At best the programmer had a flaw the compiler knew about and exploited, but didn't warn them about. The users of the compiler's output got hit with a CVE.
That's the best interpretation. The worst is that the C standard committee has lost the plot. Their goal should be to produce a simple, clear standard even a novice programmer could safely pick up and read to learn the language. That is what K&R was. Instead we've arrived at the point where governments are saying the language is too dangerous to use.