I'm curious what you would replace it with? I can't think of anything actually suitable for most of the low-level operating systems / embedded level things that use C.
I know people recommend Rust for this kind of thing, but Rust really isn't appropriate in a lot of cases, especially when dealing with microcontrollers not supported by LLVM (e.g. PIC or 8051, off the top of my head).
This may be changing, but I was also under the impression that Rust can't easily produce binaries as small as C can.
As an alternative in your use case, there was a pretty decent Modula-2 compiler for the 8051 (Mod51). It's a small, safe language and a reasonable fit for embedded architectures. It is a Wirth-ian BEGIN...END language, which for some people means it has cooties, but I wrote some useful stuff in it a long time ago and it was pretty painless, as such things go. There's even some standardization around Modula-2 in the embedded world (IEC 61131-3 ST). Unfortunately, I don't think Mod51 is sold anymore; C won in that world.
I'd nominate Pascal (which has been used for OS dev before) and Ada. Not 100% sure how broad their support is for different hardware, but it also seems like adding new platforms to an existing toolchain is less work than inventing a new language while still benefiting from being not-C.
As long as you write compilers for all of the platforms which currently only have production-quality C compilers. That includes porting the whole development ecosystem for those platforms (like libraries) to whatever NextNewShiny language you deem worthy.
Sometimes I do wonder if all these UB optimisations aren't pushed by people aiming to make C and C++ unusable, so that people will be forced to move to other languages.
The problem is that C is sufficiently primitive that optimizing it is effectively trying to infer structure backwards, because the language doesn't specify it.
For-loops are a great example. Why are we even discussing a "loop index"? Because we need to iterate over an array, string, etc., and "length" isn't something stored with the data structure, so the compiler can't do the loop for you.
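To illustrate the point (a sketch; the function name is made up): the compiler sees only a bare pointer and a separate count, and nothing in the language ties the two together, so the loop index has to be managed by hand.

```c
#include <stddef.h>

/* Nothing in C connects `n` to the storage behind `a`; the
   "length" lives only in the programmer's head and the call site. */
long sum_ints(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```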
The real question is probably more along the lines of "Should the C standard start pinning things down more than it does?"
The answer is probably yes, but it's not straightforward to do. Look at just how much verbiage it took to create a decent memory model. And, still, using atomics in C (as opposed to C++) is brutal and the compiler can't help you very much.
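For a taste of the verbosity: a minimal C11 atomics sketch using the `_explicit` forms (function names here are just for illustration), where every memory order is spelled out by hand rather than hidden behind C++'s `std::atomic` member functions and operators.

```c
#include <stdatomic.h>

/* File-scope atomic, zero-initialized. */
atomic_int counter;

/* Increment with an explicitly chosen memory order. */
void bump(void) {
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}

int read_counter(void) {
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```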
Your argument would be more compelling if programming languages with higher level iteration primitives were significantly more optimizable than C. The underlying processor architectures have loop indexes. Just pretending you can do applicative programming doesn't mean you can. C is very adaptable - even to GPUs. Also, I have yet to see that decent memory model.
That is the fault of compilers, not the language. That should raise at most a warning. A compiler should not change the semantics of the code, even if the code relies on undefined/unspecified behavior. I don't get how anyone thought it was a good idea to silently remove apparently unreachable code.
> A compiler should not change the semantics of the code, even if the code relies on undefined/unspecified behavior.
You can argue that there are no semantics to the code if it invokes undefined behavior. What are the correct "semantics" of x+1 if x is the maximum representable value of a signed integer type? The language doesn't specify it; in fact, it explicitly calls this undefined behavior. Asking for its meaning is absolute nonsense. Should the compiler apply AI to determine the semantic intent?
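If you actually need an answer to "what happens near INT_MAX", one defined-behavior option is to test for overflow *before* adding rather than inspecting the result afterwards (a sketch; the function name is made up):

```c
#include <limits.h>

/* Returns nonzero if x + y would overflow an int.
   Every comparison here stays inside the representable range,
   so no operation in this function is itself UB. */
int add_overflows(int x, int y) {
    if (y > 0 && x > INT_MAX - y) return 1;
    if (y < 0 && x < INT_MIN - y) return 1;
    return 0;
}
```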
> I don't get how someone thought that it is a good idea to silently remove apparently unreachable code.
This reminds me of the common misconception of C as a "portable assembly language". The standard is largely machine independent in that it's concerned with the effects of execution rather than how those effects are implemented by the compiler and eventually carried out by a real machine. As a result, you can write several pages of code and have the compiler fold it down into "mov eax, 123 / ret" and nothing that concerns the C language will have been lost.
If you have some expression that correctly folds into "false", none of the effects of the program as far as the C standard is concerned have changed. Yes, some undefined behavior may manifest differently depending on the optimizations made, but the language is not concerned with that.
The problem is very much with the language itself. Overflow in signed integers is undefined behavior. The optimizer correctly won't consider it as a constraint on optimization. The language could just as easily have defined signed integer arithmetic as modular. The standard does not have a "compiler should not change the semantics of the code" clause that applies to undefined behavior; as a general rule, the C standard is basically the antithesis of that. The compiler gets free rein over the effects of code whose behavior is undefined by the C standard, which leaves a lot of UB holes.
"UB optimizations" happen as a natural consequence of basing optimizations on constraints posed by the language rather than descriptions of special cases. It's not like someone looked at code with signed overflows and decided "let's make this really stupid" and typed "if (MAY_SIGNED_OVERFLOW(expr)) surprise_me();". You might instead specify the constraints of signed integers according to the language spec and a generalized optimizer performs optimizations based on those constraints.
x + 1 < x for signed x is always false for values of x where the operation is at all defined in the C language. The good optimizer correctly solves for the constraints of the language and folds the operation into a constant false accordingly, in the same manner it would fold other expressions.
That you believe that there is any x for which this would be true is based on assumptions that the C language doesn't make. You likely assume that signed integer arithmetic should be modular as a consequence of its 2's complement bit representation. These are not assumptions that the C language makes. Signed integers don't have to be modular. They don't have to be 2's complement. It might be harder than you think to maintain a set of constraints and assumptions in addition to those specified by the C language. It would be these additions that would be special cases, not the optimizations.
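The contrast with unsigned arithmetic makes this concrete. Unsigned types *are* defined as modular by the standard, so there the same-looking expression is a legitimate wraparound test that no conforming compiler may fold away (a sketch; the function name is made up):

```c
#include <limits.h>

/* For unsigned x, x + 1 wraps to 0 at UINT_MAX by definition,
   so this comparison is well-defined and meaningful. */
unsigned wrapped(unsigned x) {
    return x + 1u < x;
}
```

The signed version of the same expression is exactly what the optimizer may fold to a constant false, because no *defined* value of x can make it true.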
If there's anything you should have a beef with, it's the language itself. Don't use C if you aren't ready either to be caught off guard by the unexpected consequences of undefined behavior at run time, or to have the patience to very carefully avoid it by learning what invokes it.
I think much of the "undefinedness" of the language comes from the fact that multiple implementations of the "C language" existed long before standardization. The subtle differences in these implementations and their stake in the standardization process meant that a lot of things simply couldn't be defined because it would invalidate existing implementations.
If a warning is issued in that case, I'm tempted to say it's fair game. If you are sure it's safe, the compiler has extremely well-established ways of telling it (e.g. assume or unreachable).
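For example, stating the assumption explicitly rather than leaving it to UB inference (a sketch; `__builtin_unreachable()` is a GCC/Clang extension, and C23 standardizes `unreachable()` in `<stddef.h>` — the function itself is made up):

```c
/* The caller promises x != 0; telling the compiler so lets it
   drop the zero case without any UB-based guesswork. */
int floor_log2(unsigned x) {
    if (x == 0)
        __builtin_unreachable();
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}
```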
I wouldn't call it fair game, not least because nobody reads warnings. My central objection is that it's not what C is meant to be. C was made to write Unix, it was specifically created to be halfway between a portable macro assembler and a high-level language. That is a useful language that fills a particular niche. That is what C was for many years and a lot of code was written with that behaviour in mind. It can be argued that compilers which change semantics from what people intended are downright irresponsible given the foundational role that C has.
All this wouldn't be an issue if C were just some application language, but it's what all of computing is built on. It really should be simple by default and without adding more footguns than what directly programming in assembly gives you.
I wouldn't mind if all the assumptions-based optimisations were made opt-in. They are obviously useful in some way, but they're impossible to allow globally in a large legacy codebase. Which is pretty much every C codebase.
There is probably more than a million times as much C code that is not Unix as code that is. We could chuck in all the other OSes, and all the RTOSes besides, without changing the statement.
Correct. In my experience, a focus on UB (often actually pronounced "U.B.") tends to come from the C++ community, although a lot of it trickles down to the C community as well. I can recommend watching a few talks from C++ conferences; I especially enjoyed ones by Matt Godbolt and Miro Knejp.
Not correct. Most UB is the same in C and C++. If you disallowed optimizations based on UB, it might have more effect on C++ programs, just because C++ tends to have more deeply inlined abstractions that can benefit more from them. But those don't tend to need much attention. The cases people worry about are more common in C code, just because the concept of type is less important in C.
Or are you saying that UB is not from the C++ community? That could certainly be true, I am just describing the situation as I see it where C++ disallows more types of casting or reinterpreting than C does (even if compilers allow it).
One salient example I heard was that the only way to legally read the bits of a float in C++ is to memcpy the float onto an integer type, whereas in C it's legal (I believe by using a union).
Nope! Same. BTW the memcpy is always optimized out, so you might as well; then it's totes portable, C or C++. You can always build up an object (int, float, struct) from char values. Technically they are supposed to have come from another value of the same type, but the compiler can't really tell where the bytes came from, so has to assume they must have come from there.
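The memcpy idiom looks like this (a sketch assuming a 4-byte IEEE 754 float, which holds on mainstream platforms; the function name is made up) — mainstream compilers recognize the fixed-size copy and compile it down to a plain register move:

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a uint32_t.
   Valid in both C and C++; no aliasing rules are involved. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u); /* assumes sizeof(float) == 4 */
    return u;
}
```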
C does have a thing where a void pointer can be copied into any other pointer, without a cast, and operations with the typed pointer are OK. Technically, the pointer value is supposed to have been copied to the void pointer from the same type, but realloc has to work. Note, realloc does give you alignment guarantees you don't get from pointers to random things.
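That implicit conversion is exactly why the malloc/realloc interface works without casts in C (a sketch; the wrapper name is made up):

```c
#include <stdlib.h>

/* realloc returns void *, which converts to int * implicitly;
   passing NULL makes realloc behave like malloc. */
int *grow_ints(int *p, size_t n) {
    return realloc(p, n * sizeof *p); /* no cast needed in C */
}
```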
Nothing else is portable. Unions and other pointer casting are all UB unless your compiler extends the definition. So GCC has an extension to make unions, used in a certain way, well-defined; Clang probably copies that. You would have to look up exactly what it covers. But the memcpy hack works in all compilers, and you won't use it often enough for the extra verbosity to be a problem.
In C you're free to take the address of the float (presuming it's not in a register) and cast that address to a pointer to char. That doesn't copy anything; it just changes, from that point on, how the compiler's code generator interprets those bytes.
Right, I should not have said you have to memcpy. You can poke bytes any way you like, in any order you like. But! memcpy is known to compiler optimizers, so is more reliably optimized away.
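The char-pointer route looks like this (a sketch; the function name is made up). Reading any object's representation through an `unsigned char *` is explicitly permitted by the aliasing rules, with no copy involved:

```c
#include <stddef.h>

/* Return byte i of an arbitrary object's representation.
   Character-type access to any object is always allowed. */
unsigned char byte_at(const void *obj, size_t i) {
    return ((const unsigned char *)obj)[i];
}
```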