
I think it’s worth emphasizing that the C spec’s love of undefined behavior—if you do X by accident, anything can happen—and the apparently massive amount of memory-unsafe software that has been written that will just allocate 16 bytes on the stack and then read from a file descriptor until it encounters a null byte… are examples of things that aren’t considered remotely sane or reasonable by a modern programmer or language designer. Any vague notion of “unsafe” in the context of a modern language—like saying well, if there’s a syscall, maybe something unexpected could happen—doesn’t compare to the bad decisions made by C and C programmers/culture that affect us today, because we still use C code in our things.

The deep, idiosyncratic flaws of C trace back to “worse is better.” Few people remember or look up what “worse is better” actually meant. Wikipedia wrongly says it’s a “less is more” sort of thing, and some people think it’s about not being a perfectionist.

But actually… actually actually, if you read the essay, it says “worse is better” means (paraphrasing) it’s more important that C compilers be easy to implement than easy to use. Also, it is more important that the implementation—or the design of the implementation—of a piece of software be simple than that it be correct. It is more important that it be simple than that it be consistent. It is more important that it be simple than that it be “complete” (for example, handle edge cases; it just needs to work in “most situations”). This is not just anti-perfectionism, it is an objectively terrible set of engineering values. But there were so many different kinds of computers and operating systems back then—you didn’t even know if a byte was 8 bits—that it helped a lot that C compilers could do whatever they wanted in many situations. And making it easy for C compiler implementers enabled C to spread far and wide. It was a very different world, and just being able to write in a higher-level language than assembly on a particular computer was a big deal.



> If you read the essay, it says “worse is better” means (paraphrasing) it’s more important that C compilers be easy to implement than easy to use. Also, it is more important that the implementation—or the design of the implementation—of a piece of software be simple than that it be correct. It is more important that it be simple than that it be consistent. It is more important that it be simple than that it be “complete” (for example, handle edge cases; it just needs to work in “most situations”).

This is really ironic considering C++, whose compilers may literally be the most complicated compilers to ever exist, yet none of them implements the full (C++23) language spec.

What kind of maxim does C++ go by?


1. If you don’t use a language feature in your code, your code shouldn’t pay the cost—no matter what else might be helped, enabled or prevented by that. To this day compilers let entire features like exceptions and RTTI be disabled rather than accept that there might possibly be some costs worth imposing.

2. Binary compatibility and dynamic linking don’t matter at all, and shouldn’t ever be considered in design or revision of anything language-related.

3. All areas of computer science and engineering are open to being a part of the standard library, and existing language and standard library authors are better judges of what should be in the standard and how it should be implemented than anyone else, including platform vendors or subject matter experts.


Zero-cost abstractions and backward compatibility over implementation simplicity, user friendliness, or safety.


It's not like PL/I didn't allow for similar flaws, so "Worse is better" is neither here nor there. At least C was simple enough to learn for most programmers. It's still generally the case that a simple approach is more likely to be correct than an overly complicated one for which it's not even clear what "correct" means.


> It's not like PL/I didn't allow for similar flaws, so "Worse is better" is neither here nor there.

PL/I is one of the reasons that Multics was not plagued by buffer overflows:

"The net result is that a PL/I programmer would have to work very hard to program a buffer overflow error, while a C programmer has to work very hard to avoid programming a buffer overflow error." [1]

[1] P. Karger, R. Schell, "Thirty Years Later: Lessons from the Multics Security Evaluation"


Software was significantly simpler in those days. I don't find it strange that they took a simplified view of software engineering. Specifically because those exact same simplified views still exist today - talk to people who've never worked on large complex systems, and you will usually encounter similar "anti-perfectionism".

People adopt simplified engineering practices when working on simple software by themselves, compared to when working on complex software within a large team.


One of the reasons C became popular was that the compiler was actually reasonable to use on the hardware of the time.


Yeah, it's hard for a lot of programmers to context-switch back to when a megabyte was a HUGE amount of memory (or even 640 kB). I think everyone should play around on an Arduino Uno or even smaller-memory CPUs to get some perspective. You didn't have gigabytes of RAM to analyze all corner cases, data races, etc. during compilation.


>it is more important that the implementation—or the design of the implementation—of a piece of software be simple than that it be correct.

This is true, and when your compiler actually abides by these values it is shocking how many issues just go away. The problem is that gcc and clang are nowhere near simple implementations; the question of how exactly gcc or clang arrived at some given assembly for some given input has a really complicated answer. So much so that the relationship between the programmer and the compiler becomes adversarial. It would be one thing to just have very complicated optimizations, but the whole "undefined behavior lets us do anything" approach is what makes it unbearable.

People can (and do) blame the C spec for this, and it is true that if the C spec were stricter these compilers would not have a free pass to do these crazy miscompilations. However, there is nothing stopping these compilers from just not doing that; nothing stops them from defining their own sane behavior for what the C spec leaves undefined. It is no longer the case that we have a dozen or so C compilers that people need to target with their programs; the spec is not the bottom line anymore.

The Plan 9 compilers are the perfect example of getting this right. They define sane behavior that a reasonable programmer would expect for what the C spec calls undefined behavior. They are not gargantuan, and their optimizations are not crazy. It is generally easy to understand how the compiler ended up with the assembly you see in the binary. Yet they are competent enough to self-host an entire OS. The insane complexity of these other C compilers is simply not mandatory. They are not perfect, of course; it is still possible to write bad code, but the result is no longer pathological, which is a giant help when you're actually trying to figure out what is going wrong.


> People can (and do) blame the C spec for this, and it is true that if the C spec were stricter these compilers would not have a free pass to do these crazy miscompilations. However, there is nothing stopping these compilers from just not doing that; nothing stops them from defining their own sane behavior for what the C spec leaves undefined. It is no longer the case that we have a dozen or so C compilers that people need to target with their programs; the spec is not the bottom line anymore.

Some compilers (like gcc and clang) do give you the option to make most undefined behaviors well-defined, using flags like -fwrapv, -fno-delete-null-pointer-checks, -fno-strict-aliasing, etc.
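
To make that concrete, here's the classic sort of check these flags are about (an illustrative sketch, not from any particular project): built with something like gcc -O2 the compiler may legally fold the test to 0, while adding -fwrapv makes the wraparound well defined and the check behaves as written.

    #include <limits.h>
    #include <stdio.h>

    /* Under ISO C, x + 1 overflows (undefined behavior) when x == INT_MAX,
       so an optimizer may assume that cannot happen and fold this function
       to "return 0;". With -fwrapv, signed overflow wraps and the check
       does what it says. */
    int next_would_overflow(int x)
    {
        return x + 1 < x;
    }

    int main(void)
    {
        printf("%d\n", next_would_overflow(INT_MAX));
        return 0;
    }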

The key thing to understand is that if you use these flags, you opt-in to a non-standard C dialect. Going forward, you are no longer writing C code. Your code is now non-portable. Your code can no longer be compiled by a compiler that does not support these special C dialects.

Using these C dialects is a perfectly sensible thing to do, as long as you understand that this is what you are doing.


> Your code can no longer be compiled by a compiler that does not support these special C dialects.

What compiler might that be?

It's true that MSVC doesn't have an equivalent of -fno-strict-aliasing, but that's because it just doesn't apply optimizations that assume strict aliasing in the first place. Admittedly the picture around signed overflow is more complicated, but it's essentially the same story.

> Using these C dialects is a perfectly sensible thing to do, as long as you understand that this is what you are doing.

I suppose that they technically are C dialects, but it seems more than a bit absurd to put it like that -- at least to me. By that standard the Linux kernel isn't written in C. And Firefox isn't written in C++. Postgres also wouldn't count as according-to-hoyle C under your definition. Even though Postgres compiles when -fwrapv and -fno-strict-aliasing are removed, and still passes all tests.

The implication of what you're saying seems to be that all of these open source projects each independently decided to "go their own way". I find it far more natural to explain the situation as one of GCC diverging when it decided to make -fstrict-aliasing the default around 15 years ago.


> It's true that MSVC doesn't have an equivalent of -fno-strict-aliasing, but that's because it just doesn't apply optimizations that assume strict aliasing in the first place. Admittedly the picture around signed overflow is more complicated, but it's essentially the same story.

Of course it's fine if the compiler doesn't implement this as an option, but rather as the default behavior -- as long as this is actually a documented guarantee, rather than "we just haven't implemented those optimizations yet, we might start exploiting this UB at any point in the future". I'm not familiar with what guarantees MSVC documents in this area.
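
For illustration (my own sketch, not taken from either compiler's documentation), this is the kind of type punning whose meaning the -fno-strict-aliasing dialect pins down; memcpy is the portable way to say the same thing.

    #include <stdio.h>
    #include <string.h>

    /* Reading a float's bits through an unsigned int * breaks the standard's
       aliasing rules, so a compiler that exploits strict aliasing may assume
       the two pointers never refer to the same object. With
       -fno-strict-aliasing (or a compiler that never does such optimizations)
       it does what it looks like it does. Assumes 32-bit float and unsigned. */
    unsigned int float_bits_punned(float f)
    {
        return *(unsigned int *)&f;
    }

    /* The well-defined way to do the same thing. */
    unsigned int float_bits_memcpy(float f)
    {
        unsigned int u;
        memcpy(&u, &f, sizeof u);
        return u;
    }

    int main(void)
    {
        printf("0x%08x 0x%08x\n", float_bits_punned(1.0f), float_bits_memcpy(1.0f));
        return 0;
    }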

> By that standard the Linux kernel isn't written in C.

I think that in a very real sense, it isn't. The kernel is written in standard C plus a few hundred GCC extensions, of which additional compiler options are but a small part. It took very extensive effort to make the kernel compile with clang (which already was mostly gcc-compatible, but nowhere near enough for the kernel's level of extension use).

> Even though Postgres compiles when -fwrapv and -fno-strict-aliasing are removed, and still passes all tests.

I think there is a lot of difference between "requires signed integer overflow to be well-defined because we explicitly rely on it" and "disable optimization based on signed integer overflow as a hardening measure, in case we got this wrong somewhere".

> The implication of what you're saying seems to be that all of these open source projects each independently decided to "go their own way".

You kind of make it sound like use of these options is common. To the best of my knowledge, this is not the case, and these projects are rare exceptions, not the rule. So, yes, they decided to go their own way (which is perfectly fine.)


>The key thing to understand is that if you use these flags, you opt-in to a non-standard C dialect.

This is not true. The C Standard explicitly states that a conforming implementation is welcome to provide well-defined semantics for behavior that the standard leaves undefined.

The whole point of undefined behavior is that the C Standard imposes no requirement on implementations about the semantics of that program, so if a specific implementation adds some kind of checks or provides some sort of deterministic behavior to something that is otherwise undefined, the program itself is still C code and adheres to the C Standard.

>Your code can no longer be compiled by a compiler that does not support these special C dialects.

Undefined behavior is a semantic property of a program, not a syntactic property, so any other conforming C compiler will have no problem compiling it.


For example, Linux uses -fno-delete-null-pointer-checks.
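
The pattern that flag exists to neutralize looks roughly like this (an illustrative sketch of the well-known dereference-before-check bug class, not actual kernel code; the struct is made up):

    /* Hypothetical struct for illustration. */
    struct device { int flags; };

    /* The dereference on the first line lets the compiler conclude that dev
       cannot be NULL, so it may delete the later check as dead code. With
       -fno-delete-null-pointer-checks the check is kept, so a NULL dev still
       gets caught instead of becoming exploitable when address 0 is mappable. */
    int device_flags(struct device *dev)
    {
        int flags = dev->flags;   /* UB if dev == NULL */
        if (dev == NULL)          /* candidate for deletion without the flag */
            return -1;
        return flags;
    }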


> The key thing to understand is that if you use these flags, you opt-in to a non-standard C dialect.

Isn't all C with undefined behavior non-standard? Or is there a standard for undefined behavior (obviously not, I would think)?

I don't understand why those flags wouldn't be turned on by default. Or do they affect more than just undefined behavior?


They affect performance, so leaving them off forces the dev to ensure they aren't doing unsafe/ill-defined things, which is almost impossible on huge code bases in C/C++.


They are unhappy being forced to write code that is free of undefined behavior?

It reminds me of the joke: Alice claims she is very fast at mental math. Bob asks "what's 4821 times 5997?" Alice replies "ten thousand." Bob says, "What, no, that's wrong, very wrong." Alice says, "But it was fast!"

Are you telling me C / C++ developers are like Alice? When given the choice between fast undefined behavior and slower but more-likely-correct defined behavior, developers will choose the faster option that is more likely to be incorrect?


You seem to be assuming that on a large code base you have top 1% programmers. I suppose that is the case for some places that are paying $500k average pay, but most businesses don't pay that and have average developers and a few top 5% if big enough. That's why tools that check for issues, like linters and sanitizers, help, especially if you make them part of "the process" for check-in.


> piece of software be simple than that it be correct

Here is the thing: which of these is more plausible or at least less far-fetched?

A. Write a program that is not correct now, but will eventually be.

B. Write a program that is complicated now, but will eventually be simpler.

:)

Simplicity isn't something you can leave out now and add later, yet correctness can often be treated that way. Even a shop that values correctness above all else still does debugging. (If not their code, then their proofs, and whether their formal specification actually implements the functional requirements.)


If you simplify a correct program, it will most likely still be correct. If you "correct" a simple program, it probably becomes complicated.


If you algebraically simplify a correct program it will be correct.

But the simplicity which is at play in the context of "worse is better" is simplicity of requirement specifications themselves, before any code is written. (I think I'm the one who muddied the waters here by insinuating that correctness is a matter of debugging an incorrect program against a correct specification.)

In projects that value simplicity over correctness, what that means is that what is considered correct (as in, the requirements we shall implement) is the simpler set of requirements, which other projects would regard as incorrect.

Programs that implement the complex requirements are vanishingly improbable to be simplified into programs that implement simple requirements, simply because those are breaking changes.

E.g. you can't take a database engine that provides certain consistency guarantees and make it have weaker guarantees in the next version (for the sake of simplicity), without breaking all the applications that depend on the current guarantees.

Correctness can be added later, including at the requirements level, not only through debugging. That's because it's often backwards compatible: e.g. adding handling for cases that were previously ignored.


You add in simplicity by taking out code. It is very possible (and common) to find better abstractions or methods to approach a problem that reduce the complexity of the code.


It is vanishingly uncommon to simplify an entire application, so that it goes from something complex to something whose number one value is simplicity, such that the simplicity is reflected in the actual functional specification.


How do the Plan 9 compilers compare to gcc/clang when it comes to performance or portability?


The main problem is this: the undefined status of one construct A in the program changes the behavior of a different, independent construct B in the program, even in cases when, say, B executes first and is correct. Everything is jumbled together in the optimizer, which establishes logical ties between the pieces that are unrelated to the network of intent.

If the undefined behavior of construct A causes just A to misbehave, we are still in sane waters.


With all due respect, this “adversarial compiler” expression just makes zero sense, and takes a lot away from your comment.

Guess what, the world is complex, and software has no bound for complexity. Which is better, a multi-million-line compiler that hundreds of people worked on for decades, or a toy one in a couple thousand lines written by a single programmer? What if the former can create 2-10x faster code than the latter (I probably even underestimate it; loop unswitching, vectorization, etc. can account for even more differences)?

It turns out that we can build abstractions on top of abstractions, and if it’s designed well, it will scale with complexity (which we require). Would you change back to an OS that didn’t handle multithreading as it’s too complicated? Or that wouldn’t use GPUs?


>With all due respect, these “adversial compiler” expression just makes zero sense, and takes a lot away from your comment.

I was describing how it seems these complex compilers look for excuses to give you miscompilations; this "looking for gotchas" makes the relationship appear adversarial to me.

>Guess what, the world is complex, and software has no bound for complexity.

My argument is that it really ought to. I think there is a diminishing return.

>What if the former can create 2-10x faster code than the latter (I probably even underestimate it, loop unswitching, vectorization, etc. can account for even more differences).

In code I've used with both compilers this has not been the case. Good profiling tools and manual hotspot optimization go a really long way.

>It turns out that we can build abstractions on top of abstractions, and if it’s designed well, it will scale with complexity

GCC has not scaled with complexity well enough, that's why we have people who lament its behavior.

>Would you change back to an OS that didn’t handle multithreading as it’s too complicated?

Concurrent programming makes problems easier, not harder in my experience. For what it's worth Plan 9 has excellent concurrent programming facilities.

>Or that wouldn’t use GPUs?

GPUs these days only work on systems in large part due to the good graces of the vendors themselves. Would nvidia or AMD be usable (enough) on Linux if there were not dedicated people from those companies working on drivers?


> It was a very different world, and just being able to write in a higher-level language than assembly on a particular computer was a big deal.

It was different then, but not by too much. Our tooling, languages, etc., have improved, yes, but the underlying hardware still exists, and occasionally the general solution provided by our improved tooling is suboptimal for our use case.

I actually _like_ where C has wound up. I have way better static analysis tooling at my disposal than I had 15 years ago and I can use a "memory safe" language like python or OCaml and say "hey, I know better here", think carefully, and push the memory safe compiler out of my way.


"Worse is better" is much older then "undefined behaviour". Undefined behaviour was invented for C standardization, when C was already mature and had been out of the Unix childhood home for a long time.

For example, many UB semantics in the standard come from allowing for C ports to strange non-Unix hardware, and from bold decisions made when advances in the state of the art of compiler optimization ran into underspecified corners of C semantics.


"Worse Is Better", as a description of a set of engineering values embodied by C, is from 1989.

I'm taking your word for it that UB was invented for C standardization (I have no knowledge of that history, and your claim seems plausible), and I'm going to say that probably they didn't invent it in the last year of the seven-year standardization effort, so probably UB is older than Worse Is Better.


uh. No. Rust unsafe gives rust behavior a lot like C. If you at all break the rather subtle rules, then essentially anything can and will happen.

So for example, there was recently a thread where someone had code that checked if a value was in range to safely coerce it directly to an enum then did so. But because of eager evaluation of an argument the unsafe cast happened first. From this the compiler reasoned that the variable was preconditionally range constrained to always be in range and it optimized out the in-range test (which itself was not unsafe code).

This is a classic C bug where someone implements an overflow check that itself can overflow, causing the branch for overflow to get optimized out. But in C the simpler syntax at least made it clear that the triggering code got executed first. The more complex rust syntax obscured that.

Rust has improved the situation by narrowing the cases where you can get into this trouble, but on the other hand it adds a lot of other complexity that contributes to faulty code (and a nearly mandatory packaging ecosystem which is a security nightmare-- it's the norm for even simple rust utilities to pull in a million lines of unauditable (just by bulk) third party code, including multiple HTTPS libraries).

As a result, I don't think it can be taken for granted that rust as a whole is an advancement in software integrity-- it may be, but it's something that ought to be formally studied. In some cases rust might be replacing memory safety bugs with an even greater number of other defects which, depending on the application, may be worse. (not everything is an internet exposed service where hacks are the only failure of consequence and where input really should be assumed to be intelligently adversarial.)

In any case, "break the rules and all bets are off" is an issue that likely will continue to exist in any performant language. Automatic code generation will generate stuff with awful performance unless an optimizer goes through and eliminates 'impossible cases', but optimization isn't possible unless the compiler can assume the rules are followed.


> If you at all break the rather subtle rules, then essentially anything can and will happen.

If by subtle rules you mean your invariants, that is missing fundamental assumptions.

It's akin to making a building without foundation and load bearing structures.

> So for example, there was recently a thread where someone had code that checked if a value was in range to safely coerce it directly to an enum then did so. But because of eager evaluation of an argument the unsafe cast happened first

You mean this: https://notgull.net/cautionary-unsafe-tale/

However, note that the UB goes away if you never use any unsafe code. Or if you expand your unsafe to encompass some safe code.

The issue it had was that safe code was invalidating the invariants the unsafe code was relying on. IIRC it was alignment.


> If by subtle rules you mean your invariants, that is missing fundamental assumptions.

> It's akin to making a building without foundation and load bearing structures.

That's exactly how C approaches UB too.

> However, note that the UB goes away if you never use any unsafe code. Or if you expand your unsafe to encompass some safe code.

Right. The problem is that's untenable; any nontrivial program will have unsafe somewhere, and unsafe can cause failures arbitrarily far away from the incorrect code. The whole point of the article is that you need to be able to draw an actual boundary between the unsafe parts and the safe parts of your program and review the unsafe parts in isolation. If Rust doesn't give you more and better support in doing that than C does, then it's not really making a difference.


> The problem is that's untenable; any nontrivial program will have unsafe somewhere

I'm up to 47,000 lines of Rust with no "unsafe". The main program starts with

#![forbid(unsafe_code)]

and that applies to the entire program, although not external crates. If you're not using foreign functions, you don't need "unsafe".

Some published crates I use do have "unsafe", more than they should. This is annoying.


> If Rust doesn't give you more and better support in doing that than C does, then it's not really making a difference.

It does and if you've been missing that you've misunderstood this whole discussion. Rust allows you to wrap up some unsafe code in a safe abstraction. You use the type checker to enforce your invariants, such that the unsafe code can be reviewed in isolation.

Rust gives you exactly what you're saying you want. Steve is describing Rust as it is.


> It does and if you've been missing that you've misunderstood this whole discussion.

Don't tell it to me, tell it to the person I was replying to.


I believe you said

> any nontrivial program will have unsafe somewhere, and unsafe can cause failures arbitrarily far away from the incorrect code

But the point I was trying to make is that this claim is facially true but misses the critical context: in idiomatic Rust, the unsafe keyword is not the barrier that enforces integrity of the system.

The integrity of the system is enforced by the type checker, the same way it always is. The unsafe annotation alerts the reader that there exists an invariant the compiler can't check. Idiomatic code will have a SAFETY comment above describing the invariant. It should be locally possible to reason about how the type abstraction used to encapsulate it truly enforces that invariant.

This is what Rust people are talking about when we say wrap unsafe code in a safe abstraction. If we do it right, spooky action at a distance is no longer possible.

Now you might go and point out the fallibility of humans and all that, and you'd be right, but that's in fact what makes it so valuable to try.


Exactly. All-hope-lost behavior is possible in rust code unless there is no unsafe anywhere (and no compiler bugs, but I think its fair to ignore those when discussing the language in the abstract).

Rust potentially benefits from fewer opportunities to footgun yourself, but rust also comes with other costs (including a more complex syntax, a bad dependency culture, a lot more front-loaded cognitive load around lifetime management, etc.) which might offset those benefits. Some of those extra complexity costs seem hard (or impossible) to avoid when trying to keep memory safety from having a disproportional runtime cost, so I'm not necessarily faulting rust. But some of the sources of defects in rust code may also be entirely avoidable, which is why I think it's important to actually study it rather than axiomatically assume its behavior makes programs correct. It doesn't, even absent unsafe.


> compiler bugs... its fair to ignore those when discussing the language in the abstract

When the language is defined as whatever the compiler does, as it is in the case of Rust, I'm not so sure.

If there were a Rust standard with multiple compliant compilers, I'd be more convinced, but Rust isn't there yet.

(And TRF's trademark policy may well prevent it from ever getting there.)


Having more implementations doesn’t really help you avoid bugs in the one you’re using.


Not entirely true. When there are multiple implementations, disagreements between them in normative behavior prove that one or the other (or the language itself) has a bug. This means you can run randomized tests in your software, hash the results, and the hash should be the same between compilers (and across platforms).

That's an option that doesn't exist when there aren't multiple, and even if you won't use the possibility personally, other people using it will eliminate bugs in the compiler. The csmith program was used to find an incredible number of bugs in gcc and clang. This approach can also be applied to many pieces of ordinary code.

The prior poster's point is also that when there is only one, the language is the compiler and it's not useful to talk about them in isolation. I don't fully agree with that in theory, since the language is supposed to be stable-ish while bugs in the compiler will get fixed. So to the extent that something in rust is a footgun due to the language we might be stuck with it, but if it's due to a compiler bug it will probably be fixed.
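
A rough sketch of that hash-and-compare idea (entirely illustrative; the seed, constants, and workload are made up): a fixed-seed, UB-free workload must print the same value under every conforming compiler on every platform, so a mismatch points at a compiler bug or at UB in the code under test.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t state = 12345u;                  /* fixed seed */
        uint64_t hash  = 1469598103934665603ULL;  /* FNV-1a offset basis */

        for (int i = 0; i < 1000000; i++) {
            /* Stand-in for "the code under test": one 64-bit LCG step. */
            state = state * 6364136223846793005ULL + 1442695040888963407ULL;
            hash ^= state >> 33;
            hash *= 1099511628211ULL;             /* FNV-1a prime */
        }

        /* Compare this line across gcc, clang, etc. and across platforms. */
        printf("%016llx\n", (unsigned long long)hash);
        return 0;
    }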


> That's exactly how C approaches UB too.

In theory yes. In practice it's been proven that's not what happens.

> The problem is that's untenable; any nontrivial program will have unsafe somewhere, and unsafe can cause failures arbitrarily far away from the incorrect code.

Not really. Ignoring any and all non-compiler tooling, UB in code can be traced to either A) an unsafe region in the code whose invariants are invalidated, or B) if no unsafe code exists in the project, an unsoundness bug in the Rust standard lib.

Safe code needs to be safe for any possible value of arguments, fields, and variables. Doing otherwise is an unsoundness bug.

In C, the code you need to look at is essentially ALL of your code + the C std lib. In Rust, you can focus on the places where safe and unsafe code mingle. And IFF (if and only if) your code contains 0 unsafe can you be sure that any UB is a bug in Rust itself.

This, however, holds for arbitrary code.

Idiomatic code in Rust will mark unsafe with big, friendly letters: `SAFETY: this works while X and Y hold`. How do you do this in C? Since most lines are a possible source of UB, do you annotate every line of code? I assume not. How do you ensure something will not be mutated? How do you ensure something is thread-safe? Do you use type bounds? And if not, how?

> If Rust doesn't give you more and better support in doing that than C does, then it's not really making a difference.

Also, that's the Nirvana fallacy; Rust doesn't have to be perfect and prevent EVERY undefined behavior forever. If it's an improvement over the niche it's targeting (C/C++) then it's an improvement. I.e., is it harder to find UB in Rust or in C?

You don't go around saying, "Well, seatbelts don't prevent head injuries when you hit a wall head-on, so remove seatbelts!"; you add airbags to cushion the blow.

And spoiler alert: it's giving you the airbags as well. The normal thing when encountering UB isn't to try to track it down by hand; it's to run `miri`, which found the UB as soon as it ran.


Particularly notable: due to the restrictions that Rust chooses to enforce on safe code, `unsafe` is forced in a lot of situations where other languages maintain full runtime safety. I'm not saying going full-Java is the only answer, but a language that is safer than Rust is certainly possible.


Like what exactly?


> The issue it had was that safe code was invalidating the invariants the unsafe code was relying on. IIRC it was alignment.

You do recall correctly. The safe code was producing a pointer with alignment that was off, and the unsafe code dereferenced it without checking. I felt that it was kind of a bad take when several people said that it was UB caused by safe code, because really the issue was that the unsafe code wasn't doing its job. The Rust unsafe model is that you can't trust safe code when inside unsafe code. It's on the unsafe code to uphold invariants which it requires, not the safe code which calls it


> As a result, I don't think it can be taken for granted that rust as a whole is an advancement in software integrity-- it may be, but it's something that ought to be formally studied. In some cases rust might be replacing memory safety bugs with an even greater number of other defects which, depending on the application, may be worse.

I’m sorry, but without any supporting evidence for this claim, this is just FUD. Everything that we’ve seen in case studies of people reimplementing stuff in Rust indicates that memory safety and logic bugs are reduced compared to something like C.


Citations to those case studies are welcome, because I've not seen them. It's on the advocates of rust to establish that it makes things better, because it absolutely isn't unambiguous.

We really seem to be in the stone age in terms of what practices lead to higher quality software. We still have people who chant "goto harmful" against simple forward jumps to on-error-cleanup code, yet still litter their C++ and Java with exceptions, which are a less safe and less clear version of the same thing.

I've personally found the rate of embarrassing errors in simple rust software is increased over comparable C code, but I freely admit that this experience is far from a formal study and may well be due to the lack of problem-domain competence or general haste in people participating in culty "re-implement in rust" exercises (and where the rust code is far more often someone's "learn rust" project). And at least where security w/ untrusted input is a concern the nature of the rust bugs is preferable to the bugs in comparable C code, but as mentioned that doesn't apply to a lot of software.

Another data-point is that the vast majority of firefox crashes I experience now are rust panics, even though the amount of rust code is small compared to C++ code. It's hard to reason from that however, since it can be said that the rust code is more complex and more heavily used than the bulk of the rest.


I appreciate your comments in this thread. They are a healthy skeptical technology-neutral take unaffected by novel-paradigm-dogma – in this case about Rust, but it really applies to any software practice. I have nothing against Rust and I believe it has advanced the space of mainstream imperative languages significantly, particularly in high-performance low-level coding. But you have to always keep an eye on the horizon.

> We really seem to be in the stone age in terms of what practices lead to higher quality software.

I agree. What I always come back to is simplicity, or reducing total cognitive complexity. The only way I know of that actually consistently works, is modularization (in the most liberal sense of the word), in order for our human brains to work in a reduced problem domain at any given point in time. So whatever languages, protocols, tooling and design patterns help with that, I lean into, although this is (currently) a subjective measure.

On some problems, it’s clear Rust is amazing in this respect, taking away certain worries so that you can focus on “what matters”. But this is not a truism in all domains, because the issues that Rust addresses are merely a good list, not an exhaustive one.


> the vast majority of firefox crashes I experience now are rust panics

Have you considered that this might be precisely because Rust is catching and stopping something bad with a safe panic? And that the same mistake by the same programmer in C/C++ would have resulted in silent data corruption, a security vulnerability, and possibly a portal into the space between dimensions from which optimising compilers will happily allow the Others into our world because -- and I quote -- "that's allowed by the spec"?

I often see in the news some panic about a sudden rise in some rare disease, but it almost always turns out that the increase is just due to an improvement in detection, not a change in the actual prevalence.

I suspect that silent data corruption vs liberal use of asserts that trigger visible panics is in the same category.

Someone in this thread was complaining that they think exceptions are "dangerous" and gotos are "safe". Their thinking is probably coloured by endless stack traces from exceptions in managed languages, comparing that to the "oops I stepped over a mandatory cleanup using goto" in their own C code, which they probably won't even notice... because it's probably just silently leaking memory. Or file handles. Or threads. But not loudly and in your face!


> Have you considered that this might be precisely because Rust is catching and stopping something bad with a safe panic

Absolutely! But I counter: How do you know that all of them are? Or that even many of them are?

The developers that work on it may well know, but it isn't something we can axiomatically assume. It is not a true statement that any rust panic would have been an error in C(++) code written by the same (competent in both) developer.

Increased error visibility can be either more errors or more error sensitivity. It's worth being mindful of this because otherwise we may adopt programming practices that increase the overall defect rate, and think we're making things better.

Perhaps it's easier to see with asserts. If your defect rate is 1 defect per 1000 lines of code, and you go add 1000 asserts, you'll add a bug (actually I think you'll add 10: the boundary conditions that asserts test are far more likely than average code to be implemented wrong). Like any other code they could also start off correct but then become desynced with the rest of the code. If you go and ship those asserts in production and the effect of asserting is important in your application, then unless the asserts prevent more bugs than they created you made the program worse off. Even if the asserts don't cause bugs they may make the program harder to maintain or extend. Therefore in any codebase there must be some optimal level of asserts; more or less than that and the program is worse off.

Is the implicit assertion of rust's behavior optimal? Probably not for all codes, because different codes call for different tradeoffs. Or even for every code there might be language changes that could make things better. But we'll never find out this stuff if we can't admit that there is the potential for uncertainty or improvement, or if we confuse memory safety with program correctness. It's one of the more important aspects of correctness, but the world is full of serious bugs in software written in inherently memory safe languages.

And after all, a program consisting of nothing but exit(1) (or perhaps for(;;){}) is the most perfectly memory safe program possible. :P If memory safety were the only goal programming would be much simpler.

> and possibly a portal into the space between dimensions from which optimising compilers will happily allow the Others into our world

As mentioned upthread, rust also can create portals into the hell dimension if there is unsafe code (including in the standard library) that fails to obey all the not-very-simple contract requirements. Also, the number of times errant C code has actually opened a portal into the hell dimension and thereby destroyed the universe is greatly overstated; that's only happened twice at most. Usually undefined behavior is just a crash or an exploitable misbehavior that lets some haxor steal your data, and those can also happen from pure logic errors (or a panic, in the case of a crash). I do fully agree that narrowing the potential for UB misbehavior is very important and rust is an important advance there. But Rust doesn't just do that. It also has its own costs and quirks.


How are exceptions less safe or less clear?

Also, crashing fast is the best way to deal with unforeseen events one can’t recover from. Your anecdotal experience with regards to firefox and rust just shows that they put more assertions into the code (good!), which makes it easier to notice and later fix bugs, as rust makes it mandatory to handle error cases to some degree. This is also a plus for rust.


If you can't recover, perhaps. But when it causes more unrecoverable states (e.g. because handling the case requires a massive refactor to satisfy the borrow checker, so you just panic...) it's another issue.

Even ignoring that, not everything is a security sensitive internet application. In plenty of cases trundling on with a corrupt state in production is superior, particularly since many corrupt states are benign (particularly randomly occurring ones, rather than attacker produced ones). E.g. is it better for the software driving your Christmas lights to potentially glitch for a moment and display the wrong colors or to just shut down (/get stuck, depending on the design)?

> as rust makes it mandatory to handle error cases to some degree

"handle" often just means panic, and in cases where security isn't a concern and there isn't any persistent data to corrupt crashing may already be the worst thing that could happen. So you might have that the C-written program would have correctly handled the case but due to rust's effort front loading it just doesn't get handled. ... but even if the C-written program handled it wrong it would be worse than the rust code that panicked (for some applications) and quite possibly better.

I've experienced that rust sometimes normalizes intentionally writing code that is effectively if()else if()else if()else{abort();}, as a result of a culture of panic being 'safe' and of other choices requiring more upfront effort.

For Firefox, which is generally security critical, I fully agree that panicking is better than being corrupt. But we'd be assuming facts not in evidence if we assumed that the panics would actually be bad states if that code were written in C++; it's possible that some of them would have been correctly handled but for the extra effort rust required up front. In Firefox the tradeoff of more defects in exchange for being safer is still probably a good one. But one can't be in denial of this possibility if one is to minimize the cost of rust, apply it to where it's most applicable, etc.

> How are exceptions less safe or less clear?

Potentially! It depends on the culture of their use. They have a highly non-local effect, so some particular weird exception happens deep in a library's library code, of a sort that is completely incomprehensible to the code 20 steps back up the stack. And not just incomprehensible, but unknowable, since the called code could be changed later and add exceptions that the caller couldn't have known of. The control flow diverts away for an invisible and potentially unknowable reason. For that reason many places have various rules about exceptions, including sometimes prohibiting them entirely.

Of course it can be used well and safely too with care.

But so can "goto fail; ... fail:...". And code that could locally handle an error case by unwidindign itself and e.g. giving an empty result probably ought to do so rather than bubble up any one of thirteen different exception states that may be differently mishandled. (or at least the better choice between the approaches isn't something that can be correctly answered by a maxim or anything less than case specific judgement)


Undefined behavior is critical for performance. Without undefined behavior, C compilers would not be able to optimize at all. You'd be running everything at -O0 or worse.


> Undefined behavior is critical for performance

Not only is this not true, it's trivially easy to prove it's not true. Both rustc and clang generate LLVM IR and use LLVM to optimise and generate machine code. The code that's generated is equally performant, as you'd expect since most of the optimisation is being done by LLVM, not the front end.

The difference between the two frontends is that rustc is stricter, rejecting programs where UB may arise.


I do agree that if you remove `unsafe` blocks from Rust, then you have a performant language without UB. However: (1) We are talking about C here, not Rust (2) Rust has UB due to `unsafe` (3) LLVM IR has UB.


To play devil's advocate, that doesn't negate the claim that undefined behavior is critical for performance. Rust (as well as LLVM IR itself) also have a concept of undefined behavior.


The main performance-critical undefined behavior in C is provenance. The rest can be removed without major performance impact. (Which is not to say they can't give you 10% on specific workloads, just they aren't what is taking you from -O0 to -O3.)

A related student poster from EuroLLVM 2023: https://llvm.org/devmtg/2023-05/slides/Posters/05-Popescu-Pe... It tests the performance impact of some of the secondary undefined behaviors, and the result is basically what you'd expect. They do have impact, but if you average over all benchmarks the improvement is, at best, in the low single digits.


Yes, I do agree that you basically only need provenance, and that C has more UB than necessary. You can indeed reduce UB without any major performance impact (e.g., shifts by a large n, signed overflow). I think that would be a good idea.


Even if that's true (which is not given, as others have said), it would be a worthy trade to make. Software needs to do what it's meant to do first and foremost. Speed doesn't mean shit if you can't trust that the software actually works.


But that is simply not true, and can be proved by looking at the world we live in.

C became the dominant language precisely due to hardware constraints, and the ability to extract every last drop from limited hardware was back in the day more important than software working perfectly always. If this wasn't the case, other safer alternatives would have been preferred.

Except in very specific domains, hardware advances have outpaced software needs (e.g. there is only so much compute power a spreadsheet user will need). That is why today we allow ourselves to think about "luxuries of the past" such as correctness, safety, ergonomics, composability, etc.


C became dominant because UNIX was a free-beer OS, with source tapes and a book to come along for the ride (Lions' commentary).

Had UNIX been as expensive as VMS or System/370, with a commercial license, no university would have cared to port UNIX, or to focus on the systems language used to develop it (post-UNIX V5).

As for the myth of its performance, regarding 1980s C compilers:

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming


> [..] and the ability to extract every last drop from limited hardware was back in the day more important than software working perfectly always

That seems to contradict the “very simple to implement compiler”


What behavior would you define for

    *((int*)rand()) = 42


Undefined behaviour enables only a fairly small set of optimisations. There's a large set of optimisations that can be implemented completely safely without having to make such dangerous assumptions. Other programming languages do this all the time; it's not just C/C++ that have optimisers!


This is simply not true. Almost all optimisations rely on undefined behavior, see https://news.ycombinator.com/item?id=38760475

Note that I am not talking about UB like signed integer overflow. Removing that would slow down programs by a couple of percent. The important type of UB is pointer provenance. This ensures that e.g., writing to a random memory address is UB.


Almost all optimisations in C/C++ compilers depend on undefined behaviour, because practically no behaviour is defined!

The trick is to define behaviour, which is what other programming languages do.

E.g.: in both C# and Rust, integers have fixed sizes. A C# Int32 is equivalent to a Rust i32. Only God knows what a C/C++ "int" is. It could have 17 bits and use ternary.
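
(To be fair, C can pin the width down when you opt in to <stdint.h>, though plain int stays vague; a small sketch:)

    #include <limits.h>
    #include <stdint.h>

    int32_t exact = 0;   /* exactly 32 bits, two's complement, if the type exists */
    int     plain = 0;   /* only guaranteed to be at least 16 bits wide */

    /* About all the standard promises for plain int: */
    _Static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");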


As I said, this isn't about ints, it's about pointers. What behavior will you define in C for writing to a random memory address?


Where's your proof?

Rust has far less UB than C, yet its performance is comparable to C's.


Just to add, Safe Rust should have zero UB, modulo bugs in the compiler. And they're very serious about fixing such bugs. They won't dismiss it with "just be careful while programming".


Sure, safe Rust has no UB. The main reason why that doesn't apply to C is that if you wanted to make every C program have a defined behavior, then you also need to define behavior for out of bounds memory writes, including writes from other threads. This means that the compiler basically cannot apply any optimisations, because another thread could be overwriting your data structures at any moment.
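
A tiny illustration of that point (my own sketch, nothing from the thread): because wild and out-of-bounds writes are undefined, the compiler may assume the store below cannot hit the surrounding locals and can keep them in registers; if such writes had defined behavior, it would have to reload everything around every store.

    /* Because an out-of-bounds or wild write is UB, the compiler may assume
       "*out = sum;" cannot hit sum, i, or n, and keep all three in registers.
       If such writes were defined, it would have to reload them from memory
       after every store. */
    int sum_and_store(const int *a, int n, int *out)
    {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i];
            *out = sum;   /* assumed not to alias sum, i, or n */
        }
        return sum;
    }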


Then fence it in, have as little of it as possible, and make it as obvious as possible when it can happen.


C gives you a level of control and responsibility not found in other languages. That's a choice, not something that is inherently worse. It may be worse for what you are doing. Most people don't value the level of control that C gives you and would rather choose another language, and that is fine. But having a language available with this level of control is valuable, even if few people choose to use it. Most UB in the C standard is there for a very good reason.


C is not particularly low-level. It has no (standard) way to control vectorization, stack usage, calling conventions, etc.

It just had an insane amount of money spent on making its compilers optimize better.


You are right, but no language has tried to claim the space of giving the user more control, so C remains the lowest-level language we have that is portable. I think there are lots of opportunities in this space, but I don't know of anyone working on it.


C++, Rust, and Zig are all lower level, due to having control over vectors.


Whatever people credit C with for low-level coding is, in reality, compiler-specific language extensions, not available in ISO C.

All languages can have such specific extensions and many do.



