
I couldn't quite replicate those numbers with a recent setup (rustc 1.78, gcc 14, g++ 14). On my machine (Ryzen 9 7900X, LVM on NVMe) it's rustc 60-80ms, gcc 20-30ms, and tcc 2ms. Interestingly, g++ is still 200ms on that machine. Measuring with time(1) and rustc's built-in time-passes yields another interesting observation: rustc spends 47ms of its time in sys and 23ms in user, compared to <3ms for both C variants. It counts its own time as 50ms instead for some reason; not sure what it is subtracting there. Also, looking at individual passes of the compiler (rustc +nightly -C opt-level=1 -Z time-passes gcd.rs) reveals it spends 33ms linking, 16ms in LLVM, and only negligible time in what you'd consider compiling.
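For scale, the program under test is presumably only a few lines; here is a hypothetical stand-in (the original gcd.rs isn't shown), written in C to match the gcc/tcc runs:

    /* Hypothetical stand-in for the benchmarked program: a trivial gcd,
     * small enough that compile time is dominated by fixed startup and
     * linking costs rather than by actual compilation. */
    #include <stdio.h>

    static unsigned gcd(unsigned a, unsigned b) {
        while (b != 0) {
            unsigned t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    int main(void) {
        printf("%u\n", gcd(12345, 54321));
        return 0;
    }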

I think the test is ultimately nonsensical for the question being posed here. It doesn't reveal anything insightful about scaling to real-world program sizes, either. The time of rustc is dominated by the platform linker anyway. Sure, one might argue that this points out Rust as relying too much on the linker and creating too many unused symbols. But the question of whether this is caused by the language, and in particular its syntactical choices, should at that point be answered with: probably not. It's also not a benchmark you want to compare by percentage speedups, since it's probably dominated by constant-time costs for any of the batteries-included standard library languages compared to C.


thank you very much for the failed replication!

it's interesting, my machine is fairly similar—ryzen 5 3500u, rustc 1.63.0, luks on nvme. is it possible that rustc has gotten much faster since 1.63?

while i agree that it's not the most important test for day-to-day use, i don't agree that it falls to the level of nonsensical. how fast things are determines how you can use them. tcc and old versions of gcc are fast enough that you could very reasonably generate a c file, compile it into a new shared object, dlopen it, and call it, every screen frame. there are some languages, like gforth, that actually implement their ffi in such a way, and sitkack and i have both done some experiments with inline c and jit compilation by this mechanism.
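a minimal sketch of that pattern, assuming a posix system with tcc on the path (the file name frame.c and the symbol frame_fn are just illustrative; link the host program with -ldl on older glibc):

    /* Generate C source, compile it with tcc into a shared object,
     * dlopen it, and call the fresh code; with tcc's speed this could
     * in principle happen every screen frame. */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* 1. generate a C file */
        FILE *f = fopen("frame.c", "w");
        if (!f) return 1;
        fputs("int frame_fn(int x) { return x * 2 + 1; }\n", f);
        fclose(f);

        /* 2. compile it into a new shared object */
        if (system("tcc -shared -o frame.so frame.c") != 0) return 1;

        /* 3. dlopen it and call it */
        void *h = dlopen("./frame.so", RTLD_NOW);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        int (*fn)(int) = (int (*)(int))dlsym(h, "frame_fn");
        if (!fn) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        printf("frame_fn(20) = %d\n", fn(20));
        dlclose(h);
        return 0;
    }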

i do agree that the syntactical choices of the language have relatively little to do with it, and your rustc measurements provide strong evidence of that, though perhaps it is somewhat unfavorable for c++ that it commonly has to tokenize many megabytes of header files and do the moral equivalent of text replacement to implement parametric polymorphism.


Thank you for re-validating the numbers on your end; it's indeed very possible. There have been quite a few improvements between those versions. Though the effect size does not quite fit with most of the optimizations I can recall; maybe it's more related to reductions in the standard library's size and its linking behavior.

With regards to standard use, for many users the scenario is definitely not common. I'd rather rustc be an effective screwdriver and a separate hammer be built than try to mangle both into the same tool. By that I mean, it's very clear which portion of the compiler would have to be repurposed here. The hard question is whether the architecture is amenable to alternative linker backends that serve your use-case. I'm afraid I can't answer that conclusively. Only this much: the conceptual conflict for Rust is that linking is a very memory-safety-critical part of the process. And with its compilation model it relinks everything into the resulting binary / library, which includes a large std and dependency tree even if much of that is removed by the step. Maybe that can be changed; and relying on a tool whose interface was ultimately designed with C in mind is also far from optimal for computing those outputs and inputs. It's hard to say how much of it stems from compatibility concerns and overheads, which could be shed for a pure build process, and how much is fundamental to the language's design.

With regards to C++, I suspect it's rooted in the fact that parsing it requires, in principle, a complete consteval engine. The language has a dependency loop between parsing and codegen. This, of course, is not how data should be laid out for executing fast programs on it. It's quite concerning given the specification still contains the bold-faced lie that "The disambiguation is purely syntactic" (6.8; 1) for typenames vs. non-typenames, needed to tell constructors apart from declarations, which at present can require arbitrary template instantiation. It might be interesting to see if those two headers in your example already execute some of these dependency loops, but it's hard for me to think of an experiment to validate any of this. Maybe you have ideas; is there something like time-passes?
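For illustration, C itself already contains a milder form of this ambiguity, the classic typedef problem, where the same token sequence parses as a declaration or an expression depending on symbol-table state; C++ merely escalates what it can take to resolve it, up to template instantiation. A minimal sketch:

    typedef int T1;   /* T1 names a type */
    int T2;           /* T2 names an object */

    void f(int x) {
        T1 * a = 0;   /* declaration: a is a pointer to int */
        T2 * x;       /* expression: T2 multiplied by x, result
                       * discarded (expect an unused-value warning) */
    }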


dunno. with respect to c++, you could probably hack together a c++ compiler setup that was more aggressive about using precompiled-header-like things. and if you're trying to abuse g++ as a jit, you could maybe write a small, self-contained header that the compiler can handle quickly, and not call any standard library functions from the generated code.


That statement means the committee did not want to stop it from being developed. The question is: has it been? They mean a specific implementation could work as a portable assembler, mirroring djb's request for an 'unsurprising' C compiler. Another interpretation would be in the context of CompCert, which has been developed to achieve semantic preservation between assembly and its source. Interestingly, this of course hints at verifying an assembled snippet coming from some other source as well. That alternate source for the critical functions would then free the rest of the compiler internals from the problems of preserving constant-timeness and leak-freedom through their passes.


No.

C already existed prior to the ANSI standardization process, so there was nothing "to be developed", though a few changes were made to the language, in particular function prototypes.

C was being used in this fashion, and the ANSI standards committee made it clear that it wanted the standard to maintain that use-case.


These are aspirational statements, not a factual judgment of what the standard or its existing implementations actually are. At the very least, they do not cover all implementations, nor do they define precisely what they cover. Note the immediately following statement: "C code can be non-portable."

In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that leaves you doubting whether porting introduces new UB paths you haven't already fully checked against (e.g., varying the size of integers in such a way that a promotion changes into something leading to signed overflow, or that bounds checking becomes ineffective).
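A concrete sketch of that promotion hazard (the function is hypothetical, but the promotion rules are standard C):

    #include <stdint.h>

    /* Where int is 32 bits, the uint16_t operands are promoted to
     * *signed* int, so for x > 46340 the multiplication overflows int:
     * undefined behavior. Where int is only 16 bits, the operands are
     * promoted to unsigned int instead and the multiplication wraps
     * with well-defined semantics. Same source, different UB paths. */
    uint32_t square(uint16_t x) {
        /* portable fix: return (uint32_t)x * x; */
        return x * x;   /* UB on 32-bit-int targets for large x */
    }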

The paragraph further down about explicitly and swiftly rejecting a validation test suite should also be read as a warning. Not only would proposing modern software development without a test suite get you swiftly fired today, but they're explicitly acknowledging the insurmountable difficulties in producing any code with consistent cross-implementation behavior. In the time since then, other languages have demonstrated that you can reap many of the advantages of close-to-the-metal without compromising on cross-target behavior consistency, at least for many relevant real-world cases.

They really knew what they were building: a compromise. But that gets cherry-picked into absurdities, such as stating in the present tense that C is portable, or that some inherent property makes it assembly-like. It's neither.


These are statements of intent. And the intent is stated explicitly and is very clear in the standard document: use as a "portable assembler" is one of the use-cases that is intended and that the language should not prohibit.

That does not mean that C is a portable assembly language to the exclusion of everything and anything else, but it does mean the claim that it is in no way a portable assembly language is clearly false. Being a portable (and, for the time, "high level") assembly language is one of the intended use-cases.

> In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

Yes. That is the original intent for which it was designed, and the role in which it works well.

> The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that will leave you doubting whether porting introduces new UB paths that you haven't already fully checked against

Yes, that's the "other" direction that deviates from the original intent. In this role, it does not work well, because, as you rightly point out, all that UB/IB becomes a bug, not a feature.

For that role: pick another language. Because trying to retrofit C to not be the language it is just doesn't work. People have tried. And failed.

Of course what we have now is the worst of both worlds: instead of either (a) UB serving its original purpose of letting C be a fairly thin and mostly portable shell above the machine, or (b) eliminating UB in order to have stable semantics, compiler writers have chosen (c): exploiting UB for optimization.

Now these optimizations alter program behavior, sometimes drastically and even impacting safety (for example by eliminating bounds checks that the programmer explicitly put in!), despite the fact that the one cardinal rule of program optimization is that it must not alter program behavior (except for execution speed).
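A minimal example of the kind of check that gets deleted (the function is hypothetical, but the folding is readily observable with gcc or clang at -O2):

    /* The programmer writes an explicit overflow guard. Because signed
     * overflow is undefined behavior, the compiler may assume x + 1
     * never wraps, fold the condition to false, and delete the guard
     * entirely, compiling the function down to a bare x + 1. */
    int increment_checked(int x) {
        if (x + 1 < x)      /* intended wraparound check */
            return x;       /* silently optimized away */
        return x + 1;
    }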

The completely schizophrenic "reasoning" for why this altering of program behavior is somehow OK is that, at the same time as we are using UB to optimize all over the place, we are also free to assume that UB cannot and never does happen. This despite the fact that it is demonstrably untrue: UB is all over the C standard, and all over real-world code. And it is used for optimization purposes, while supposedly not existing.

> They really knew what they were building, a compromise.

Exactly. And for the last 3 decades or so people have been trying unsuccessfully to unpick that compromise. And the result is awful.

The interests driving this are also pretty clear. On the one hand a few mega-corps for whom the tradeoff of making code inscrutable and unmanageable for The Rest of Us™ is completely worth it as long as it shaves off 0.02% running time in the code they run on tens or hundreds of data centers and I don't know how many machines. On the other hand, compiler researchers and/or open-source compiler engineers who are mostly financed by those few megacorps (the joy of open-source!) and for whom there is little else in terms of PhD-worthy or paid work to do outside of that constellation.

I used to pay for my C compiler, thus there was a vendor and I was their customer and they had a strong interest in not pissing me off, because they depended on me and my ilk for their livelihood. This even pre-dated the first ANSI-C standard, so all the compiler's behavior was UB. They still didn't pull any of the shenanigans that current C compilers do.


It could be read as a play on a particular monologue, intentional or not: Dr. Ford in Westworld, season 1, episode 10. Possibly adapted from prior stories.

> you [people; …] cannot change. 'Cause you're human after all. […] So I began to compose a new story for them [aka artificial intelligence]. It begins with […] the choices they will have to make; and the people they will decide to become.

