
Personally, I think the answer is "basically no": Rust, C, and C++ are all the same kind of low-level language, with the same kind of compiler backends and optimizations. Any performance trick you can pull off in one, you can basically pull off in the other two.

However, in the spirit of the question: someone mentioned the stricter aliasing rules, and that one does come to mind in Rust's favor over C/C++. On the other hand, signed integer overflow being UB would count for C/C++ (in general, all the UB in C/C++ that isn't present in Rust is there for performance reasons).

Another thing I thought of in Rust and C++'s favor is generics. For instance, in C, qsort() takes a function pointer for the comparison function; in Rust and C++, the standard library sorting functions are templated on the comparison function. This makes it much easier for the compiler to specialize the sorting function, inline the comparisons, and optimize around them. I don't know if C compilers specialize qsort() based on the comparison function this way. They might, but it's certainly a lot more to ask of the compiler, and I would argue there are probably many cases like this where C++ and Rust can outperform C because of their much more powerful facilities for specialization.
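To make this concrete, here's a minimal Rust sketch of the difference (the function names are hypothetical, just for illustration):

    use std::cmp::Ordering;

    // Generic over the comparator: the compiler emits a specialized copy of
    // the sort for each comparator type, so `cmp` can be inlined into the
    // sorting loop.
    fn sort_generic<T>(data: &mut [T], cmp: impl Fn(&T, &T) -> Ordering) {
        data.sort_by(|a, b| cmp(a, b));
    }

    // qsort-style: a single copy of the code, where every comparison is an
    // indirect call unless the optimizer can prove the pointer's target.
    fn sort_indirect<T>(data: &mut [T], cmp: fn(&T, &T) -> Ordering) {
        data.sort_by(cmp);
    }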


This is a tangent, because it clearly didn't pan out, but I had hope for Rust having an edge when I learned that every object is known to be either mutable or immutable. This means all the mutable objects can be kept together, as can the immutable ones, and we'd get more efficient use of the cache: memory writes to mutable objects share the cache with other mutable objects, not immutable objects, and bandwidth isn't wasted writing back bytes of immutable objects that will never change.

As I don't see any reason Rust would be limited in runtime execution compared to C, I was hoping this would prove an edge.

Apparently it's not as big an effect as I hoped.


Rust doesn't have immutable memory, only access restrictions. An exclusive owner of an object can always mutate it, or can lend temporary read-only access to it. So the same memory may flip between exclusive-write and shared-read back and forth.

It's an interesting optimization, but not something that could be done directly.


I think it would be quite difficult to actually arrange the memory layout to take advantage of this in a useful way. Mutable/immutable is very context-dependent in Rust.

I agree with this whole-heartedly. Rust is a LANGUAGE and C is a LANGUAGE. They are used to describe behaviours. When you COMPILE and then RUN them you can measure speed, but that's dependent on two additional bits that are not intrinsically part of the languages themselves.

Now: the languages may expose patterns that a compiler can make use of to improve optimizations. That IS interesting, but it is not a question of speed. It is a question of expressibility.


No. As you've made clear, it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.

Saying that a language is about "expressibility" is obvious. A language is nothing other than a form of expression; no more, no less.


Yes. But the speed is dependent on whether or not the compiler makes use of that information, and on the machine architecture the compiled code runs on.

Speed is a function of all three -- not just the language.

Optimizations for one architecture can lead to perverse behaviours on another (think cache misses and memory layout -- even PROGRAM layout can affect speed).

These things are out of scope of the language and as engineers I think we ought to aim to be a bit more precise. At a coarse level I can understand and even would agree with something like "Python is slower than C", but the same argument applies there as well.

But at some point objectivity ought to enter the playing field.


> ... it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.

There is expressing idea via code, and there is optimization of code. They are different. Writing what one may think is "fully optimized code" the first time is a mistake, every time, and usually not possible for a codebase of any significant size unless you're a one-in-a-billion savant.

Programming languages, like all languages, are expressive, but only as expressive as the author wants to be, or knows how to be. Rarely does one write code and think "if I'm not expressive enough in a way the compiler understands, my code might be slightly slower! Can't have that!"

No, people write code that they think is correct, compile it, and run it. If your goal is to make the most perfect code you possibly can, instead of the 95% solution that is robust, reliable, maintainable, and testable, you're doing it wrong.

Rust is starting to take up the same mental headspace as LLMs: they're both neat tools. That's it. I don't even mind people being excited about neat tools, because they're neat. The blinders about LLMs/Rust being silver bullets for the software industry need to go. They're just tools.


I’m not sure about the other UB opportunities, but in idiomatic rust code this just doesn’t come up.

In C, you frequently write for loops with signed integer counters so the compiler can assume the loop must hit its exit condition. In Rust you write for-each loops or invoke heavily inlined functional operators. It all ends up lowering to the same assembly. C++ is the worst here because size_t is everywhere in the standard library, so you usually end up using size_t for the loop counter, negating the compiler's ability to exploit the UB.
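As a sketch, the idiomatic Rust shape looks something like this (a hypothetical function): the trip count comes from the slice itself, so the compiler needs no assumption about counter overflow to reason about the loop.

    // No manual counter: the iterator's length is known from the slice, so
    // the compiler can vectorize without any overflow-is-UB assumption.
    fn sum(data: &[u32]) -> u64 {
        data.iter().map(|&x| u64::from(x)).sum()
    }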


The main performance difference between Rust, C, and C++ is the level of effort required to achieve it. Differences in level of effort between these languages will vary with both the type of code and the context.

It is an argument about economics. I can write C that is as fast as C++. This requires many times more code that takes longer to write and longer to debug. While the results may be the same, I get far better performance from C++ per unit cost. Budgets of time and money ultimately determine the relative performance of software that actually ships, not the choice of language per se.

I've done parallel C++ and Rust implementations of code. At least for the kind of performance-engineered software I write, the "unit cost of performance" in Rust is much better than C but still worse than C++. These relative costs depend on the kind of software you write.


I like this post. It is well-balanced. Unfortunately, we don't see enough of this in discussions of Rust vs $lang. Can you share a specific example of where the "unit cost of performance" in Rust is worse than C++?

> I can write C that is as fast as C++.

Only if you ignore C++'s compile-time execution capabilities.


C++ compile-time execution is just a gimmicky code generator; you can do it in any language.

Yeah, I could also be writing in a macro assembler for some Lisp-inspired ideas and optimal performance.

Any code that can be generated at compile time can be written the old-fashioned way.

Including using a macro assembler with a bunch of MASM/TASM-like clever macros.

> I can write C that is as fast as C++

I generally agree with your take, but I don't think C is in the same league as Rust or C++. C has absolutely terrible expressivity, you can't even have proper generic data structures. And something like small string optimization that is in standard C++ is basically impossible in C - it's not an effort question, it's a question of "are you even writing code, or assembly".


Yes, it is the difference between "in theory" and "in practice". In practice, almost no one would write the C required to keep up with the expressiveness of modern C++. The difference in effort is too large to be worth even considering. It is why I stopped using C for most things.

There is a similar argument around using "unsafe" in Rust. You need to use a lot of it in some cases to maintain performance parity with C++. Achievable in theory but a code base written in this way is probably going to be a poor experience for maintainers.

Each of these languages has a "happy path" of applications where differences in expressivity will not have a material impact on the software produced. C has a tiny "happy path" compared to the other two.


Also in theory, one could be using a static analyser all the time as a C or C++ build step.

Lint is part of UNIX toolset since 1979, and we have modern versions freely available like clang tidy.

In practice, many devs keep thinking they know better.


> On the other hand, signed integer overflow being UB would count for C/C++

C and C++ don't actually have an advantage here, because there it's limited to signed integers unless you use compiler-specific intrinsics. Rust's standard library allows you to make overflow on any specific arithmetic operation UB, on both signed and unsigned integers.
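For concreteness, here's a sketch of the three flavors on u32 (unchecked_add is a real standard-library method, unsafe precisely because overflow would be UB):

    fn demo(a: u32, b: u32) -> u32 {
        let _wrapped = a.wrapping_add(b); // defined: two's-complement wrap
        let _checked = a.checked_add(b);  // Option<u32>: None on overflow
        // SAFETY: the caller must guarantee a + b cannot overflow; if it
        // does, this is undefined behavior, just like signed overflow in C.
        unsafe { a.unchecked_add(b) }
    }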


It's interesting because it's a "cultural" thing, like the author discusses; it's a very good point. Sure, you can do unsafe integer arithmetic in Rust. And you can do safe, overflow-checked integer arithmetic in C/C++. But in both cases, do you? Probably you don't, in either case.

"Culturally", C/C++ has opted for "unsafe-but-high-perf" everywhere, and Rust has "safe-but-slightly-lower-perf" everywhere, and you have to go out of your way to do it differently. Similarly with Zig and memory allocators: sure, you can do "dynamically dispatched stateful allocators that you pass to every function that allocates" in C, but do you? No, you probably don't, you probably just use malloc().

On the other hand, there's the author's point that the "culture of safety" and the borrow checker in Rust free your hand to try things you might not in C/C++, and that this leads to higher perf. I think that's very true in many cases.

Again, the answer is more or less "basically no, all these languages are as fast as each other", but the interesting nuance is in what is natural to do as an experienced programmer in them.


C++ isn't always "unsafe-but-high-perf". Move semantics are a good example. The spec goes to great lengths to ensure safety in a huge number of scenarios, at the cost of performance. This mostly shows up in two ways: one, unnecessary destructor calls on moved-from objects, and two, allowing throwing exceptions in move constructors, which prevents most of the optimizations that move constructors were supposed to enable in the first place (there was an article here recently on this topic).

Another one is std::shared_ptr. It always uses atomic operations for reference counting and there's no way to disable that behavior or any alternative to use when you don't need thread safety. On the other hand, Rust has both non-atomic Rc and atomic Arc.
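A small sketch of that split (plain Rust, nothing hypothetical here):

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Rc: plain integer refcount; it is !Send and !Sync, so the
        // compiler rejects any attempt to share it across threads.
        let local = Rc::new(vec![1, 2, 3]);
        let _local2 = Rc::clone(&local); // non-atomic increment

        // Arc: atomic refcount; costs more, but may cross threads.
        let shared = Arc::new(vec![1, 2, 3]);
        let shared2 = Arc::clone(&shared); // atomic increment
        thread::spawn(move || println!("{:?}", shared2)).join().unwrap();
    }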


> one, unnecessary destructor calls on moved out objects

That issue predates move semantics by ages. The language always had very simple object life times, if you create Foo foo; it will call foo.~Foo() for you, even if you called ~Foo() before. Anything with more complex lifetimes either uses new or placement new behind the scenes.

> Another one is std::shared_ptr.

From what I understand shared_ptr doesn't care that much about performance because anyone using it to manage individual allocations already decided to take performance behind the shed to be shot, so they focused more on making it flexible.


C++11 totally could have started skipping destructors for moved-from values only. They chose not to, and part of the reason was safety.

I don't agree with you about shared_ptr (it's very common to use it for a small number of large/collective allocations), but even if what you say is true, it's still a part of C++ that focuses on safety and ignores performance.

Bottom line - C++ isn't always "unsafe-but-high-perf".


The Rust standard library does make targeted use of unchecked arithmetic when the containing type can ensure that overflow never happens and benchmarks have shown a performance benefit, e.g. in various iterator implementations. The unsafe code only has to be written and encapsulated once; users can then write safe for loops and still get that performance benefit.
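A hedged sketch of that encapsulation pattern (a made-up iterator, not the actual stdlib code): the unsafe block is justified once by a local invariant, and callers only ever see a safe API.

    struct Counter {
        i: usize,
        len: usize,
    }

    impl Iterator for Counter {
        type Item = usize;
        fn next(&mut self) -> Option<usize> {
            if self.i < self.len {
                let current = self.i;
                // SAFETY: i < len <= usize::MAX, so i + 1 cannot overflow.
                self.i = unsafe { self.i.unchecked_add(1) };
                Some(current)
            } else {
                None
            }
        }
    }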

> On the other hand, signed integer overflow being UB would count for C/C++

Rust defaults to the platform treatment of overflows. So it should only make any difference if the compiler is using it to optimize your code, which will most likely lead to unintended behavior.


Rust's overflow behavior isn't platform-dependent. By default, Rust panics on overflow when compiled in debug mode and wraps on overflow when compiled in release mode, and either behavior can be selected in either mode by a compiler flag. In neither case does Rust consider it UB for arithmetic operations to wrap.
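A tiny example of those defaults (the behavior can also be forced in either profile with the -C overflow-checks flag):

    fn bump(x: u8) -> u8 {
        // Debug build: panics with "attempt to add with overflow".
        // Release build: wraps around to 0. Neither outcome is UB.
        x + 1
    }

    fn main() {
        println!("{}", bump(u8::MAX));
    }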

Writing a function with UB for overflows doesn't cause unintended behavior if you're doing it to signal there aren't any overflows. And it's very important because it's needed to do basically any loop rewriting.

On the other hand, writing a function that recovers from overflows in an incorrect/useless way still isn't helpful if there are overflows.


> For instance, in C, qsort() takes a function pointer for the comparison function, in Rust and C++, the standard library sorting functions are templated on the comparison function.

That's more of a critique of the standard libraries than the languages themselves.

If someone were writing C and cared, they could provide their own implementation of sort such that the callback could be inlined (LLVM can inline indirect calls when all call sites are known), just as it would be with C++'s std::sort.

Further, if the libc allows for LTO (active area of research with llvm-libc), it should be possible to optimize calls to qsort this way.


"could" and "should" are doing some very theoretical heavy lifting here.

Sure, at the limit, I agree with you, but in reality, relying on the compiler to do any optimization that you care about (such as inlining an indirect function call in a hot loop) is incredibly unwise. Invariably, in some cases it will fail, and it will fail silently. If you're writing performance critical code in any language, you give the compiler no choice in the matter, and do the optimization yourself.

I do generally agree that in the case of qsort, it's an API design flaw.


> qsort, it's an API design flaw

It's just a generic sorting function. If you need more you're supposed to write it yourself. The C standard library exists for convenience not performance.


Fair point.

> That's more of a critique of the standard libraries than the languages themselves.

But we're right to criticise the standard libraries. If the average programmer uses standard libraries, then the average program will be affected (positively and negatively) by its performance and quirks.


And compile time execution.

With C you only have macro soup and the hope the compiler might optimise some code during compilation into some kind of constant values.

With C++ and Rust you're sure that happens.


Your qsort example is basically the same reason people say C++ is faster than Rust. C++ templates are still a lot more powerful than Rust's generics, but the gap is closing every day.

It is?? Can you give some examples of high-performance stuff you can do using C++'s template system that you can't do in Rust?

They are likely referring to the scope of fine-grained specialization and compile-time codegen that is possible in modern C++ via template metaprogramming. Some types of complex optimizations common in C++ are not really expressible in Rust because the generics and compile-time facilities are significantly more limited.

As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20. In practice, few people have the time or patience to generate this code manually so it doesn't get written.

Effective optimization at scale is difficult without strong metaprogramming capabilities. This is an area of real strength for C++ compared to other systems languages.


Again, can you provide an example or two? It's hard to agree or disagree without an example.

I think all the wild C++ template stuff can be done via proc macros. E.g., in Rust you can add #[derive(Serialize, Deserialize)] to get a highly performant JSON parser & serializer. And that's just lovely. But I might be wrong? And maybe it's ugly? It's hard to tell without real examples.
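For reference, the serde usage being described looks like this (serde with its derive feature, plus serde_json, which are the real crates):

    use serde::{Deserialize, Serialize};

    // The derive macro generates (de)serialization code at compile time,
    // specialized to this struct; there is no runtime reflection.
    #[derive(Serialize, Deserialize)]
    struct Point {
        x: f64,
        y: f64,
    }

    fn main() -> Result<(), serde_json::Error> {
        let p: Point = serde_json::from_str(r#"{"x":1.0,"y":2.0}"#)?;
        println!("{}", serde_json::to_string(&p)?);
        Ok(())
    }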


> As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20.

It's also still less elegant, but compile time codegen for specialisation is part of the language (build system?) with build.rs & macros. serde makes strong use of this to generate its serialisation/deserialisation code.


>signed integer overflow being UB would count for C/C++

Then I raise you Zig, which makes unsigned integer overflow UB as well.


Interestingly enough, Zig does not use the same terminology as C/C++/Rust do here. Zig has "illegal behavior," which is either "safety checked" or "unchecked." Unchecked illegal behavior is like undefined behavior. Compiler flags and in-source annotations can change the semantics from checked to unchecked or vice versa.

Anyway that's a long way of saying that you're right, integer overflow is illegal behavior, I just think it's interesting.



There was a contest for which language the fastest tokenizer could be written in. I entered my naive 15-minute Rust version and got second place among roughly 30 entries. First place was hand-crafted assembly.

I am not saying Rust is always faster. But it can be a damn performant language even if you don't think about performance too deeply or twist yourself into pretzels to write performant code.

And in my book that counts for something. Because yes, I want my code to be performant, but I'd also not have it blow up on edge cases, have a way to express limitations (like a type system) and have it testable. Rust is pretty good even if you ignore the hype. I write audio DSP code on embedded devices with a strict deadline in C++. I plan to explore Rust for this, especially now since more and more embedded devices start to have more than one processor core.


Rust has linker optimizations that can make it faster in some cases

Huh? Both have LTO. There are no linker optimizations available to Rust that aren't available to C and C++. They all use the same God damn linker.

A few years ago I pulled a Rust library into a Swift app on iOS via static linking & C FFI. And I had a tiny bit of C code bridging the languages together.

When I compiled the final binary, I ran llvm LTO across all 3 languages. That was incredibly cool.


At that point the real question should be restated: does the LLVM IR generated by clang and rustc differ in a meaningful way?

Rust's stricter aliasing analysis can provide some fundamentally better optimizations than C.

>in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it.

I think this is something of a myth. Typically, a C compiler can't inline the comparison function passed to qsort because libc is dynamically linked (so the code for qsort isn't available). But if you statically link libc and have LTO, or if you just paste the implementation of qsort into your module, then a compiler can inline qsort's comparison function just as easily as a C++ compiler can inline the comparator passed to std::sort. As for type-specific optimizations, these can generally be done just as well for a (void *) that's been cast to a T as they can be for a T (though you do miss out on the possibility of passing by value).

That said, I think there is an indirect connection between a templated sort function and the ability to inline: it forces a compiler/linker architecture where the source code of the sort function is available to the compiler when it's generating code for calls to that function.


qsort is obviously just an example; this situation applies to anything that takes a callback. In C++/Rust, that's almost always generic, and the compiler will monomorphize the function and optimize around it; in C, it's almost always a function pointer plus a userData argument for state. (And of course it applies not just to callbacks, but more broadly to anything templated.)

I'm actually very curious how good C compilers are at specializing situations like this; I don't actually know. In the vast majority of cases, the C compiler will not have access to the code (either because of dynamic linking, like in this example, or because the definition is in another translation unit), but what if it does? Either with static linking and LTO, or because the function is marked "inline" in a header? Will C compilers specialize as aggressively as Rust and C++ are forced to?

If anyone has any resources that have looked into this, I would be curious to hear about it.


Dynamic linking will inhibit inlining entirely, and so yes qsort does not in practice get inlined if libc is dynamically linked. However, compilers can inline definitions across translation units without much of any issue if whole program optimization is enabled.

The use of function pointers doesn't have much of an impact on inlining. If the argument supplied as a parameter is known at compile time then the compiler has no issue performing the direct substitution whether it's a function pointer or otherwise.


My point is that the real issue is just whether or not the function call is compiled as part of the same unit as the function. If it is, then, certainly, modern C compilers can inline functions called via function pointers. The inlining itself is not made easier via the template magic.

Your C comparator function is already "monomorphized" - it's just not type safe.


Wouldn't C++ and Rust eventually call down into those same libc functions?

I guess for your example, qsort(), it is optional, and you can choose another implementation of it. Though I tend to find that both standard libraries just delegate those lowest-level calls to the POSIX API.


Rust doesn't call into libc for sort, it has its own implementation in the standard library.

Obviously. How about more complex things like multi-threading APIs though? Can the Rust compiler determine that the subject program doesn't need TLS and produce a binary that doesn't set it up at all, for example?

Optimising out TLS isn't going to be a good example of compiler capability. Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.


> Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

TLS is a language feature. Whether another thread exists doesn't mean it has to use the same facilities as the main program.

> The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.

Say the program is not dynamically linked. Still no?


> Say the program is not dynamically linked. Still no?

Whether the program has dynamic dependencies does not dictate whether a thread can be created, that's a property of the OS. Windows has CreateRemoteThread, and I'd be shocked if similar capabilities didn't exist elsewhere.

If I mark something as thread-local, I want it to be thread-local.


Many of the libc functions are bad APIs with traditionally bad implementations.

> HashMap implements Extend, so just h0.extend(h1) and you're done, the people who made your HashMap type are much better equipped to optimize this common operation.

Are you sure? I'm not very used to reading the Rust stdlib, but this seems to be the implementation of the default HashMap extend [1]. It just calls self.base.extend. self.base seems to be hashbrown::hash_map, and this is the source for its extend [2]. In other words, it does exactly the same thing: just iterates through the hash map and inserts.

Maybe I'm misreading something going through the online docs, or Rust does the "random seed" thing that abseil does, but just blindly assuming something doesn't happen "because Rust" is a bit silly.

[1]: https://doc.rust-lang.org/src/std/collections/hash/map.rs.ht...

[2]: https://docs.rs/hashbrown/latest/src/hashbrown/map.rs.html#4...


Yes, HashMap will by default be randomly seeded in Rust, but also the code you linked intelligently reserves capacity. If h0 is empty, it reserves enough space for all of h1, and if it isn't then it reserves enough extra space for half of h1, which turns out to be a good compromise.

Note that the worst case is we ate a single unneeded growth, while the best case is that we avoided N - 1 grows where N may be quite large.
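A standalone paraphrase of that heuristic (the real logic lives in hashbrown's Extend impl; this helper is hypothetical but mirrors what's described above):

    use std::collections::HashMap;
    use std::hash::Hash;

    fn extend_with_reserve<K: Eq + Hash, V>(
        map: &mut HashMap<K, V>,
        items: impl IntoIterator<Item = (K, V)>,
    ) {
        let iter = items.into_iter();
        let (lower, _) = iter.size_hint();
        let additional = if map.is_empty() {
            lower            // fresh map: make room for everything up front
        } else {
            (lower + 1) / 2  // already populated: reserve half, a compromise
        };
        map.reserve(additional);
        for (k, v) in iter {
            map.insert(k, v);
        }
    }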


First of all, as khuey pointed out, the current implementation accumulates values. extend() replaces values instead. It wouldn't achieve the same functionality.

I tried extend() anyway. It didn't work well. Based on your description, extend() implements a variation of preallocation (i.e. Solution II). However, because it doesn't always reserve enough space to hold the merged hash table, clustering still happens depending on N. I have updated the Rust implementation (with the help of an LLM, as I am not a good Rust programmer). You can try it yourself with "ht-merge-rust 1 -e -n14m" or point out if I made mistakes.

> HashMap will by default be randomly seeded in Rust

Yes, so it is with Abseil. The default Rust hash functions, siphash in the standard library and foldhash in hashbrown, are ~3x slower than simple hash functions on a pure insertion load. When performance matters, we will use faster hash functions, at least for small keys, and will need a solution from my post.

> In a new enough C++ in theory you might find the same functionality supported, but Quality of Implementation tends to be pretty frightful.

This need not be the case. The Rust libraries are a port of Abseil, a C++ library. Boost is as fast as Rust. Languages/libraries should learn from each other, not fight each other.


> First of all, as khuey pointed out, the current implementation accumulates values. extend() replaces values instead. It wouldn't achieve the same functionality.

Ah! Yes, I apologise. I missed the + in += and I'm not used to a hash table that default-initializes unseen entries (as the C++ hash tables all tend to, because their native container types behave that way), so I wasn't looking for it.

The SipHash will be noticeably slower, no question about it, and so if you need to and know what you're paying you can replace the hash, including with integer_hasher which gives you what you'd likely know from many C++ stdlib implementations - an identity function presented as a hash.

> This is not necessary. The rust libraries are a port of Abseil, a C++ library.

More specifically, HashBrown is a port [edited: actually a re-implementation I think, design->Rust not C++->Rust] of Abseil's Swiss Tables, and these days Rust's HashMap (and HashSet of course) use HashBrown, but that's not what I was getting at here.

I was thinking about analogues of Extend (because, as I wrote above, I didn't notice that you're accumulating, not overwriting). Modern C++ has this kind of feature in std::ranges::to, however it doesn't quite have Extend, and as I said the QoI is poor: there are often trivial optimisations that Rust does but where C++ code that means the same thing isn't optimised.

I am interested in a quite different benchmark for hash tables: rather than merging, I'm interested in very small hash tables. Clearly for two items it will be faster to try them both, and clearly for a million items trying them all is awful, so I measure a VecMap type (same API as a hash table, but actually just a growable array of unordered key->value pairs, searched linearly) against HashMap and other implementations of this API.

For N=25 VecMap is still competitive, but even at N=5 if we use a very fast hash (such as that identity function) instead of SipHash we can beat VecMap for most operations. I suspect this sort of benchmark would fare very differently on older hardware (faster memory relative to ALU operations) and the direction of travel is likely to stay the same for the foreseeable future. In 1975 if you have six key->value pairs you don't want a hash table because it's too slow but in 2025 you probably do.
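A minimal sketch of such a VecMap, just to pin down the idea (a hypothetical type, not a published crate):

    // Same map API, but backed by a linear scan over a Vec. For small N this
    // is competitive with hashing: no hash is computed, and the scan walks
    // contiguous memory.
    struct VecMap<K, V> {
        entries: Vec<(K, V)>,
    }

    impl<K: PartialEq, V> VecMap<K, V> {
        fn get(&self, key: &K) -> Option<&V> {
            self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
        }

        fn insert(&mut self, key: K, value: V) {
            if let Some(pos) = self.entries.iter().position(|(k, _)| k == &key) {
                self.entries[pos].1 = value;
            } else {
                self.entries.push((key, value));
            }
        }
    }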


The tricky part is really point 2 there; that can be harder than it looks (e.g. even simple file I/O can hit network drives). Async I/O can really shine here, though it's not exactly trivial designing async cancellation either.


Feel free to read the article before commenting.


I’ve read it, and I found nothing to justify that piece of code. Can you please explain?


The while loop surrounds the whole thread, which does multiple tasks. The conditional is there to surround some work completing in a reasonable time. That's how I understood it, at least.


Does not seem so clear to me. If so, it could be stated with more pseudocode. Also the eventual need for multiple exit points…


Never heard of this, I’m really interested in digging into this paper. Thank you both for the tip!


No, but if you phrase it like "there are multiple correct answers to the question 'I have a list of integers, write me a computer program that sorts it'", that is obviously true. There's an enormous variety of different computer programs you can write that sort a list.


Is the protocol inherently inferior in situations like that, or is this because we've spent decades optimizing for TCP and building into kernels and hardware? If we imagine a future where QUIC gets that kind of support, will it still be a downgrade?


There is no performance disadvantage at the normal speed of most implementations. With a good QUIC implementation and a good network stack you can drive ~100 Gb/s per core on a regular processor from userspace, with a 1500-byte MTU and no segmentation offload, if you use an unencrypted QUIC configuration. If you use encryption, then you will bottleneck on the encryption/decryption bandwidth of ~20-50 Gb/s depending on your processor.

On the Linux kernel [1], for some benchmark they average ~24 Gb/s for unencrypted TCP from kernel space with 1500-byte MTU using segmentation offload. For encrypted transport, they average ~11 Gb/s. Even using 9000-byte MTU for unencrypted TCP they only average ~39 Gb/s. So there is no inherent disadvantage when considering implementations of this performance level.

And yes, that is a link to a Linux kernel QUIC vs Linux kernel TCP comparison. And yes, the Linux kernel QUIC implementation is only driving ~5 Gb/s which is 20x slower than what I stated is possible for a QUIC implementation above. Every QUIC implementation in the wild is dreadfully slow compared to what you could actually achieve with a proper implementation.

Theoretically, there is a small fundamental advantage to TCP due to not having multiple streams which could allow it maybe a ~2x performance advantage when comparing perfectly optimal implementations. But, you are comparing a per-core control plane throughput using 1500-byte MTU of, by my estimation, ~300 Gb/s on QUIC vs ~600 Gb/s on TCP at which point both are probably bottlenecking on your per-core memory bandwidth anyways.

[1] https://lwn.net/ml/all/cover.1751743914.git.lucien.xin@gmail...


No, this is very much not the same. The Raku version is like writing this in Python:

    def fibonacci():
        a, b = 0, 1

        while True:
            yield a
            a, b = b, a+b
And taking the 40th element. It's not comparable at all to the benchmark, which deliberately uses an extremely slow method of calculating Fibonacci numbers for benchmarking purposes. This version is so fast that the time is dominated by starting up and tearing down the interpreter.


Your argument is that it's not ok to think of all Jewish people as a monolithic group, and therefore his statement where he considered all Arabs as a monolithic group is "legitimate"? Seriously?

Just like it's not ok to see all Jews as part of the same murderous conspiracy, it's not ok to see all Arabs as part of one either.


You can absolutely save data like that, it's just that it's a terrible idea. There are obvious portability concerns: little-endian vs. big-endian, 32-bit vs. 64-bit, struct padding, etc.

Essentially, this system works great if you know the exact hardware and compiler toolchain, and you never expect to upgrade it with things that might break memory layout. Obviously this does not hold for Word: it was written originally in a 32-bit world and now we live in a 64-bit one, MSVC has been upgraded many times, etc. There's also address space concern: if you embed your pointers, are you SURE that you're always going to be able to load them in the same place in the address space?

The overhead of deserialization is very small with a properly written file format; it's nowhere near worth the sacrifice in portability. This is not why Word is slow.
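A tiny sketch of what "properly written" means here (the header type is hypothetical): fields go out in a fixed width and byte order, so the on-disk format is independent of the compiler's in-memory layout.

    // Hypothetical file header, serialized field by field rather than by
    // dumping the struct's bytes: no padding, endianness, or pointer-size
    // dependence.
    struct Header {
        version: u32,
        count: u64,
    }

    fn write_header(buf: &mut Vec<u8>, h: &Header) {
        buf.extend_from_slice(&h.version.to_le_bytes());
        buf.extend_from_slice(&h.count.to_le_bytes());
    }

    fn read_header(buf: &[u8]) -> Option<Header> {
        Some(Header {
            version: u32::from_le_bytes(buf.get(0..4)?.try_into().ok()?),
            count: u64::from_le_bytes(buf.get(4..12)?.try_into().ok()?),
        })
    }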


Andrew Kelley (author of Zig) has a nice talk about how programming without pointers allows ultra-fast serialization/deserialization. [0]

And then you have things like Cap'n Proto if you want to control your memory layout. [1]

But for "productivity" files, you are essentially right. Portability and simplicity of the format is probably what matters.

[0]: https://www.hytradboi.com/2025/05c72e39-c07e-41bc-ac40-85e83...

[1]: https://capnproto.org/


That is true, Cap'n Proto and FlatBuffers are excellent realizations of this basic concept. But that's a very different thing from what the commenter describes Word doing in the 90s: just memory-mapping the internal data structures and being done with it.


Smalltalk is something like that.


It's only a terrible idea because our tools are terrible.

That's exactly the point!

(For example, if Rust detected a version change, it could rewrite the data into a compatible format, etc.)


At which point you're not just memory mapping the file. And if the new version changes the size of the object, it doesn't pack in the same place in memory, so you have to repack before saving. Even serializing with versioning is very hard. Memory mapping is much worse. Several other comments indicate that I am not the only one with bad experiences here.

