
Every time I see a clean SSA explainer like this, I’m reminded that the “simplicity” of SSA only exists because we’ve decided mutation is evil. It’s not that SSA is simpler — it’s that we’ve engineered our entire optimization pipeline around pretending state doesn’t exist.

It’s a brilliant illusion that works… until you hit aliasing, memory models, or concurrency, and suddenly the beautiful DAG collapses into a pile of phi nodes and load/store hell.



SSA isn't about saying mutation is evil. It's about making def-use chains trivial to chase down. In the Dragon Book, essentially the first two dataflow analyses introduced are "reaching definitions" and "live variables"; within an SSA-based IR, these algorithms are basically "traverse a few pointers". There are also some ancillary benefits: SSA makes a flow-insensitive algorithm partially flow-sensitive simply by virtue of renaming variables.

Sure, you still need to keep those algorithms around for reasoning about memory loads and stores. But if you put effort into promoting memory operations to virtual-register operations (where you get SSA for free), you can also make the compiler faster, since you're rerunning those analyses only on demand, for the handful of passes that specifically care about eliminating or moving loads and stores.
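A toy illustration of the "traverse a few pointers" point, in Python (all names are mine, not from any real compiler): if each SSA value carries its operands and an explicit list of uses, "reaching definitions" degenerates into following a pointer, and liveness into checking a list.

```python
class Value:
    """A toy SSA value: exactly one definition, an explicit use list."""
    def __init__(self, name, op, operands=()):
        self.name = name              # e.g. "v3"
        self.op = op                  # e.g. "const", "add"
        self.operands = list(operands)
        self.uses = []                # values that consume this one
        for o in self.operands:
            o.uses.append(self)

# v1 = const 1 ; v2 = const 2 ; v3 = add v1, v2
v1 = Value("v1", "const")
v2 = Value("v2", "const")
v3 = Value("v3", "add", [v1, v2])

# "Which definition reaches this operand of v3?" -- follow the pointer.
assert v3.operands[0] is v1
# "Is v2 live (still used anywhere)?" -- inspect its use list.
assert v3 in v2.uses
# Dead-code check for v3: no uses, so it is deletable.
assert v3.uses == []
```

Without SSA, both questions require a fixed-point dataflow analysis over the whole CFG; here they are constant-time lookups.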


> pretending state doesn’t exist.

As a fan of functional languages: immutability doesn't mean state doesn't exist. You keep state with assignment --- in SSA, every piece of state has a new name.

If you want to keep state beyond the scope of a function, you have to return it, or call another function with it (and hope you have tail call elimination). Or, stash it in a mutable escape hatch.
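A minimal sketch of that style in Python (the function names are mine): state never mutates; each step returns the new state, and you carry it forward by calling the next function with it.

```python
def deposit(balance, amount):
    # No mutation: the "new" state is simply the return value.
    return balance + amount

def run(balance, transactions):
    # State threaded by calling another function with it. (Python has no
    # tail-call elimination, so a huge list would overflow the stack;
    # this shows the shape of the idea, not production code.)
    if not transactions:
        return balance
    return run(deposit(balance, transactions[0]), transactions[1:])

assert run(0, [10, 25, -5]) == 30
```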


1) You are just pretending globals don't exist. Some problems are best modeled with global variables. Many are not, but when you need a global, you need a global. Forcing people to copy globals around because you won't admit this doesn't really help.

2) The main bottleneck in the vast majority of code is memory accesses (also called memory pressure). This is why most optimizing compilers don't really change the overall speed of code much these days. You are optimizing the wrong thing and in the process increasing the memory pressure. You can, as a field, either keep ignoring this 25 years after hardware changed to make memory accesses the bottleneck, or keep making fad languages that fewer and fewer people use. The second choice has the added cost of degrading how most devs view CS in general.

PS The reason compiler writers like FP is because it's a good way to write a compiler. This isn't true of almost anything else in software outside of the classroom.

PPS I say this as someone currently writing a compiler in an FP language (for a unique use, but it's still a compiler)


I think functional just means that there's no undefined behavior. SSA shows that you can compile a procedural language to mostly functional constructs, and functional languages likewise have introduced things like mutation, where you can write to variables as if they were mutable and the compiler figures it out for you.

On the issue of copy-to-modify, if you can prove the old copy will never be used after you modify it, it's perfectly safe to implement it as an in place modification.

How do you prove/enforce this? By tracking ownership and lifetimes, like Rust does. In fact I'd argue that this makes Rust a functional language (no UB), and Rust isn't slow.
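A runtime caricature of that proof, in Python (the names are mine; Rust enforces this statically via ownership, and compilers via liveness analysis, not with runtime checks): if every update consumes the old handle, the old copy provably can't be observed afterwards, so mutating in place is safe.

```python
class Unique:
    """Toy linear handle: updates mutate in place, which is safe only
    because each update consumes the old handle."""
    def __init__(self, data):
        self._data = data
        self._alive = True

    def set(self, i, x):
        assert self._alive, "stale handle: old version already consumed"
        self._alive = False           # consume the old handle
        self._data[i] = x             # in-place write, no copy made
        return Unique(self._data)     # fresh handle to the same storage

storage = [1, 2, 3]
h1 = Unique(storage)
h2 = h1.set(0, 99)                    # logically "a brand-new list"...
assert h2._data is storage            # ...but the same storage was reused
try:
    h1.set(1, 0)                      # touching the dead handle is an error
except AssertionError:
    pass
```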


Nit: functional means there are no side effects. Side effects are defined, they just aren't easy to reason about.

I do want to point out that the expected result of almost every single program an app dev has ever been paid to write is entirely defined as a collection of side effects. For example, all I/O is a side effect. I know people have crammed I/O into frameworks and defined them as pure, but that's mostly handwaving.


> the “simplicity” of SSA only exists because we’ve decided mutation is evil.

Mutation is the result of sloppy thinking about the role of time in computation. Sloppy thinking is a hindrance to efficient and tractable code transformations.

When you "mutate" a value, you're implicitly indexing it on a time offset - the variable had one value at time t_0 and another value at time t_1. SSA simply uses naming to make this explicit. (As do CPS and ANF, which is where that "equivalence" comes from.)
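A concrete toy of that renaming, in Python (my own sketch; real SSA construction handles control flow with phi nodes, which this ignores): for straight-line code, making the time index explicit is just versioning each name.

```python
def to_ssa(stmts):
    """Rename a straight-line program into SSA: each assignment to x
    becomes a fresh x0, x1, ... and every read uses the latest version.
    (Naive string replacement; fine for single-letter toy variables.)"""
    version = {}
    out = []
    for target, expr in stmts:
        # Rewrite reads to the current version of each variable.
        for var, ver in version.items():
            expr = expr.replace(var, f"{var}{ver}")
        ver = version.get(target, -1) + 1
        version[target] = ver
        out.append((f"{target}{ver}", expr))
    return out

# x = 1 ; x = x + 1   ==>   x0 = 1 ; x1 = x0 + 1
prog = [("x", "1"), ("x", "x + 1")]
assert to_ssa(prog) == [("x0", "1"), ("x1", "x0 + 1")]
```

The "x at t_0" and "x at t_1" of the mutating view become the distinct names x0 and x1; nothing is ever overwritten.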

If you don't use SSA, CPS, or ANF for this purpose, you need to do something else to make the time dimension explicit, or you're going to be dealing with some very hairy problems.

"Evil" in this case is shorthand for saying that mutable variables are an unsuitable model for the purpose. That's not a subjective decision - try to achieve similar results without dealing with the time/mutation issue and you'll find out why.


SSA is appealing because you can defer the load/store hell until after a bunch of optimizations, though, and a lot of those optimizations become a lot easier to reason about when you get to pretend state doesn't exist.


> we’ve decided mutation is evil.

SSA isn't (primarily) concerned with memory, it's concerned with local variables. It completely virtualizes the storage of local variables--in fact, all intermediate computations. By connecting computations through dataflow edges and not storage, it removes ordering (except that induced by dependence edges) from consideration.

It is, after all, what CPUs do under the hood with register renaming! They are doing dynamic SSA, a trick they stole from us compiler people!


You have it backwards. Modern compilers don't use SSA because it's "simpler"; we use it because it enables very fast data-flow optimizations (constant prop, CSE, register allocation, etc.) that would otherwise require a lot of state. It doesn't "pretend state doesn't exist"; it's exactly what makes it possible/practical for the compiler to handle changes in state.

As some evidence for the second point: Haskell is a language that does enforce immutability, but its compiler, GHC, does not use SSA for its main IR -- it uses a "spineless tagless g-machine" graph representation that does, in fact, rely on that immutability. SSA only happens later, once it's lowered to a mutating form. If your variables aren't mutated, then you don't even need to transform them to SSA!

Of course, you're welcome to try something else, people certainly have -- take a look at how V8's move to Sea-of-Nodes has gone for them.


To appreciate the “fast” part, nothing beats reading through LuaJIT’s lj_opt_fold.c, none of which would work without SSA.

Of course, LuaJIT is cheating, because compared to most compilers it has redefined the problem to handling exactly two control-flow graphs (a line and a line followed by a loop), so most of the usual awkward parts of SSA simply do not apply. But isn’t creatively redefining the problem the software engineer’s main tool?..


Meanwhile Java Hotspot, where Sea-of-Nodes was originally made mainstream, and GraalVM are doing just fine.

The author of the Sea-of-Nodes approach is quite critical of V8's decision, as one would expect.

https://www.youtube.com/watch?v=Zo801M9E--M


> take a look at how V8's move to Sea-of-Nodes has gone for them.

Are you implying it hasn't gone well? I thought it bought some performance at least. What are the major issues? Any sources I can follow up on?


V8 blog post from March: "Land ahoy: leaving the Sea of Nodes" https://v8.dev/blog/leaving-the-sea-of-nodes


Feedback from sea of nodes algorithm creator, https://www.youtube.com/watch?v=Zo801M9E--M


Fascinating, thanks!


The major issue is that approximately one person in the world understands it well enough to make it work in practice, and that kind of tech debt can't really be pushed too far.


> we’ve decided mutation is evil

The functional programmers have decided mutability is evil. The imperative programmers have not.

We functional programmers do crazy stuff and pretend we're not on real hardware - a new variable instead of mutating (how wasteful!) and infinite registers (what real-world machine supports that!?).

Anyway, there's plenty of room for alternatives to SSA/CPS/ANF. It's always possible to come up with something with more mutation.


Functional programming is actually good for performance. Clojure uses persistent data structures ("persistent data structure" has its own Wikipedia page). You can have a very complex object, like a long and wide JSON document with several levels of nesting. If you want to modify a tiny part of this object, you won't waste memory or time. The object won't be copied entirely. The parts that haven't changed will be reused.

This is only possible because the objects are immutable. If they were mutable, you wouldn't be able to trust that the parts that haven't changed won't change in the future. This means you can share these parts. This is good for memory and for parallelism.
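A minimal path-copying sketch in Python (`assoc_in` is my name, echoing Clojure's; real persistent structures use wide trees/HAMTs, but the sharing idea is the same): only the nodes along the updated path are copied, and every untouched subtree is shared with the original.

```python
def assoc_in(node, path, value):
    """Persistent update: copy only the nodes along `path`; untouched
    subtrees are shared with the original (safe since nothing mutates)."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    new = dict(node)                  # shallow copy: one node, not the tree
    new[key] = assoc_in(node[key], rest, value)
    return new

old = {"user": {"name": "ada", "prefs": {"theme": "dark"}}}
new = assoc_in(old, ["user", "name"], "alan")

assert old["user"]["name"] == "ada"   # original untouched
assert new["user"]["name"] == "alan"
# The unchanged subtree is the very same object, so a single identity
# comparison proves nothing inside it changed:
assert new["user"]["prefs"] is old["user"]["prefs"]
```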

If you're building a React app in JS, React has to check whether the object has changed deeply. If you're doing a React app with Clojure, this check is actually skipped: React will only use a single === comparison to know whether two objects are different, and this is basically an integer comparison (like a memory address comparison).


> Functional programming is actually good for performance.

No, no it is not. I really wish people would stop repeating this lie. The main bottleneck in high performance code is (almost always) memory pressure. FP increases memory pressure as does immutability. No amount of clever CPU instruction optimization will ever overcome this. Compiler writers think this is the case because FP code is much easier to optimize (and compilers are easier to write in FP). What they don't seem to understand is that they are optimizing the wrong thing (instruction count instead of memory access count).

If I were wrong about this, nobody would ever use Python (an interpreted language) in production, because it would be many times slower. It's only 50% slower because memory is the bottleneck, not how many subtle optimizations the compiler can find.

PS Nobody should ever use an interpreted language in prod, but the real reason is security, and it should be performance too.


SSA form is a state representation. SSA encodes data flow information explicitly which therefore simplifies all other analysis passes. Including alias analysis.


What a ridiculous comment. No one says mutation is “evil.” It's just harder to optimise. Then all this talk of pretending and illusions, as if the compiler doesn't really work and is producing fake outputs. I assure you that other people do not take your strangely moralising tone to compiler optimisations.


> "simplicity” of SSA only exists because we’ve decided mutation is evil

And SICP is still relevant, even more so today with concurrency problems all over, whether in Intel chips, threads with locking, or multi-processing. Shared state should not exist, and functional languages did win there.


Is it that we think it's evil, or that it's easier to deal with code while guaranteeing some SSA-like pattern in it? It's a fairly easy invariant to agree to maintain, whereas if you are more jazzy with it you end up with either very fiddly passes or just constantly rerunning DFA.


SSA is not about hiding states.

It's about naming intermediate states so you can refer to them in some way.



