When we first reviewed the equality saturation paper, we thought it had one major pro and one major con: [pro] it's phase-ordering invariant; [con] at the time there was no (believable) way to extract the transformed graph that wasn't hand-wavy.
Personally, I think e-graphs should be combined with Massalin superoptimization — they're natural "duals" — and just turn the whole exercise into a hill-climbing process. You can tune the total effort by the set of passes used, the amount of time to drive the graph to saturation, and the method (and time) for graph extraction.
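To make the hill-climbing framing concrete, here's a toy sketch: represent expressions as nested tuples, enumerate single-step rewrites, and greedily keep whichever candidate lowers cost, stopping at a local minimum. The rules and the cost model are hypothetical illustrations, not anything from the paper.

```python
# Toy hill-climbing over expression rewrites. Expressions are nested
# tuples like ('mul', 'x', 2); rules and costs are made up for the demo.

def cost(e):
    # cost = weighted count of operator nodes ('mul' costs more than 'add')
    if not isinstance(e, tuple):
        return 0
    w = 2 if e[0] == 'mul' else 1
    return w + sum(cost(a) for a in e[1:])

def rewrites(e):
    # yield every single-step rewrite of e (at the root or in a child)
    if isinstance(e, tuple):
        op, *args = e
        if op == 'mul' and args[1] == 2:
            yield ('add', args[0], args[0])   # x * 2  ==>  x + x
        if op == 'add' and args[1] == 0:
            yield args[0]                     # x + 0  ==>  x
        for i, a in enumerate(args):
            for r in rewrites(a):
                yield (op, *args[:i], r, *args[i + 1:])

def hill_climb(e):
    # repeatedly take the cheapest one-step rewrite until none improves
    while True:
        best = min(rewrites(e), key=cost, default=None)
        if best is None or cost(best) >= cost(e):
            return e
        e = best

print(hill_climb(('add', ('mul', 'x', 2), 0)))  # → ('add', 'x', 'x')
```

Total effort here is exactly the tunable knob described above: more rules, a deeper search, or a smarter acceptance criterion all trade compile time for result quality.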
Speaking as an e-graph rookie: I believe so, with caveats.
You can certainly store the chosen extractions/optimizations and re-use any that don't change. For example, if a long chain of rewrites yields A ==> B ==> ... ==> Z, you could short-circuit that and assume that Z is still what you want to replace A with. Perhaps each time you compiled, you could rewrite only the 'changed' sections of the program, along with a randomly selected portion, to explore for further optimization. (Although, you may just want to go all the way and write a tool that constantly adds to your equivalence DB in the background, with compilation just a matter of running your extraction heuristics.)
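The short-circuiting idea can be sketched as a cache keyed by the serialized source term: on a hit you reuse the Z chosen on a prior compile and skip the expensive search entirely. `optimize` here is a hypothetical stand-in for a full saturation + extraction run.

```python
# Cache the final chosen replacement (A ==> ... ==> Z collapsed to A -> Z),
# keyed by a serialized form of the source term. All names are illustrative.

cache = {}  # serialized term -> chosen replacement

def serialize(term):
    return repr(term)  # assumes terms have a canonical repr

def optimize_cached(term, optimize):
    key = serialize(term)
    if key in cache:
        return cache[key]      # reuse the Z chosen on a prior compile
    result = optimize(term)    # the expensive saturation + extraction
    cache[key] = result
    return result

calls = []
def slow_optimize(term):
    calls.append(term)                 # record each expensive run
    return ('add', 'x', 'x')           # pretend this is the chosen Z

optimize_cached(('mul', 'x', 2), slow_optimize)
optimize_cached(('mul', 'x', 2), slow_optimize)  # cache hit
print(len(calls))  # → 1: the expensive pass ran only once
```

A background "equivalence DB" tool would be this same cache, persisted and populated continuously rather than on demand.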
You probably wouldn't want to store every possible rewrite, though, as the space complexity or retrieval cost is likely a bigger deal than the time complexity (you wouldn't load 2+2=4 from a database, you'd just do the addition). But, if you restore only a subset of equivalencies that are known to be useful, there's no simple way to keep the optimization from re-exploring the "rejected candidates" (without shutting off the eclass entirely) - and, it's not even clear to me that you'd want to, as changing program context may end up meaning that M becomes a better replacement candidate for A than Z.
(Boring prerequisites for this approach also include: consistent serialization of terms, avoiding storage of infinite equivalencies, like `A ==> A + 0 ==> A + 0 + 0 ==> ...` (and possibly of exponential equivalencies like the kind associativity rules can produce), and strong heuristics to capture "high-value" equivalencies that are both useful and cheaper to look up than re-derive.)
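Two of those prerequisites are easy to sketch: a deterministic serialization for terms, plus a size cap so unbounded chains like `A + 0 + 0 + ...` never make it into the DB. The cap and the filter are arbitrary illustrations, not a real heuristic.

```python
# Deterministic serialization plus a crude "is this worth storing?" filter.
# MAX_SIZE and the filter logic are arbitrary stand-ins for real heuristics.

def serialize(term):
    # nested tuples -> a canonical s-expression string
    if not isinstance(term, tuple):
        return str(term)
    return '(' + ' '.join(serialize(t) for t in term) + ')'

def size(term):
    if not isinstance(term, tuple):
        return 1
    return 1 + sum(size(t) for t in term)

MAX_SIZE = 16  # arbitrary cap on stored right-hand sides

def worth_storing(lhs, rhs):
    # refuse oversized right-hand sides and trivial self-equivalences
    return size(rhs) <= MAX_SIZE and serialize(lhs) != serialize(rhs)

big = 'a'
for _ in range(20):
    big = ('add', big, 0)   # a ==> a+0 ==> a+0+0 ==> ... grows forever

print(worth_storing(('mul', 'x', 2), ('add', 'x', 'x')))  # → True
print(worth_storing('a', big))                            # → False: over the cap
```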
I always thought that it'd be nice to build an expository version of eq-sat that just operated on straight-line code, and didn't try to use the complicated e-graph structure (does it still use pi-nodes? I've not looked at the paper in a decade). The major complexity of eq-sat is in the novel extraction heuristics, and the complicated data structures only mask that.
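That expository version might look like this: instead of an e-graph, keep a plain set of equivalent straight-line programs, close it under a few rewrite rules until a fixpoint (or a round bound), then extract the cheapest member. The instruction format, rules, and cost table are all invented for the sketch.

```python
# Eq-sat without the e-graph: a set of equivalent straight-line programs,
# saturated under rewrite rules, then extracted by cost. Programs are
# tuples of (dest, op, a, b) instructions; everything here is illustrative.

def step(prog):
    # yield all programs reachable by rewriting one instruction
    for i, (d, op, a, b) in enumerate(prog):
        if op == 'mul' and b == 2:
            yield prog[:i] + ((d, 'add', a, a),) + prog[i + 1:]
        if op == 'add' and b == 0:
            yield prog[:i] + ((d, 'copy', a, None),) + prog[i + 1:]

def saturate(prog, max_rounds=10):
    # breadth-first closure of the equivalence set, with a round bound
    seen = {prog}
    frontier = {prog}
    for _ in range(max_rounds):
        new = {q for p in frontier for q in step(p)} - seen
        if not new:
            break
        seen |= new
        frontier = new
    return seen

COST = {'mul': 4, 'add': 1, 'copy': 0}

def extract(progs):
    # the "extraction heuristic" is just min-cost here
    return min(progs, key=lambda p: sum(COST[op] for _, op, _, _ in p))

prog = (('t0', 'mul', 'x', 2), ('t1', 'add', 't0', 0))
print(extract(saturate(prog)))
# → (('t0', 'add', 'x', 'x'), ('t1', 'copy', 't0', None))
```

With no loops there's no need for theta/phi-style nodes at all, which is exactly what makes the straight-line case a good teaching vehicle: all that's left is saturation and extraction.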