
You can't feed something like that to the free ChatGPT model and expect anything useful. Try these:

https://chatgpt.com/s/t_6929f00ff5508191b75f31e219609a35 (5.1 Pro Thinking)

https://claude.ai/share/7d9caa25-14f7-4233-b15c-d32b86e20e09 (Opus 4.5)

https://docs.google.com/document/d/1C0lSKbLSZOyMWnGgR0QhZh3Q... (Gemini 3 Pro Thinking)

All of them recognized the thrM exception path, although I didn't review the outputs for correctness.

That being said, I imagine the major showstopper in real-world disassembly tasks would simply be the limited context size. As you suggest, a standard LLM isn't really the best tool for the job, at least not without assistance to split up the task logically.



Those first two do look correct (the third link is not public); free ChatGPT is understandably not the best, but I did give it basically the smallest function in my codebase that does something meaningful, rather than any of the actually non-trivial multi-kilobyte functions that do realistic things and need context.


Would be interesting to push the models with a couple of larger functions, if you have some links you'd like me to try.

I have paid pro accounts on all three, but for some reason Gemini is no longer allowing links to be shared on some queries including this one. All it would let me do is export it to Docs, which I thought would be publicly visible but evidently isn't.


Actually, even finding a larger function whose disassembly would be meaningful on its own is proving problematic; basically every function deals non-trivially with in-memory data structures, and a bunch do indirect jumps (function pointers, but also lookup-table-based switches, which require table data from memory in addition to the assembly to disassemble).

Like, here's a ~2.7x larger function: https://dzaima.github.io/paste/#0jVdNjxs3DL3nVwzQo30gRY00ChY... (it's https://github.com/dzaima/CBQN/blob/90c1dc09e88c5324373281f6... with a bunch of inlining)

(I'm keeping the other symbol names there, even though they'd likely not be present in real closed-source binaries, under the assumption that a full pipeline would have something doing a quick naming pass beforehand.)

This is still very much on the trivial end, but it's already dealing with in-memory structures, three inlined memory-allocation calls (two half-deduplicated into one by the compiler, which also initializes a bunch of the objects' fields in one store), and a bunch of inlined tagged-object manipulations. It should definitely be possible to get some disassembly from that, but figuring out the useful abstractions that make it readable without pain would probably take aggregating over multiple functions.

(Unrelated notes on your previous results: Claude indeed guessed correctly that it's BQN! Though CBQN is presumably wholesale in its training data anyway. It did miss that the function has an unused 0th arg (a "this" pointer), which'd cause problems as the function is stored & used as a generic function pointer; this'd probably be easily resolved when attempting to integrate it in a wider disassembly, though. Neither Claude nor ChatGPT unified `x>>48==0xfff7` and `(x&0xffff000000000000)==0xfff7000000000000`, which do the exact same thing, but clang is stupid [https://github.com/llvm/llvm-project/issues/62145] and generates different code for each. And of course a big question is how many such intricacies could be automatically reduced down with a full codebase's worth of context, because understandably the single-function disassemblies are way, way more verbose than the original.)



