
Ha! That was funny. I wonder, though: since Godbolt gets fed tons of code, couldn't it leverage the code -> compiler object -> assembly pipeline as a means to train an AI decompiler? Food for thought.


I've always wondered about this. Compilers do a LOT of irreversible stuff. For example, symbol names usually aren't needed at run time (unless you have a reflective language), so they can simply be dropped.

Where AI would really shine is reversing the (only seemingly reversible) optimizations. For example, GCC converts "x * 14" into "(x << 4) - x - x". Of course, you can never be 100% sure the programmer didn't actually want "shift left by four followed by two subtractions", but I'm convinced that 99% of the code I write is fairly predictable and statistically similar to whatever giant codebase you train it on.
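
To make the "x * 14" example concrete, here is a minimal sketch that checks the shift-and-subtract form against plain multiplication. The function name is made up for illustration, and Python integers are arbitrary precision, so this only demonstrates the arithmetic identity, not fixed-width overflow behavior or what any particular GCC version actually emits.

```python
def times14_shift_sub(x: int) -> int:
    """Compute 14 * x as 16x - x - x, the form quoted in the comment above."""
    return (x << 4) - x - x


# The identity holds for negative values too, since 16x - 2x == 14x.
assert all(times14_shift_sub(x) == x * 14 for x in range(-1000, 1001))
```

Both source forms produce the same result, which is exactly why a decompiler has to guess which one the programmer wrote.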


Symbol names could be inferred from context.


Throwing AI at the problem might not actually be the worst suggestion. I wonder how the likes of Copilot model the AST. Heh, you might even be able to build an approximation of a compiler using AI.


I think it would be easier and faster to just take the millions of open source projects on GitHub for that :)


...which don't have binaries. It's easier for Godbolt, since the whole purpose of the website is to compile and show output. If you crawl GitHub, you need to compile the projects yourself, which is much more difficult.
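
For a sense of what "compile it yourself" means for a single file, here is a rough sketch that turns one C snippet into a (source, assembly) pair. It assumes a `gcc` binary on the PATH and uses the standard `-S` and `-O2` options; the helper names are hypothetical, and real projects with build systems and dependencies are far harder than this one-file case.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def build_compile_command(source: Path, asm_out: Path, opt_level: str = "-O2") -> List[str]:
    """Build a gcc command line that stops after generating assembly (-S)."""
    return ["gcc", "-S", opt_level, str(source), "-o", str(asm_out)]


def source_to_asm_pair(c_source: str, opt_level: str = "-O2") -> Optional[Tuple[str, str]]:
    """Return (source, assembly), or None if gcc is missing or compilation fails."""
    if shutil.which("gcc") is None:
        return None  # assumption: gcc is the toolchain; none is installed here
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.c"
        asm = Path(tmp) / "sample.s"
        src.write_text(c_source)
        result = subprocess.run(
            build_compile_command(src, asm, opt_level),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            return None
        return c_source, asm.read_text()
```

Calling `source_to_asm_pair("int f(int x) { return x * 14; }")` would give one training pair, tagged implicitly with whatever compiler version and flags were used.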


Binaries are freely available from package management repos, with the benefit of having a known toolchain you can tag your ML inputs with. All the package managers I've worked with have a strongly structured "upstream" or "repo" field or similar that you can use to get to the source.
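
As a sketch of what using that structured field could look like, the snippet below parses a Debian-style control stanza ("Key: value" lines, with continuation lines starting with whitespace) and pulls out the fields you might tag a training sample with. The sample package is entirely made up, and `training_tags` is a hypothetical helper; the compiler version is not part of this stanza and would have to come from elsewhere.

```python
def parse_stanza(text: str) -> dict:
    """Parse one 'Key: value' stanza; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            fields[key] += "\n" + line.strip()
        elif ":" in line:
            # Split on the first colon only, so URLs in the value stay intact.
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def training_tags(fields: dict) -> dict:
    """Pick the metadata a (binary, source) pair might be labeled with."""
    return {
        "package": fields.get("Package"),
        "version": fields.get("Version"),
        "arch": fields.get("Architecture"),
        "upstream": fields.get("Homepage"),
    }


# Hypothetical stanza for illustration only; not a real package.
SAMPLE = """\
Package: examplepkg
Version: 1.2-3
Architecture: amd64
Homepage: https://example.org/examplepkg
Description: hypothetical package for illustration
 Continuation lines start with a space.
"""

tags = training_tags(parse_stanza(SAMPLE))
```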


Fair enough!


Some projects do publish binaries with releases.


Just take all the Debian packages, or something like that.



