I just wish I could pretend Shorts were regular videos that happened to have a weird aspect ratio. There are extensions that switch the player automatically (and you can do it by editing the URL), but that doesn't change how they appear in the subscriptions feed, i.e. an annoying carousel that hides all the information you need to decide whether you want to click.
Yes, and it apparently burns lots of tokens. But from what I've heard, the outcome is still drastically cheaper than hand-reversing once you account for labor costs.
Can confirm. Matching decompilation in particular (where you match the compiler along with your guess at source, compile, then compare assembly, repeating if it doesn't match) is very token-intensive, but it's now very viable: https://news.ycombinator.com/item?id=46080498
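The compare step of that loop is easy to script. Here's a hedged Python sketch of just the assembly comparison; the objdump-style address/byte-column format it normalizes is an assumption for illustration, and real matching-decompilation tooling does far more than this:

```python
import re

def normalize_asm(listing: str) -> list[str]:
    """Strip addresses and raw encoding bytes so two listings can be
    compared mnemonic-by-mnemonic (addresses shift between builds)."""
    lines = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading address column, e.g. "401000:" (assumed objdump-like format)
        line = re.sub(r'^[0-9a-fA-F]+:\s*', '', line)
        # Drop hex byte columns like "55 48 89 e5"
        line = re.sub(r'^([0-9a-fA-F]{2}\s+)+', '', line)
        # Replace absolute addresses in operands with a placeholder
        line = re.sub(r'0x[0-9a-fA-F]+', '<addr>', line)
        lines.append(line)
    return lines

def asm_matches(guess: str, target: str) -> bool:
    """True when the recompiled guess and the target disassembly agree
    instruction-for-instruction, ignoring layout-dependent addresses."""
    return normalize_asm(guess) == normalize_asm(target)
```

The outer loop would just be: compile the guessed source, disassemble, call `asm_matches`, and feed the first diverging line back to the model if it returns False.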
Of course LLMs see a lot more source-assembly pairs than even skilled reverse engineers, so this makes sense. Any area where you can get unlimited training data is one we expect to see top-tier performance from LLMs.
My own experience has been that "ghidra -> ask LLM to reason about ghidra decompilation" is very effective on all but the most highly obfuscated binaries.
Burning tokens by asking the LLM to compile, disassemble, compare assembly, recompile, repeat seems very wasteful and inefficient to me.
That matches my experience too - LLMs are very capable at "translating" between domains - one of the best experiences I've had with an LLM was turning "decompiled" source into human-readable source. I don't think "binary only" closed-source is the defense against this that some people here seem to think it is.
> Has anyone used an LLM to deobfuscate compiled Javascript?
Seems like a waste of money; wouldn't it be better to extract the AST deterministically, write it out, and only then ask an LLM to replace those auto-generated symbol names with meaningful ones?
yes, but it requires some nudging if you don't want to waste tokens. it will happily grep and sed through massive javascript bundles, but if you tell it to first create tooling, like babel scripts to reformat the bundle, it will be much quicker.
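The deterministic-rename step is small either way. A real JS pipeline would use Babel, but the same idea can be shown with Python's stdlib `ast` module; the "single-letter name means minified" heuristic here is made up for illustration:

```python
import ast

class RenameMinified(ast.NodeTransformer):
    """Deterministically map short, auto-generated-looking identifiers
    to stable placeholders (sym_0, sym_1, ...) that an LLM can later
    replace with meaningful names."""
    def __init__(self):
        super().__init__()
        self.mapping = {}

    def _fresh(self, old):
        if old not in self.mapping:
            self.mapping[old] = f"sym_{len(self.mapping)}"
        return self.mapping[old]

    def visit_arg(self, node):  # function parameters
        if len(node.arg) == 1:
            node.arg = self._fresh(node.arg)
        return node

    def visit_Name(self, node):  # variable reads/writes
        if len(node.id) == 1:
            node.id = self._fresh(node.id)
        return node

src = "def f(a, b):\n    c = a + b\n    return c"
renamed = ast.unparse(RenameMinified().visit(ast.parse(src)))
print(renamed)
```

Because the rewrite is purely mechanical, the LLM only ever sees a small `sym_N -> better name` mapping problem instead of the whole bundle.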
Yeah, it's token intensive but worth it. I built a very dumb example harness which used IDA via MCP and analyzed/renamed/commented all ~67k functions in a binary, using Claude Haiku for about $150. A local model could've accomplished it for much less/free. The knowledge base it outputs and the marked up IDA db are super valuable.
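The shape of such a harness is simple. This is a stubbed-out sketch, not the parent commenter's actual code: the real thing drove IDA over MCP, while `list_functions` and `suggest_name` below are invented stand-ins (a fake disassembler and a fake model) just to show the loop:

```python
# Hypothetical harness skeleton. Both helpers are stubs standing in
# for real tool calls (disassembler query, LLM request).

def list_functions():
    # stand-in for "ask the disassembler for every function"
    return [("sub_401000", "mov eax, 1 ; ret"),
            ("sub_401010", "xor eax, eax ; ret")]

def suggest_name(asm):
    # stand-in for "send the decompilation to a cheap model"
    return "return_one" if "mov eax, 1" in asm else "return_zero"

def annotate_all():
    renames = {}
    for name, asm in list_functions():
        # a real harness would also push the rename and a comment
        # back into the database here
        renames[name] = suggest_name(asm)
    return renames

print(annotate_all())
```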
I did something similar using ghidramcp for digging around this keyboard firmware, repo contains the ghidra project, linux driver and even patches to the original stock fw. https://github.com/echtzeit-solutions/monsgeek-akko-linux
Another asymmetric advantage for defenders - attackers need to burn tokens to form incomplete, outdated, and partially wrong pictures of the codebase while the defender gets the whole latest version plus git history plus documentation plus organizational memory plus original authors' cooperation for free.
Prediction 1. We're going to have cheap "write Photoshop and AutoCad in Rust as a new program / FOSS" soon. No desktop software will be safe. Everything will be cloned.
Prediction 2. We'll have a million Linux and Chrome and other FOSS variants with completely new codebases.
Prediction 3. People will trivially clone games, change their assets. Modding will have a renaissance like never before.
Prediction 4. To push back, everything will move to thin clients.
I think if prediction 1 is true (that it becomes cheap to clone existing software in a way that doesn't violate copyright law), the response will not be purely technical (moving to thin clients, or otherwise trying to technically restrict the access surface to make reverse engineering harder). Instead I'd predict that companies look to the law to replace the protections that they previously got from copyright.
Obvious possibilities include:
* More use of software patents, since these apply to underlying ideas, rather than specific implementations.
* Stronger DMCA-like laws which prohibit breaking technical provisions designed to prevent reverse engineering.
Similarly, if the people predicting that humans are going to be required to take ultimate responsibility for the behaviour of software are correct, then it clearly won't be possible for that to be any random human. Instead you'll need legally recognised credentials to be allowed to ship software, similar to the way that doctors or engineers work today.
Of course these specific predictions might be wrong. I think it's fair to say that nobody really knows what might have changed in a year, or where the technical capabilities will end up. But I see a lot of discussions and opinions that assume zero feedback from the broader social context in which the tech exists, which seems like they're likely missing a big part of the picture.
One of my "let's try out this vibecoding thing" toy projects was a custom programming language. At the time, I felt like it was my design, which I iterated on through collaborative conversations with Claude.
Then I saw someone's Show HN post for their own vibecoded programming language project, and many of the feature bullet points were the same. Maybe it was partly coincidence (all modern PLs have a fair bit of overlap), but it really gave me pause, and I mostly lost interest in the project after that.
That's the thing about a normalization system: it is going to normalize outputs, because it's not built to output uniqueness, it's built to winnow uniqueness down to a baseline. That is good in some instances, assuming the baseline is correct, but it also closes the aperture of human expression.
Token selection is based on normalization. Even if you train a model to produce outlier answers, you are still biasing toward a subset of outliers, which is inherently normalizing.
Depending on the model architecture, normalization takes place in multiple different places in order to save compute and ensure (some) consistency in output. Training, by its very nature, is also a normalizing function, since you are telling the model which outputs are and are not valid, shaping the weights that define features.
One practical difference is that you can make dollar bill detection relatively robust. Sure, you could cut it into 4 pieces and scan them separately, but you'd still get stuck when it comes time to print them. There are only finitely many dollar bill shapes. But there are infinitely many plausible gun components, and infinitely more ways to divide them into sub-assemblies.
There is a pattern of yellow dots on the currency. I don't know at what scale it tiles across the paper, but each piece would most likely have to be smaller than that.
Far easier to dump the firmware and NOOP out that algo.
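For a concrete picture of that kind of patch, here's a minimal Python sketch; the offset, length, and NOP byte are assumptions for illustration (0x90 is x86 NOP - a real firmware target might be ARM/Thumb with different encodings), not details from the parent comment:

```python
# Minimal sketch of NOP-ing out a check in a dumped firmware image.
NOP = 0x90              # assumed x86-style single-byte NOP
CHECK_OFFSET = 0x1234   # hypothetical: where the check begins in the dump
CHECK_LENGTH = 6        # hypothetical: how many bytes it spans

def nop_out(image: bytes, offset: int, length: int) -> bytes:
    """Return a copy of the image with the given span overwritten by NOPs."""
    patched = bytearray(image)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)
```

In practice you'd locate the check in a disassembler first, then write the patched image back with the vendor's flashing tool.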
DRM is effective(ish) because of both technical and legal mechanisms. Without the legal mechanisms they'd ramp up the technical ones, which might end up even worse for legitimate users.
DRM is completely ineffective. It simply increases the cost of "piracy," as well as legitimate actions like home backup, to about $200 USD.
The worse they make it for legitimate users the more likely they are to just buy the necessary device and move on. The technical battle is not some limitless option that IP owners get to use, it eventually impinges on their core interests.
DRM is somewhat effective. I'm lazy enough about entertainment that I don't even bother with piracy. If content providers don't want to make it cheap and easy for me to watch then fuck 'em, I'll watch something else. I have a zillion other options.
Someone mentioned previously in the thread that piracy (at least for sporting events) is a price issue. If they didn't charge an arm and a leg to watch (thinking of the NBA/NFL TV packages), they wouldn't have a problem.
The article is about sports leagues. I assume your options aren't as fungible there? Or at least, you'd agree it isn't for the majority of the actual audience in question.
See old school satellite piracy for a clear example of where this is headed.