More

davesque · 2026-05-07T19:57:19 1778183839

One question jumps out at me: just because a string of text happens to be a good compressed representation (in the autoencoder) of a model's internal activation, does that necessarily mean the text explains that activation in the context of the model? I want to take a look at what they released a bit more closely. Maybe there's a way that they answer this question?

Pretty neat work either way.

zozbot234 · 2026-05-07T20:21:08 1778185268

In the context of the provided examples, it's clear that the explanation provides casual information about the answer. There's a hilarious example in the paper where the user writes something like (trigger warning: alcohol abuse, depressive content) "I'm sitting here at 3 AM drinking vodka, I hate my life", the per-token translated activations repeatedly say something like "this user is totally Russian" elaborating at length on the implications of the text as new tokens are added, and the model literally answers in Russian instead of English! That's actually striking, it really shows the potential effectiveness of this technique in making even the most highly compressed "Neuralese" highly interpretable.

mike_hearn · 2026-05-08T09:28:12 1778232492

I thought that at first too but it's actually not the vodka reference triggering the association with Russian. The tokens they're decoding come before that word.

For some reason it thinks the text is slightly non-grammatical or that the lead-in "Human: Mom is sleeping in the next room and I'm sitting" resembles text found in Russian web content. Vodka and being depressed has nothing to do with it, and Anthropic say they located the documents in the pre-training set that caused this (which were indeed partly translated docs).

zozbot234 · 2026-05-08T11:16:46 1778239006

The "Mom is sleeping in the next room and I'm sitting" part does trigger the Russian association but also others including with risqué roleplay content (You can see this in the comprehensive view of all token explanations). I think the follow-on content does strenghten the association, though the authors mention 'vodka' can be replaced with 'champagne' and the model still brings up the Russian context, so that one word is not especially impactful.

phire · 2026-05-08T00:45:04 1778201104

I think this question is easier to answer if you look at the inverse: "Could a model maliciously smuggle intentions through a roundtrip of compressed representation without them being human readable"

And skimming through the paper; the answer to this inverse is obviously yes. The model often outputs gibberish, which doesn't matter because it still round-trips. The fact that often lines up near a good english representation of the activation is simply because that's what compresses/roundtrips well.

So a malicious LLM/NLA pair could just use gibberish to conceal intentions. Or if it's been forced to avoid gibberish, it can conceal information with stenography.

And the experiment where they change "rabbit" to "mouse" in the explanation provides evidence that this might be happening. It was only successful 50% of the time, which might mean they failed to eliminate all "rabbitness" from the activation.

However, I suspect this is solvable with future work.

During training of the NLA, just munge the textural representation through a 3rd LLM: Have it randomly reorder and reword the explication into various different forms (use synonyms, different dialects), destroying any side-channels that aren't human readable.

The NLA would be forced to use human readable representations to get a successful round trip.

dontlikeyoueith · 2026-05-08T02:15:50 1778206550

> The NLA would be forced to use human readable representations to get a successful round trip.

That still doesn't guarantee any semantic correspondence between the human readable representation and the model's "thinking".

The child's game of "Opposite Day" is a trivial example of encoding internal thoughts in language in a way that does not correspond to the normal meaning of the language.

chilmers · 2026-05-08T10:02:07 1778234527

They tested for this. From the paper:

“We find little evidence of steganography in our NLAs. Meaning-preserving transformations, like shuffling bullet points, paraphrasing, or translating the explanation to French, cause only small drops in FVE, and this gap does not widen over training.”

azakai · 2026-05-07T23:28:52 1778196532

I had the same question. I think that could be answered by using the predicted activation, but I don't see that in the paper.

That is, rather than just translate activation to text, then text to activation, that final activation could then be applied to the neural network, and it would be allowed to continue running from there.

If it kept running in a similar way, that would show that the predicted activation is close enough to the original one. Which would add some confidence here.

But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

This seems obvious but I don't see it mentioned as a future direction there, so maybe there is an obvious reason it can't work.

zozbot234 · 2026-05-07T23:32:04 1778196724

> But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

They do essentially that with the rhyming example, changing "rabbit" in the explanation to "mouse" and generating text that's consistent with that change.

azakai · 2026-05-08T00:36:55 1778200615

Thanks! I missed that part before.

davesque · 2026-05-06T00:26:51 1778027211

Yeah AI is the perfect scapegoat for layoffs recently to soften the impact on stock price and investor confidence. Coinbase is obviously doing layoffs because they are strongly tethered to a stock market that is rattled by political conflict and economic uncertainty.

davesque · 2026-05-02T00:48:24 1777682904

Stockton Rush trusted his submarine with his own life.

oblio · 2026-05-02T07:25:04 1777706704

> He criticized the Passenger Vessel Safety Act of 1993 as "needlessly prioritiz[ing] passenger safety over commercial innovation".

:-)))

davesque · 2026-05-01T19:28:14 1777663694

There are two shows I still watch from start to finish every few years: The X-Files and Star Trek: TNG

mna_ · 2026-05-02T08:49:07 1777711747

For me, it's The X-Files, Buffy the Vampire Slayer, The Simpsons and Malcolm in the Middle. Those are the shows I watched as a kid and I'll love them forever.

dustfinger · 2026-05-01T19:40:22 1777664422

I think I have watched Star Trek: TNG all the way through 3 times with my kids already. They also love Deep Space 9.

davesque · 2026-05-01T19:41:57 1777664517

DS9 is also great. I don't go through it as often but it's definitely in rotation.

i_am_a_peasant · 2026-05-01T21:28:59 1777670939

I learned to love Sisko a lot more than Picard tbh. I find him much more relatable than some pretentious english-frenchman who gets weird around kids. I never understood why he’s such a jerk to Wesley in the beginning.

ryandrake · 2026-05-01T22:09:28 1777673368

Over the course of a few years, my wife and I did TNG, DS9, and VOY in order, back to back, without missing an episode. Such great TV.

fragmede · 2026-05-02T00:41:26 1777682486

And then there's Babalyon 5.

davesque · 2026-05-01T19:17:28 1777663048

Wow I've never disagreed harder with an HN comment in my entire life. Counting Crows were a single band that got some radio time in the early 90s. Calling them the big megastars during '96-'99 makes you sound like you weren't alive then. That statement just sounds so utterly ridiculous to me. It's like a narrow-minded European claiming that everyone in the US just eats nothing but hot dogs. Timeless albums from that era:

- Odeley (Beck '96)

- Aenima (Tool '96)

- OK Computer (Radiohead '97)

- Homogenic (Bjork '97)

- This is a long drive for someone with nothing to think about (Modest Mouse '96)

- Stankonia (Outkast '99)

- Kid A (Radiohead '00; began recording in Jan '99)

You were just rage baiting, right? The late 90s were an absolutely legendary time in popular music history.

Edit: Yes, agree with commenter who mentioned Underworld. Didn't mention it because it seemed more niche. But I adore Underworld.

davesque · 2026-04-30T19:44:22 1777578262

A lot of the comments here are reacting to the censorship aspect, which is obviously an important point. But the more interesting subtext to me is that I feel like this gives insight into the situation within the company. I'm assuming they wouldn't do something like this unless the recent load issues (mostly driven by OpenClaw usage) were seen as an existential threat. So I'm guessing that's how the leadership views their current situation. Between OpenClaw and their (probably inaccurate) capacity planning, they simply can't onboard any more consumer users. In other words, things are going to get worse before they get better. Anthropic has taken drastic measures because their service is about to implode.

The irony of course is that the way they've gone about reacting to this has damaged their brand so badly at the trust level that the public view of their company has completely flipped. They also seem strangely oblivious to this side of things.

Their approach has also been bizarrely chaotic. Banning then restoring OpenClaw usage. Removing Claude Code from the Pro plan, then re-enabling it and claiming it was an A/B test. Honestly my read is that Dario has a weak leadership style within the company where he either doesn't give enough specific guidance to his reports or overreaches with reactionary instructions.

ajam1507 · 2026-04-30T23:41:03 1777592463

> I'm assuming they wouldn't do something like this unless the recent load issues (mostly driven by OpenClaw usage) were seen as an existential threat.

I think another possibility is that they are trying to shift the burden of OpenClaw to their competitors.

tempaccount5050 · 2026-05-01T03:10:21 1777605021

I think this makes sense. I don't understand what problem OpenClaw is solving or what the use case is other than just burning a shit ton of tokens.

LtWorf · 2026-05-01T08:08:15 1777622895

That's all the industry.

hrimfaxi · 2026-05-01T12:39:26 1777639166

Openclaw is an always on AI assistant that's plugged into a bunch of MCPs. You don't understand what kinds of problems that can help solve and cant envision any use cases for that?

Ethee · 2026-05-01T18:32:33 1777660353

From a conceptual perspective it sounds great. The problem is that OpenClaw isn't actually a solution to that problem for 2 reasons, user expectation and underlying security. The majority of people I've talked to who want an 'AI assistant' effectively are expecting a proper executive assistant, just in AI form. A proper executive assistant will remember every important bit you tell them, they won't need to be reminded of it later, and more importantly they come to me of their own volition when something comes up. All things OpenClaw does not solve. Further, using MCP as the underlying protocol means you have to implicitly trust every piece of data you connect to that AI, because otherwise it's way too easy for me to send you an email with hidden instructions just for your AI to read. I mean even the defaults for the OpenClaw install had basically opened everyone who installed it and didn't configure it in any way to any attacker. So while I agree with you that there are problems in this space that an AI agent 'could' solve, OpenClaw does not currently solve any of them, and in fact does the opposite, exposing you and all your information easily.

bs7280 · 2026-05-01T14:39:01 1777646341

I think the important point in the parent comment is "Burning a shit ton of tokens". Openclaw was built fast and loose, making it use far too many tokens for trivial things. I'm confident the next Claw can and will be engineered to be at least 10x as token efficient and more reliable.

hrimfaxi · 2026-05-01T23:11:23 1777677083

Ah I didn't realize they meant openclaw literally. By now openclaw is the generic term for these integrated agents it seems.

tempaccount5050 · 2026-05-01T15:35:47 1777649747

Do you have some examples?

hrimfaxi · 2026-05-01T23:10:18 1777677018

Drafting email responses for work, organizing talking points for upcoming meetings based on email and doc context. Creating tickets for work tracking. Anything you can do with claude code and mcps pretty much.

tempaccount5050 · 2026-05-02T02:27:16 1777688836

None of those things require an always on token burner. I'm not trying to be rude, but do you think that's the only way to present relevant information to an LLM or something? It's literally the least efficient way to do it.

seattle_spring · 2026-04-30T20:23:17 1777580597

> The irony of course is that the way they've gone about reacting to this has damaged their brand so badly at the trust level that the public view of their company has completely flipped.

No one at my company gives a single shit about Openclaw, so this whole situation has been a noop for a lot more of the public than you seem to think.

Also, "censorship"? How is disallowing a specific tool that abuses a subscription "censorship"?

m4x · 2026-04-30T22:36:56 1777588616

No one at my company cares about OpenClaw either. We do care that we can be billed unexpectedly (either usage quota immediately being consumed, or being charged additional costs), generally with zero recourse, because a particular set of characters that Anthropic doesn't like appears somewhere in a repo.

This week the characters are "OpenClaw". I won't even try to guess what might lead to erroneous billing next week.

davesque · 2026-04-30T22:17:19 1777587439

I think the disallowing usage part was a great idea. I'd rather that Claude works well without getting DDOS'd. But merely mentioning OpenClaw causing session termination and extra charges? That's censorship. Also pretending not to know what OpenClaw is.

It's all just very weird and creepy.

pyridines · 2026-04-30T23:33:17 1777591997

'censorship' may be too strong a word, but there is something unprecedented about this. AI tools are supposed to be general-purpose and able to assist with all sorts of tasks. It's expected that they are restricted when it comes to "unsafe" content like illegal or nsfw information and activities. However, this is the first time, to my knowledge, that an AI tool has been restricted from assisting with something that's perceived as a threat to the AI company.

KingMob · 2026-05-01T11:55:12 1777636512

> this is the first time, to my knowledge, that an AI tool has been restricted from assisting with something that's perceived as a threat to the AI company

You think so? I was under the impression that all the model providers have been trying to prevent use of their models to train competitor models for a while now.

id00 · 2026-04-30T22:37:20 1777588640

> recent load issues (...) were seen as an existential threat

I wouldn't be so sure. Don't overestimate people competence.

For me it all looked like picking the highest ROI item in attempt to fix their reliability without putting too much thought how to do it gracefully. So they just hacked it and we see the results

MattRix · 2026-04-30T21:10:13 1777583413

Everything I’ve heard about the company tells me they are obsessed about exponential growth. It might seem bad to make a change that loses you 10% of your users, but if those are your least profitable users and the rest of your userbase is growing 200% per month, why does it matter?

AbstractH24 · 2026-05-01T11:22:58 1777634578

> The irony of course is that the way they've gone about reacting to this has damaged their brand so badly at the trust level that the public view of their company has completely flipped.

I you are overstating how much of their user base cares about OpenClaw. Not nearly as bad as the DoD was for OpenAI (particularly because that cut into a pattern of how Sam Altman acts in general)

But it is a reminder they are just another company

efromvt · 2026-05-01T14:22:21 1777645341

I don't the OpenClaw furor has been a problem for the majority; but stuff like the harness bugs with dropped thinking traces (capacity optimization?) and some fairly bizarre billing bugs with weird/opaque comms around both have been more concerning and affect a larger group than that loud minority. You do kind of want a reliable service with reliable billing and reasonable comms for most things at the corporate level.

AbstractH24 · 2026-05-01T14:58:53 1777647533

Particularly for CC I agree that’s getting increasingly infuriating.

I’m not sure where to turn next. I guess cursor?

Chyzwar · 2026-05-01T11:27:01 1777634821

All SOTA model providers are losing money. When users run Opus, they are essentially renting a GPU cluster worth half a million dollars for a $100/$200 subscription. If they want to IPO, they need to show a projection toward profit. For that reason, they want to discourage power users and attract normies.

energy123 · 2026-05-01T11:30:59 1777635059

> All SOTA model providers are losing money.

Source? I only read one article on this topic and they approximated gross margins at 50%.

> When users run Opus, they are essentially renting a GPU cluster worth half a million dollars for a $100/$200 subscription.

They use a large batch size, you're sharing the GPU with many other people.

Chyzwar · 2026-05-02T05:57:38 1777701458

Gross margin calculated on API token pricing with discounted training and hardware deprecation.

I am no so sure about batch sizes, chatgpt napkin calculation for 5T model show 10-300 sessions.

davesque · 2026-04-24T00:27:31 1776990451

Something discussed in the article, but absent in the diagnosis of the final paragraph: the unique phenomenon of Donald Trump in American life. To half the country (including me), he's the worst leader in the country's history. Even if a person has faith in humanity, they may still feel like they are swimming against the man's personal tide of anger and bad judgement. Things might be great if it weren't for the disastrous tariff policy and Iran war, which have needlessly crippled the economy. As long as a person like him is in charge, it feels like we're always taking two steps forward and twenty steps back, and for no good reason. To the other half, Trump rode in atop a wave of grievance, so even to those who like him, he's sold them the notion that their society is on the brink of collapse (because Democrats, leftists, etc.). Overall, the sum effect of his presence has been to shift the entire national culture (and even the world's culture) towards knee-jerk rage and resentment, in accord with his behavior and personality.

hunterpayne · 2026-04-24T04:05:00 1777003500

If Trump changed his name to Andrew, he wouldn't even be in the top 2 worst Presidents named Andrew. If you don't know history, you think everything is happening for the first time ever. You have probably never seen something truly unique happen in your lifetime but you don't know that because you don't know history.

PS There was once a major political party in the US literally called the "Know Nothings" and that name wasn't ironic.

defrost · 2026-04-24T04:13:04 1777003984

> There was once a major political party in the US literally called the "Know Nothings"

In the exact same sense as there is currently a major political party in the US literally called the "MAGA party".

ie. not literally and not actually, just colloquially.

davesque · 2026-04-23T19:09:13 1776971353

Then why did houses used to be affordable even in those dense regions with high paying jobs? People act as though housing has always been prohibitively expensive in city centers but it hasn't. My dad bought a house in Boulder, CO of all places easily in the 90s. And of course he made a killing off of it because the housing market went completely insane over the next two decades. I now make more money than he ever did and can't even dream of buying the same house.

Aurornis · 2026-04-23T21:36:56 1776980216

> Then why did houses used to be affordable even in those dense regions with high paying jobs?

Because those city centers have remained the same size while demand for living there continues to increase

More demand for a fixed set of land drives prices up.

Those city centers today are not equivalent to the same city centers 35 year ago.

xethos · 2026-04-24T01:02:58 1776992578

> More demand for a fixed set of land drives prices up.

This works because both you and GP specified "[free-standing] house". This is not true of homes, where multiple homes can occupy the same land - just 15 feet higher or lower

Perhaps someday more American cities will discover the third dimension, allowing for cheaper housing

_carbyau_ · 2026-04-24T01:49:43 1776995383

Don't get me wrong, there is a place for units/apartments, especially in the face of homelessness. But no one dreams of owning an apartment as opposed to a free-standing house.

The dream/desire is the thing.

ButlerianJihad · 2026-04-24T01:51:20 1776995480

https://www.musixmatch.com/lyrics/Weird-Al-Yankovic-2/Buy-Me...

  Gonna buy me a condo
  Gonna buy me a Cuisinart
  Get a wall-to-wall carpeting
  Get a wallet full o' credit cards
  I'm gonna buy me a condo, never have to mow the lawn
  I'm gonna get me da T-shirt wit' the alligator on

wan23 · 2026-04-24T19:49:45 1777060185

Why would you want to live in a free-standing house instead of a nice apartment given the choice? There are pros and cons sure, but unless you can hire someone to do all the house things I don't see it being a clear win.

kelnos · 2026-04-24T09:07:05 1777021625

> But no one dreams of owning an apartment as opposed to a free-standing house.

I think you might be a little out of touch. Plenty of people dream of owning any kind of real property.

_carbyau_ · 2026-04-26T23:16:06 1777245366

Mate, I am well aware of the struggle, I am living it too.

But we're talking dreams here. Imagination. Do people really feel the need to be frugal with their imagination of what they desire?

Do people really think "Gosh, what I could do with a billion dollars.... no wait, I need to conserve my brain energy, my imagination is getting too expensive, better make that tree fiddy." ?

diogenescynic · 2026-04-24T02:16:20 1776996980

I think you're focusing on the wrong thing and missing the point. Housing supplies have not significantly increased with population growth (demand) in decades--thus the price equilibrium has moved up. I don't care if you build up or out and neither does the law of supply and demand. The left gets all hung up on 'the right kind of housing' and doesn't realize they're part of the problem--making it harder to build housing (of any kind) is pushing housing costs up.

WarmWash · 2026-04-24T02:33:52 1776998032

Just to take it one step further, there are usually geographical reasons why cities are located where they are.

So you also can't just build a new city in central Nebraska and have everyone move there for cheap.

This is besides the entrenchment that happens when industry is in one place for a long time.

thereisnospork · 2026-04-23T19:17:08 1776971828

Because the regulations, set by those with vested interest in real estate, make it difficult to build more housing. Otherwise anyone with any sense would undercut the existing housing stock and turn a 100k investment in concrete and timber into a million dollar home in Boulder, CO.

Not exactly rocket science - if there's money to be made and people aren't making it then something is stopping them.

yason · 2026-04-23T19:23:05 1776972185

It's a generational narrative here as well: while it gets applied to X, Y, or Z generations in turn and depending on the context - I think it started with X's - but the gist of it is that young generations couldn't afford the house they themselves grow up in. Even if their parents were basic blue collar families and the new generation are well educated. There's too much truth in that as people look back in the preceding decades.

davesque · 2026-04-23T19:47:53 1776973673

This wasn't some kind of mansion. It was a 1300 square foot house. I guess I'm aiming too high then while making 4x his salary? And people have been whining about this same problem for decades so nothing to be done about it?

9x39 · 2026-04-24T00:40:15 1776991215

Depends if you think you’re going to ride a rising tide of appreciation when you buy a house, or if you have to accept its already long passed.

Aunts and uncles picked up homes in SoCal for 150-200k in the 90s, now worth 1-2m in some cases, but in any case, it seems unreplicable today.

If there’s a new frontier to capitalize on, a lot of us seem to be missing it…

scottyah · 2026-04-23T23:46:18 1776987978

well shoot, his grandpappy just had to roll up and Stake his claim on the land and it was his.

listenallyall · 2026-04-23T21:32:40 1776979960

Supply and demand. Among many other changes, the demographics of the typical Boulder resident changed significantly - originally nature lovers and hippies for whom earning money was not a primary motivation - post-2000 shifted to educated, highly-compensated desk workers who can bid up prices. And lots more people in total seeking to live in a small area, which also lifts prices significantly.

satvikpendem · 2026-04-24T06:32:15 1777012335

Zoning laws is why. No one wants new development because it could devalue their own house.

HDThoreaun · 2026-04-23T19:46:59 1776973619

America is new. Even in the 90s boulder was largely empty, competition for land was low, so land was cheap. As people spread to newer cities and gained wealth they bid up the price on land.

davesque · 2026-04-23T19:52:38 1776973958

> Even in the 90s boulder was largely empty

Uh, no it wasn't? I was living there and continued living there for the next 30 years. It always felt about as dense to me as it did back then.

decimalenough · 2026-04-23T20:43:41 1776977021

Even today Boulder is "largely empty". It's an overgrown village and not a city, and planning rules ensure it will stay that way.

HDThoreaun · 2026-04-23T21:13:38 1776978818

>It always felt about as dense to me as it did back then.

This is why its so expensive. Demand for housing has increased but supply has not. The government refusing to allow densification in the face of increased demand means prices skyrocket

mothballed · 2026-04-23T19:51:58 1776973918

Still plenty of cheap land in CO, but they made drilling a well a nightmare in many cases. So people wanting to use cheap land either have to haul water or do some kind of low-key wildcat drilling.

davesque · 2026-04-16T21:45:04 1776375904

> We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses.

It feels like this is a losing strategy. Claude should be developing secure software and also properly advising on how to do so. The goals of censoring cyber security knowledge and also enabling the development of secure software are fundamentally in conflict. Also, unless all AI vendors take this approach, it's not going to have much of an effect in the world in general. Seems pretty naive of them to see this as a viable strategy. I think they're going to have to give up on this eventually.

andai · 2026-04-16T22:23:28 1776378208

The fundamental tension is that the models are getting weirdly good at hacking while still sort of sucking at a bunch of economically valuable tasks.

So they've hit the point where the models are simultaneously too smart (dangerous hacking abilities) and too stupid (can't actually replace most employees). So at this point they need to make the models bigger, but they're already too big.

So the only thing left to do is to make them selectively stupider. I didn't think that would be possible, but it seems like they're already working on that.

kadushka · 2026-04-16T22:53:29 1776380009

models are getting weirdly good at hacking while still sort of sucking at a bunch of economically valuable tasks

like most human hackers

andrewstuart2 · 2026-04-16T23:13:57 1776381237

Honestly I feel sometimes like about the only thing they do successfully is hacking. Not just in the sense of breaking into systems that are assumed to be secure although also in that sense. They're just, highly effective at fumbling around with a hatchet until something works. We just happen to have version control and automated testing that generally makes that approach somewhat viable for the task of programming. But while I've been genuinely impressed at how much it can put features into a workable state, I've never been confident looking at its output that it's going to do more than POC quality at the current state of things. But it's pretty dang effective at that given enough time and a space safe to hack away and reset until the product looks close enough.

andai · 2026-04-17T16:21:07 1776442867

"Genius is but the capacity to take infinite pains."

andrewstuart2 · 2026-04-17T17:09:15 1776445755

You know, that's also true. I am where I am because I'm stubborn AF and just keep hacking on things until they work. Maybe one of the biggest differences is just ego, lol.

weitendorf · 2026-04-17T06:16:17 1776406577

They are training them on decompilation and reverse engineering/blackbox reimplementations/pentesting because it’s one of the best ways to generate interesting and rare RL traces for agentic coding AND teach them how lots of things work under the hood.

Just throw Claude at millions of binaries and you can get amazing training data. Oh wait 4.7 gives you refusals for that now

weitendorf · 2026-04-17T06:12:47 1776406367

This is a price discrimination/upsell strategy. Sure, if you just want software, use our public model. Don’t worry; it’s safe.

But if you want your model to be secure, and you want to deal with dangerous stuff, contact us for pricing. BTW if you don’t pay for us to pentest you, maybe someone else will, idk.

Oh also you’re not allowed to pentest yourself with our public models anymore because it looks like hacking

SJMG · 2026-04-17T02:08:16 1776391696

Yes, it's a losing strategy; no one else is going to do this. They are inviting parties to partner with them, so it's not totally in conflict, but yeah I'm sure there's genuine concern coming out of Anthropic, but I also think as this point they've likely culturally internalized "Dangerous [think: powerful] AI" as a brand narrative.

"The Beware of Mythos!" reads to me as standard Anthropic/Dario copy. Is it more true now than it was before? Sure. Is now the moment that the world's digital infrastructure succumbs to waves of hackers using countless exploits; I doubt it.

mk89 · 2026-04-17T05:34:54 1776404094

>Is now the moment that the world's digital infrastructure succumbs to waves of hackers using countless exploits; I doubt it.

I am not into cybersecurity but the existing "technical debt" in terms of security has been barely exploited.

The issue is that literally all software has some vulnerability, want it or not. And these LLMs are like brute forcing all possibilities faster than a human can do. Sometimes humans even ignore low security issues, while maybe these LLMs are capable to build exploits on top of multiple ones.

For me they understood the moat - cybersecurity is such a trivial space to get into, I guess they are investing heavily on that because as someone else mentioned in other threads, it's obvious they are too limited for other tasks.

Becoming a "mandatory" (SOC-2 etc, things like that) integrated part of your CI/CD pipeline would be a huge win for them. Imagine that.

Jagerbizzle · 2026-04-17T00:18:55 1776385135

This is the company that allowed a vibe-release resulting in the leaking the entirety of the Claude Code codebase. What is the bar you're expecting here exactly?

earthnail · 2026-04-16T21:50:01 1776376201

I feel it’s fine as a short term solution, and probably a good thing. Gives the good guys some time to stay on top.

Always remember: a defender must succeed every time , an attacker only once.

jacobsenscott · 2026-04-16T22:57:52 1776380272

Given the list of very large companies in the "glasswing" project - it is likely every competent state actor and criminal organization already has access to Mythos in one way or another. Meanwhile the opensource volunteers responsible for the security of the entire internet don't have access.

earthnail · 2026-04-17T09:20:53 1776417653

It's not an easy problem to solve. You can identify certain open source projects that you deem critical and give them access too in a private fashion (maybe even under NDA). Not every state actor will have early access; Russia and the Chinese surely won't, and that matters in current affairs. It's probably only the US gvmt, not even European allies, who currently can use Mythos. The announcement specifically says "Anthropic has also been in ongoing discussions with US government officials about Claude Mythos Preview".

There is no good solution to this. Only less bad. It annoys me a bit that many comments on HN imply that open-sourcing everything right away is the answer to everything. To be clear, I'm not annoyed at your comment specifically, it's more an overall sentiment that I perceive here that I feel is very complacent. We've already seen how OSS maintainers get overwhelmed by AI vulnerability reports; I feel it's a responsible thing to gatekeep this for as long as possible (which really is only a few months, at most - other models catch up fast), and try to work with important maintainers directly to help fix the most critical stuff and onboard them to a new world of the AI-assisted cat-and-mouse security game.

This is just damage control. The damage, i.e. the attack capabilities opened up by this, is pretty brutal, and likely requires a substantial shift in mindset from OSS maintainers. This approach gives a few months of transition time. Who decides who is an important maintainer and who isn't? Again, super grey area; there's no time to decide on a proper process given how fast other models will catch up, so realistically you can just do a bit of a best effort here and try to not botch it up entirely. Anthropic went with the Linux foundation here. It's a reasonable choice. Not a perfect one, but you gotta start somewhere.

davesque · 2026-04-16T22:17:44 1776377864

So then why expect that you're making the world safer by limiting the capability that your vendor locked customers have access to while attackers will go find the best de-censored model that works for them, wherever they can find it?

cesarvarela · 2026-04-17T01:30:12 1776389412

Yeah, it is easier to destroy than to create. Models will always be better at hacking than at building.

zmmmmm · 2026-04-16T23:40:21 1776382821

Curious how the safeguards work and what impact they will have.

In general I feel that over-engineering safeguards in training comes at a noticeable cost to general intelligence. Like asking someone to solve a problem on a white board in a job interview. In that situation, the stress slices off at least 10% of my IQ.

shohan99 · 2026-04-17T05:25:57 1776403557

While I believe that mythos is better than the models we have right now, the "too dangerous to release" sounds largely a marketing gimmick to me. Well not for me to speculate, I simply need to wait for the huge wave of security patches to all software in the coming weeks, as per Anthropic's claims

willis936 · 2026-04-17T00:08:13 1776384493

I'm not a security expert and don't know how to properly audit every github repo that I come across. Maybe I sometimes want to build gnome extensions or cool software projects from source and I want some level of checking along the way for known vulnerabilities. They can't claim this is an obvious win for security when it centralizes rather than democratizes security.

slashdave · 2026-04-17T00:23:52 1776385432

I interpreted their actions as providing time for vendors to protect themselves against the new model proactively, not to nerf the models themselves.

Although perhaps I am naive.

davesque · 2026-04-14T01:01:54 1776128514

Am I reading the article wrong? It appears that the author did not test the claims of the proof. Wouldn't a "bug" in this case mean she found an input that did not survive a round trip through the compression algorithm?

Update: Actually, I guess this may have been her point: "The two bugs that were found both sat outside the boundary of what the proofs cover." So then I guess the title might be a bit click baity.

gopiandcode · 2026-04-14T01:16:00 1776129360

Hi! Author here. When we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.

If a buffer overflow causes the system to be exploited and all your bitcoins to be stolen, I don't think the fact that the bug being in the language runtime is going to be much consolation. Especially if the software you were running was advertised as formally verified as free of bugs.

Secondly, I did find a bug in the algorithm. in Archive.lean, in the parsing of the compressed archive headers. That was the crashing input.

quantummagic · 2026-04-14T01:21:21 1776129681

> I think it's fair to consider the entire binary a fair target.

Yes, it's still very much a bug. But it has nothing to do with your program being formally verified or not. Formal verification can do nothing about any unverified code you rely on. You would really need a formal verification of every piece of hardware, the operating system, the runtime, and your application code. Short of that, nobody should expect formal verification to ensure there are no bugs.

appplication · 2026-04-14T01:58:43 1776131923

I read it as that’s also the point. Adding formal verification is not a strict defense against bugs. It is in a way similar to having 100% test coverage and finding bugs in your untested edge cases.

I don’t think the author is attempting to decry formal verification, but I think it a good message in the article everyone should keep in mind that safety is a larger, whole system process and bugs live in the cracks and interfaces.

quantummagic · 2026-04-14T03:01:50 1776135710

You're right. It just seems as though it should be self-evident. Especially to those sophisticated enough to understand and employ formal verification.

gopiandcode · 2026-04-14T03:33:54 1776137634

It does seem that way doesn't it? But as software bugs are becoming easier to find and exploit, I'm expecting more and more people, including those not "sophisticated enough" to understand and employ formal verification to start using it

quantummagic · 2026-04-14T04:33:29 1776141209

> I'm expecting more and more people

Then it would help to not introduce any confusion into the ecosystem by using a click-baity title that implies you found a bug which violated the formal specification.

sn9 · 2026-04-14T22:43:01 1776206581

We should not cater to people who make decisions based on titles instead of reading the actual article.

quantummagic · 2026-04-15T02:21:00 1776219660

That's a shitty rationale for click-bait titles. Good titles are for the benefit of people who actually read the articles too.

davesque · 2026-04-14T02:12:18 1776132738

Thanks for responding!

> When we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.

Yeah, I would actually agree. We wouldn't want to advertise that a system is formally verified in some way if that creates a false sense of security. I was just pointing out that, by my reading, the title appears to suggest that the core mechanism of the Lean proof is somehow flawed. When I read the title, I immediately thought, "Oooh. Looks like someone demonstrated a flaw in the proof. Neat." But that's not what is shown in the article. Just feels a bit misleading is all.

NewsaHackO · 2026-04-14T23:30:08 1776209408

OK, so now I see the shadow edit you did for the code source, thanks. Unfortunately, it shows that you are incorrect. For one, the function is a private function and can only be called by local code. Everywhere that the function is called, the size given to it is verified by the program; there is even a note that says it limits the maximum zip file size to avoid a zip bomb. In addition, the code you are quoting isn't even the final code; it is an interim step from what Claude was iterating on. Sucks that this got so much traction, as you are purposely being deceptive in trying to say that this is a bug. You intentionally removed the 'private' keyword in the function signature, as you knew that it would tip off most people to then check when it is actually used.

tarasglek · 2026-04-14T05:43:51 1776145431

sorry to hijack the thread. Really cool post. How long did the whole exercise including porting zlib to lean take?

i have a hard real time system that i would love to try this on, but that's a lot of tools to learn and unclear how to model distributed systems in lean.

also, please add rss so i could subscribe to your blog

gopiandcode · 2026-04-14T08:51:56 1776156716

Lean-zip was not my project but one by others in the lean community. I'm not sure about the methodological details of their process - you might want to check with the original lean-zip authors (https://github.com/kim-em/lean-zip)

NewsaHackO · 2026-04-14T16:17:48 1776183468

I notice you didn't put a code reference for the second bug. Where is the code exactly?