
enraged_camel · 2026-06-19T00:48:15 1781830095

That wasn't what brought this change: https://news.ycombinator.com/item?id=48593357

enraged_camel · 2026-06-18T03:10:54 1781752254

Yes, my experience has been the same as yours. I find that the performance of open models is quite acceptable, even good, at one-off questions or small tasks. But they are quite unreliable at long-horizon goals.

enraged_camel · 2026-06-17T20:14:15 1781727255

>> I'm probably in the minority, but I do not want a "connection" with a business. I want transactional interactions that actually work.

I do want a connection. Because connection is what ensures that the transactional interactions continue to work outside of the "happy path". Connection is what ensures that you can return those expensive headphones you bought because extended use makes your neck hurt, even though the return window has passed.

enraged_camel · 2026-06-17T17:36:30 1781717790

Okay, so why are other models not banned too? This "jailbreak" works for them as well.

soerxpso · 2026-06-18T01:17:03 1781745423

Anthropic has alleged that this model is much more dangerous than other currently available models. Their CEO has said so publicly multiple times. It's like asking why cesium isn't banned if nuclear missiles are banned.

(whether Mythos is actually that dangerous is beside the point; considering that Anthropic claims that it is, it makes sense to regulate it)

tiahura · 2026-06-17T20:45:07 1781729107

How would you know? Amazon and the NSA thought it was a problem, but you know better?

enraged_camel · 2026-06-17T22:52:29 1781736749

Know what better? They told us what the "jailbreak" entails.

enraged_camel · 2026-06-17T17:29:44 1781717384

>> I can’t rely on using a technology that the US administration can ban at will.

And you think China will not do the same thing if their models ever become genuinely frontier-level?

khalic · 2026-06-17T17:45:16 1781718316

Then the US will publish their own open weights to outmanoeuvre China.

What’s intolerable is having a tool that’s subject to this risk.

So open models it is

enraged_camel · 2026-06-17T16:02:40 1781712160

Complete strategic defeat and capitulation by the United States. This all but ensures Iran will become the dominant regional power in about a decade, maybe less.

enraged_camel · 2026-06-17T13:44:07 1781703847

People always say stuff like this, but it is misleading, because that remaining 5% makes a huge difference and is where most of the value of using AI agents lies.

I'm not interested in using AI to write code that would have taken me 5-10 minutes to write myself. I use AI to debug complex bugs and develop large features that span multiple domains - stuff that normally takes hours, if not days/weeks. A model that is "enough for 95%" does not cut it for that, because the failures compound during long-horizon tasks and the thing becomes a mess.

sinuhe69 · 2026-06-17T17:13:45 1781716425

I get what you mean. But for many people, AI coding is not about solving complex problems; they mostly do that themselves. For many, AI coding is a productivity tool that helps with mundane but laborious tasks.

In my setup, I use a daily workhorse for such things. It should be fast, cheap, and work reasonably well. I don’t expect it to be smart, but I need it to follow instructions perfectly and handle tool calling well.

For architectural work or debugging help, I use the top models instead.

That works reasonably well for me at a low cost.

enraged_camel · 2026-06-17T02:50:49 1781664649

That must be why Trump spent over $50B bombing Iran and agreed to pay them several hundred billion to go back to the status quo.

enraged_camel · 2026-06-16T17:54:59 1781632499

I'm not sure about that. Claude has some bugs, but Codex is not as polished and doesn't have as many features. For example, you need to add MCP servers manually. There's no Plugin/Skill/Connector marketplace that is accessible from within the app, like there is with Claude Desktop. The Cowork equivalent is nowhere near as powerful. And so on.

I still use Codex, but mostly when I need to check Opus 4.8's work. Pretty sure I will stop doing that soon, because during the short time Fable was available, Codex was not able to find any important issues with the code Fable wrote.

nostrebored · 2026-06-16T18:19:04 1781633944

But how many plugins are people actually using? I can think of one MCP server I find valuable (context7) and one plugin that I've installed but continually think about uninstalling (obra/superpowers).

Both were trivial to set up with Codex.

ai_slop_hater · 2026-06-16T18:58:30 1781636310

It's a good thing. I hate MCPs from the bottom of my heart because they always stay there and bloat the context window. Also, usually developers who develop them don't know what they're doing, so the MCP responses also bloat your context even further.

wxw · 2026-06-16T18:05:21 1781633121

There are plugins in the app.

Haven’t tried Cowork, interesting. Isn’t it just the same agent minus the git worktree based UI?

Frankly, neither Claude nor Codex is as good as the hype suggests.

cute_boi · 2026-06-16T18:55:11 1781636111

I think Codex is much better in that respect. In Claude there are skills, connectors, capabilities, and 4 places for the browser... It is too much.

antupis · 2026-06-16T18:01:38 1781632898

Personally I prefer GPT 5.5's writing style over Opus 4.8's. It’s much more no-nonsense and information-dense.

sunaookami · 2026-06-16T18:05:15 1781633115

That's the first time I've seen someone preferring GPT-styled output over Claude ;) It's the complete opposite for me: GPT is way too verbose (even after telling it to STFU), overwhelms the user with thousands of options, and can't just answer a question without shitting out thousands of paragraphs. Also, the overall tone is way too enthusiastic.

nostrebored · 2026-06-16T18:20:05 1781634005

I strongly prefer Codex. Claude is annoying. Codex provides descriptions where I want them and more touchpoints to audit the quality of work. Claude Code on experimental seems to not even show diffs when asked anymore, and it's much less clear what is being shipped.

orphea · 2026-06-16T18:18:23 1781633903

Dunno, I prefer GPT 5.5 too, for the same reasons as the parent. Extremely subjective, but I've had better results with it too. Maybe I just got unlucky with Claude a few times, but even the latest Opus was dumb.

black_knight · 2026-06-16T20:40:31 1781642431

Fascinating how people have such diametrically opposed experiences. I guess both models have it in them to behave very differently in different circumstances, and we have very little idea what pushes them in this or that direction. I guess it does boil down to luck!

Personally, Claude Opus (and, in the few interactions I had with it, Fable) has been by far the superior experience. GPT-5.5 seems dumber and more certain when presenting me with bullshit. Opus has better humor and is less pretentious in its presentation. But this may all boil down to how the models react to my prompting.

What is without a doubt is that I wish they both were more intelligent – or maybe it is their wisdom I find lacking!

antupis · 2026-06-18T05:01:23 1781758883

It might be the harness or prompting style. Personally I use opencode, and my prompting style is very plain and terse, with very small tasks. Opus and Sonnet are too often verbose and go off on tangents, whereas GPT 5.5 is much stricter.

vmg12 · 2026-06-16T18:47:04 1781635624

> For example, you need to add MCP servers manually. There's no Plugin/Skill/Connector marketplace that is accessible from within the app

This is all wrong.

enraged_camel · 2026-06-14T21:15:57 1781471757

>> A true, but vapid speech.

PG got into an argument with AOC about it on Twitter. It sounded like he was personally offended by what she was saying. Which makes sense because, as someone who has helped startup founders become famously wealthy, he probably took her statement as an attack on his identity.

Perhaps PG should follow his own advice, though: https://paulgraham.com/identity.html