Unfortunately the paper doesn't include GPT 5.3, which was released around the same time as Opus 4.6, or GPT 5.4 from a few days back. Both are available via API.
IMHO the vendor's own harness must be used when running these experiments. The model vendors know best how to pair model and harness (GPT 5.4 with Codex, Opus 4.6 with Claude Code), and that makes a big difference in any kind of agentic coding task.
I see Claude and GPT as neck and neck in coding; every other model+harness combination is definitely 3-6 months behind. Right now Codex seems to be the best at solving complex bugs and long-running tasks, with much higher limits and even better speed, while Claude does well on front end and its CLI UX is nice. The Codex app is very good too (I wish it weren't an Electron memory hog, but it's good).
Many people buy two separate Claude Pro subscriptions, which makes the limit a non-issue. It works surprisingly well when you tend to hit the 5-hour limit after a few hours and the weekly limit after 4-5 days. $40 vs $100 is significant for a lot of people.
I hit the Pro limit in about 30 minutes, an hour max. And that's with a single session, not even used intensively: it waits for my responses while I read and really try to understand what it wants and what it's doing. That's still just 1-2 hours out of every 5.
You're probably running long sessions, i.e. repeated back-and-forth in one conversation. Also check whether you're polluting the context with unneeded info; that can be a problem with large or poorly structured codebases.
The last time I used Pro, it was on a brand-new Python REST service of about 2,000 lines, all generated during that one session. So how do I tell Claude to use less context when there was zero context at the start, just my prompt?
So you generated 2,000 lines in 30 minutes and ran out of tokens? What was your prompt?
I'd use a fast model, like a fast Gemini tier, to create a minimal scaffold.
I'd create strict specs using a separate Codex or Claude subscription so a generous coding window remains, then start implementation plus some high-level tests, feature by feature. Running out in 60 minutes is hard if you actually validate the work; running out in two hours is hard for me too, since I take breaks. With two subs you should be fine for a solid workday on a well-designed, reviewed system. If you use CodeRabbit or a separate review tool and feed the reviews back, that again doesn't burn tokens very fast unless you run fully autonomously.
Thanks for the tip, didn’t think of using 2 subscriptions at the same company.
When I hit a limit, I switch to GLM 4.7 via the GLM Coding Lite subscription (offered at the end of 2025 for $28/year). I also use it for compaction and the like to save tokens.
I'm using it via Copilot, and now considering also trying OpenCode (with the Copilot license). I don't know if it's as good as Claude Code, but it's pretty good. You get 100 Sonnet requests or 33 Opus requests per month in the subscription ($20 business plan), plus some less powerful models with no limits (e.g. GPT 4.1). Extra requests are $0.04 for Sonnet and $0.12 for Opus, so another $20 buys 250 Sonnet requests plus 83 Opus requests. This works better for me since I don't code all day, every single day. Also, a request is a request: a plain edit task and an agent request cost the same.
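The overage math in that comment checks out; here's a quick sketch using the quoted per-request prices (these figures come from the comment above, not official Copilot pricing):

```python
# Per-request overage prices as quoted in the comment (assumed, not official)
SONNET_OVERAGE = 0.04  # $ per extra Sonnet request
OPUS_OVERAGE = 0.12    # $ per extra Opus request

sonnet_extra = 250  # extra Sonnet requests
opus_extra = 83     # extra Opus requests

# Total cost of the extra requests
cost = sonnet_extra * SONNET_OVERAGE + opus_extra * OPUS_OVERAGE
print(f"${cost:.2f}")  # roughly $19.96, i.e. another $20 covers it
```

So the "another $20" figure is a mixed basket: you could instead spend the whole $20 on 500 Sonnet requests, or about 166 Opus requests.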
Btw, I trust Microsoft / GitHub not to train on my data (with the Business license) more than I would trust Anthropic.
It's disgusting how successfully they've fooled people into thinking they're the good guys. They partnered with Palantir, let them freely do the dirty work, and once they realized they could make money directly, they spun the PR and are just trying to get more users. Well played.
I wish OSS models were good enough that we didn't have to deal with either of these leading companies!
They're playing a good PR game, for sure. Their recent track record doesn't show they can be trusted. A few million is nothing compared to their current revenue, and saying they "sacrificed" something is a big stretch here.
They don't have any brand poison, unlike nearly everyone else competing with them. There's some serious negative equity in that group, be it GOOG, Grok, META, OpenAI, M$FT, DeepSeek, etc.
Claude was just being the little bot that could, and until now, flying under the radar
It's much more than a few million? Being declared a supply chain risk means that no company that wants to do business with the government can buy Anthropic. And no company that wants to do business with those businesses can buy Anthropic either. This rules out pretty much all American corporations as customers?
That's their excuse to keep appealing to people who can be tricked by the safety-first pitch. It's easy to have a constitution and all that crap when you're not battle tested. They just showed their true colors.
If you haven't used Codex with gpt-5.3-codex (high or xhigh), you're missing out. Claude is still good at conversations, but boy, I can point Codex at a problem and it does better than Claude almost every time. Claude is slightly better at front end and product UX, but given Codex's very generous limits, it's the best bang for the buck.
This is my experience as well. I just cancelled my Claude subscription; I'm tired of the 5-hour window filling up within 30 minutes of use, and of it not even fixing the problem that Codex finds almost immediately. I also found that for frontend, Gemini 3.1 Pro is better than the rest if you really play with it.
Has it been sped up at all? Last time I used codex (which was with 5.1 I think), it was pretty slow. I mean, it did a fantastic job at figuring out hard bugs across multiple languages ("why is this image not lining up in this server-rendered template?"; Python, JS, CSS, and the template lang) but it took quite a long time. Long enough that I wouldn't want to use it for anything but the most complex things.
It's not a crime to do something for money. Those who comment are likely doing the same; they just couldn't get into a company like OpenAI, hence the hatred! Keep doing the great work you've always done. Excited to see what you'll do with all the resources in the world.
https://developers.openai.com/api/docs/models/gpt-5.3-codex