Yep. Wonder how long they'll wait before removing the 1x models, especially Gemini if it takes a long time, though they might just grab at an overdue reclassification from "preview" to "GA" to push it through.
To an extent. That economic incentive stops making sense when a) capacity is an actual constraint and b) Anthropic is not a monopoly and is subject to pressure from competitors who are more user-friendly.
These changes fixed some of the token issues, but the token bloat is intrinsic to the model, and Anthropic's solution of defaulting to xhigh reasoning for Opus 4.7 just means you'll burn through tokens faster anyway.
This is frankly exciting. Politics aside, it always feels great to wake up to a new model release. I'll personally stay up quite late tonight if GPT-5.5 drops in Codex.
I don't find it exciting at all. I just feel anxiety about my career and my place in the world. I have a set of skills that I've developed over many years. I care about what I create. I consider it a craft. When I use my skills to solve a hard problem, I feel good about myself. When the AI does the work for me, I don't get that sense of accomplishment. I am seeing my value evaporate before my eyes.
I hate this stuff and I wish it had never been invented.
You might want to rethink this. Think of it as the opportunity of a lifetime, the beginning of a new era, same as the early Internet, where you have the chance to set yourself up for life. The window is getting shorter and shorter, but you can't deny that you have the potential NOW to thrive, or to start multiple businesses without much capital. Consider also that the best thing, in the end, is probably to build great things, regardless of how we build them, and make the world progress.
The more interesting part of the announcement than "it's better at benchmarks":
> To better utilize GPUs, Codex analyzed weeks’ worth of production traffic patterns and wrote custom heuristic algorithms to optimally partition and balance work. The effort had an outsized impact, increasing token generation speeds by over 20%.
The ability of agentic LLMs to improve computational efficiency/speed is a highly impactful domain that I wish were tested more rigorously than just with benchmarks. In my experience Opus is still much better than GPT/Codex in this respect, but OpenAI is getting material gains out of this kind of performancemaxxing and has a growing incentive to keep at it given its cost/capacity issues, so I wonder if they'll continue optimizing for it.
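The announcement doesn't describe the heuristics Codex actually wrote, so purely as a hedged sketch: partitioning and balancing work by observed cost often boils down to something like a greedy longest-processing-time assignment. All names and numbers below are invented for illustration.

```rust
// Hypothetical sketch of load-aware partitioning (not OpenAI's actual
// heuristic): assign the heaviest tasks first, each to the currently
// least-loaded worker (greedy LPT scheduling).
fn partition(mut costs: Vec<u64>, workers: usize) -> Vec<Vec<u64>> {
    // Each bin tracks (total load, assigned task costs).
    let mut bins: Vec<(u64, Vec<u64>)> = vec![(0, Vec::new()); workers];
    costs.sort_unstable_by(|a, b| b.cmp(a)); // heaviest first
    for c in costs {
        let bin = bins.iter_mut().min_by_key(|b| b.0).unwrap();
        bin.0 += c;
        bin.1.push(c);
    }
    bins.into_iter().map(|(_, tasks)| tasks).collect()
}

fn main() {
    let parts = partition(vec![9, 7, 6, 5, 4, 2], 3);
    let loads: Vec<u64> = parts.iter().map(|p| p.iter().sum()).collect();
    println!("per-worker loads: {:?}", loads); // balanced: [11, 11, 11]
}
```

The interesting part in the announcement isn't the algorithm (LPT is textbook) but that the model mined weeks of production traffic to pick the cost estimates feeding it.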
There's already KernelBench which tests CUDA kernel optimizations.
On the other hand, all companies know that optimizing their own infrastructure/models is the critical path to "winning" against the competition, so you can bet they're serious about it.
So, I'm working on some high-performance data processing in Rust. I had hit some performance walls and needed improvements on the order of 100x or more.
I remembered the famous FizzBuzz Intel codegolf optimizations and gave them to Gemini Pro, along with my code and instructions to "suggest optimizations similar to those, maybe not so low level, but clever", and its suggestions were very cool.
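(For anyone unfamiliar with that codegolf thread, here's a hypothetical example of the kind of "clever but not too low-level" trick in that vein, not what Gemini actually suggested: FizzBuzz's output repeats every 15 numbers, so the per-item modulo branches can be replaced with one precomputed cycle.)

```rust
// Sketch: exploit the period-15 structure instead of testing
// divisibility by 3 and 5 on every iteration.
fn fizzbuzz_cycle(n: u64) -> Vec<String> {
    // Entry i holds the fixed text for numbers where x % 15 == i;
    // an empty string means "print the number itself".
    const PATTERN: [&str; 15] = [
        "FizzBuzz", "", "", "Fizz", "", "Buzz", "Fizz", "", "",
        "Fizz", "Buzz", "", "Fizz", "", "",
    ];
    (1..=n)
        .map(|i| {
            let s = PATTERN[(i % 15) as usize];
            if s.is_empty() { i.to_string() } else { s.to_string() }
        })
        .collect()
}

fn main() {
    println!("{:?}", fizzbuzz_cycle(15));
}
```

The heavily optimized versions in that thread go much further (SIMD, avoiding formatting entirely by patching digits in a byte buffer), but the table-over-branches idea is the accessible end of the same spectrum.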
Honestly, the problem with claims like these is that they're anecdotal; how can anyone reproduce them? I love it when labs go beyond traditional benchies like MMLU and friends, but these kinds of statements don't help much either, unless it's a proper controlled study!
In a sense it's better than a benchmark: it's a practical, real-world, highly quantifiable improvement, assuming there are no quality regressions and all test cases pass. I have been experimenting with this workflow across a variety of computational domains and have achieved consistent results with both Opus and GPT. My coworkers have independently used Opus for optimization suggestions on services in prod, and they've led to much better performance (3x in some cases).
A more empirical test would be good for everyone (i.e. on equal hardware, give each agent the goal to implement an algorithm and make it as fast as possible, then quantify relative speed improvements that pass all test cases).
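A minimal sketch of what such a harness could look like (all names hypothetical): gate on correctness first, then compare wall-clock time between a baseline and a candidate implementation of the same task.

```rust
use std::time::{Duration, Instant};

// Run one implementation on a shared input, returning its result and
// elapsed wall-clock time. Real harnesses would use many iterations
// and warmup; this is just the shape of the comparison.
fn time_it<F: Fn(&[u64]) -> u64>(f: F, input: &[u64]) -> (u64, Duration) {
    let start = Instant::now();
    let out = f(input);
    (out, start.elapsed())
}

fn main() {
    let input: Vec<u64> = (0..1_000_000).collect();
    let baseline = |xs: &[u64]| xs.iter().sum::<u64>();
    // Stand-in for an agent-written variant of the same computation.
    let candidate = |xs: &[u64]| xs.chunks(4).map(|c| c.iter().sum::<u64>()).sum::<u64>();

    let (a, ta) = time_it(baseline, &input);
    let (b, tb) = time_it(candidate, &input);
    assert_eq!(a, b); // a speedup only counts if the answers match
    println!("baseline {:?}, candidate {:?}", ta, tb);
}
```

Running each agent's submission through the same gate on identical hardware would turn "Codex made it 20% faster" into a number anyone can check.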
The tension here is that what customers need to reproduce is this result on their own problem. To measure this you need extensive evals on private data.
OpenAI simply won’t share the data you need to reproduce this in the way you’d hope for an academic paper.
Oh, come on, if they do well on benchmarks people question how applicable they are in reality. If they do well in reality people complain that it's not a reproducible benchmark...
There's an obvious subtext that Copilot will try to phase out all 1x premium multipliers in order to actually make money off of it.