> SWE-bench performance is similar to normal gpt-5, so it seems the main delta w... | Hacker News

tedsanders 5 months ago | on: GPT-5-Codex

> SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors

SWE-bench is a great eval, but it's very narrow. Two models can have the same SWE-bench scores but very different user experiences. Here's a nice thread on X about the things that SWE-bench doesn't measure: https://x.com/brhydon/status/1953648884309536958

dwaltrip 5 months ago

so annoying that you can't read replies without an account nowadays

Tiberium 5 months ago

Use Nitter. The main instance works, but there are a lot of other instances as well.

https://nitter.net/brhydon/status/1953648884309536958

dcre 5 months ago

Change the domain in the URL from x.com to xcancel.com to see it all.
