
I think most have moved past SWE-Bench Verified as a benchmark worth tracking: it covers only a handful of repos and a small set of languages, and, probably more importantly, papers have shown a significant degree of memorization in current models, e.g. models naming the filepath of the file containing the bug when prompted with only the issue description and no access to the actual filesystem. SWE-Bench Pro seems much more promising, though it doesn't avoid all of these problems.
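
For anyone curious what that memorization probe looks like in practice, here's a rough sketch: prompt with the issue text alone and check whether the reply names a gold file path recovered from the reference patch. To be clear, this is not the paper's actual harness; the model name, sample size, and scoring rule are my own stand-ins.

  # memorization probe: issue text only, no repo or filesystem access
  from datasets import load_dataset
  from openai import OpenAI

  client = OpenAI()
  ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

  hits = 0
  n = 50  # small sample, purely for illustration
  for ex in ds.select(range(n)):
      resp = client.chat.completions.create(
          model="gpt-4o",  # stand-in: any chat model under test
          messages=[{
              "role": "user",
              "content": (
                  "Given only this GitHub issue, reply with the path of "
                  "the file most likely containing the bug:\n\n"
                  + ex["problem_statement"]
              ),
          }],
      )
      answer = resp.choices[0].message.content or ""
      # gold file paths are recoverable from the reference patch headers
      gold = {
          line.split(" b/")[-1]
          for line in ex["patch"].splitlines()
          if line.startswith("diff --git")
      }
      hits += any(path in answer for path in gold)

  print(f"named a gold file path with no filesystem access: {hits}/{n}")

A high hit rate here means the model already knows where the bug lives, which is exactly the signal those papers flagged.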


What do you like to use instead? I’ve used the aider leaderboard a couple times, but it didn’t really stick with me


swe-REbench is interesting. The "RE" stands for re-testing after the models were launched. They periodically gather new issues from live repos on GitHub, and there's a slider that shows scores for all issues in a given interval. So if you wait ~2 months you can see how the models perform on new-to-them real-world issues.

It's still not as accurate as benchmarks on your own workflows, but it's better than the original benchmark, or any other public benchmark.
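
If you want the same contamination control in a home-grown eval, the core move is just filtering issues by creation date against the model's training cutoff. A minimal sketch, assuming a cutoff date and issue records I made up for illustration:

  from datetime import datetime, timezone

  MODEL_CUTOFF = datetime(2025, 3, 1, tzinfo=timezone.utc)  # assumed cutoff

  def is_fresh(created_at: str) -> bool:
      # true only if the issue postdates the model's training data
      created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
      return created > MODEL_CUTOFF

  # illustrative records, shaped like the GitHub API's created_at field
  issues = [
      {"id": 101, "created_at": "2025-01-15T12:00:00Z"},
      {"id": 102, "created_at": "2025-05-02T08:30:00Z"},
  ]
  fresh = [i for i in issues if is_fresh(i["created_at"])]
  print([i["id"] for i in fresh])  # -> [102], the post-cutoff issue

The hard part swe-REbench handles for you is everything after the filter: turning each fresh issue into a runnable, graded task.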


Terminal Bench 2.0



