Hacker News
emp17344 21 days ago | on: Measuring AI Ability to Complete Long Tasks
I wonder how much of this is attributable to true model advancement and how much to improvement in the agentic harness. It’s impossible to separate strict model improvement from improvement in the associated tools.