
emp17344 21 days ago \| on: Measuring AI Ability to Complete Long Tasks

I wonder how much of this is attributable to true model advancement, and how much to improvements in the agentic harness. It’s impossible to separate strict model improvement from improvement in the associated tools.