If by pixel level you mean vision-first understanding and control of the UI then...

If by pixel level you mean vision-first understanding and control of the UI then you’ve misunderstood my comment - Autotab primarily uses vision to reason about screens and take action.

You can also use Anthropic’s Computer Use model directly in Autotab via the instruct feature - our users find it most helpful for handling specific subtasks that are complex to spell out, like picking a date in a calendar.