cezarvil's comments

Hi everyone, I'm one of the creators of ACP.

We originally built this out of pure frustration. While working on our own product (Emitta), we realized that having an LLM 'look' at a screen via vision and guess where to click was ridiculously slow, unreliable, and expensive.

We looked at MCP, but that's strictly data/tools. We looked at AG-UI and A2UI, but they require building net-new components. We just wanted the agent to operate the clunky, existing UI we already had. So we wrote a protocol that basically gives the agent a structured 'map' of the live DOM, and lets it send back native execution commands (like set_field, click).
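
To give a rough idea of what that looks like in practice, here's a simplified sketch of the kind of exchange I mean. The type and field names below are illustrative only, not the actual ACP wire format:

    // Illustrative only: a simplified "UI map" node the server might expose,
    // and the commands an agent could send back. Names are hypothetical.
    interface UINode {
      id: string;       // stable handle for the element
      role: string;     // e.g. "textbox", "button"
      label?: string;   // accessible name shown to the model
      value?: string;   // current value, if any
    }

    type AgentCommand =
      | { action: "set_field"; target: string; value: string }
      | { action: "click"; target: string };

    // The agent fills a field and submits by element id; no pixels, no vision loop.
    const commands: AgentCommand[] = [
      { action: "set_field", target: "checkout-email", value: "jane@example.com" },
      { action: "click", target: "checkout-submit" },
    ];
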

The reference server is up on npm (@acprotocol/server). I’m around all day and would love to hear your thoughts on the architecture, whether the action set (8 actions) makes sense to you, and whether you think the native UI-control approach is the right path forward.


MCP really isn't aging well, to be honest. LLMs are just way more efficient writing a single script that targets an API directly than ping-ponging across a protocol that's inherently slow and token-heavy. Not saying MCP is bad, just that it's clearly not the silver bullet everyone thought it was.

Cloudflare letting the LLM write a single JS function to execute the whole chain in an edge isolate is super smart. It finally offloads the agent's inner loop.
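
To illustrate what I mean by offloading the inner loop: instead of the model making several tool calls and reading several token-heavy responses, it emits one function like this and only ever sees the small return value. The endpoints and field names here are made up for the example:

    // Hypothetical sketch: one generated function runs the whole chain
    // inside the isolate; the model never sees the intermediate payloads.
    export async function fulfillOrder(orderId: string): Promise<string> {
      // Step 1: look up the order (example endpoint, not a real API).
      const order = await fetch(`https://api.example.com/orders/${orderId}`)
        .then(r => r.json());

      // Step 2: create a shipment from the order, chained with no model round-trip.
      const shipment = await fetch("https://api.example.com/shipments", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ items: order.items, address: order.address }),
      }).then(r => r.json());

      // Step 3: return only the compact summary the model actually needs.
      return `Shipment ${shipment.id} created for order ${orderId}`;
    }
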

I’ve been dealing with the exact same latency/reliability mess, but on the frontend. We ended up building an open protocol to let agents operate live UIs natively because vision and DOM-scraping loops are just painfully slow. Moving the actual execution engine as close to the target as possible (either an edge V8 isolate for APIs, or a native SDK for the frontend) seems to be the only real way out of the current "slow and expensive" agent phase.

