Hacker News

Integrations are nice, but the superpower is having an AI smart enough to operate a computer/keyboard/mouse so it can do anything without the cooperation/consent of the service being used.

Lots of people are making moves in this space (including Anthropic), but nothing has broken through to the mainstream.



I often get rate-limited or blocked from websites because I browse them too fast with my keyboard and mouse. The AI would be slowed down significantly.

LLM-desktop interfaces make great demos, but they are too slow to be usable in practice.


Good point. Probably makes sense to think of it as an assistant you assign a job to and get results back later.


Or even access multiple files?

Why can't one set up a prompt, test it against a single file, then once it is working, apply it to each file in a folder as a batch process, collecting the output into a single combined file?


I've just done something similar with Claude Desktop and its built-in MCP servers.

The limiting factors are still buggy responses - Claude often gets stuck in a useless loop if you overfeed it with files - and a lack of consistency. Sometimes hand-holding is needed to get the result you want. And it's slow.

But when it works it's amazing. If the issues and limitations were solved, this would be a complete game changer.

We're starting to get somewhat self-generating automation and complex agent workflows, with access to all of the world's public APIs and search resources, controlled by natural language.

I can't see the edges of what could be possible with this. It's limited and clunky for now, but the potential is astonishing - at least as radical an invention as the web was.


I would be fine with storing the output from one run, spooling up a new one, then concatenating after multiple successive runs.
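The run-store-concatenate workflow described above can be sketched in Python. Note this is a minimal sketch, not anything from the thread: `run_prompt` is a hypothetical stand-in for whatever model call you actually use (an API client, or shelling out to the `llm` CLI), and the file names are illustrative.

```python
from pathlib import Path


def run_prompt(text: str) -> str:
    """Hypothetical stand-in for a real model call; replace with your own."""
    return text.upper()


def batch_process(folder: str, pattern: str = "*.txt",
                  out_name: str = "combined.out") -> str:
    """Run the prompt over every matching file, then concatenate the results."""
    outputs = []
    for path in sorted(Path(folder).glob(pattern)):
        outputs.append(run_prompt(path.read_text()))
    combined = "\n".join(outputs)
    # Written after all inputs are read; the non-.txt name also keeps the
    # output file from matching the input pattern on a later re-run.
    Path(folder, out_name).write_text(combined)
    return combined
```

Each file is processed in its own call, so one bad response only affects one chunk of the combined output, which matches the "multiple successive runs" approach.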


You can probably achieve what you want with https://github.com/simonw/llm and a little bit of command line.

Not sure what OS you're on, but on Windows (in a batch file, where the loop variable is written %%F) it might look like this:

FOR %%F IN (*.txt) DO (TYPE "%%F" | llm -s "execute this prompt" >> "output.txt")


I want to work with PDFs (or JPEGs), but that should be a start, I hope.


llm supports attachments too

FOR %%F IN (*.pdf) DO (llm -a "%%F" -s "execute this prompt" >> "output.txt")


I've been using Claude Desktop with the built-in File MCP to run operations on local files. Sometimes it will do things directly, but usually it will write a Python script. For example: combine multiple .md files into one, or organize photos into folders.

I also use this method for doing code prototyping by giving it the path to files in the local working copy of my repo. Really cool to see it make changes in a vite project and it just hot reloads. Then I make tweaks or commit changes as usual.



