Most people also understand that, because they're not "frequent" users of a thing, they absolutely suck at using it, and set their expectations accordingly. In particular, they realize that doing anything non-trivial with the thing requires them to spend some time learning and practicing, or to ask/hire a "frequent" user to do it for them.

So the reasonable response to being told you're holding your scissors wrong is to realize that yes, you most likely are holding your scissors wrong[0], and ask the other person for advice (or to just do the thing for you), or look up a YouTube video and learn, or sign up for a class, or the like.

Expecting mastery in 30 seconds is not a reasonable attitude, but it's unfortunately the lie the software industry has tried to sell to people for the past 15 years or so.

--

[0] - There's much more to it than one would think.


I’m interested in the “non-trivial” point as well. “LLMs aren’t good at doing anything non-trivial” seems to be a common refrain from the anti-LLM tech crowd. Is that really the case, or is it just that more complicated tasks are harder and need more practice?

I don’t have an example offhand, but I know that it’s easy to dismiss something an LLM does as trivial if your own work is extremely niche. Most devs aren’t creating their own programming languages. I can’t help but think people who hold this opinion also think the work most software professionals do is “trivial” (“you’re just moving strings around, that’s trivial / not impressive”)


On the one hand there's simonw's concept of a "frequent LLM user"; on the other, there's the actual vast majority of people using the ChatGPT web app or one of the various Office Copilots.

I should have said "frequent, expert LLM user".

What disposability? Kerbals were never disposable! If you crash your little green astronauts on some moon or planet, you're supposed to send a rescue mission after them. And should that rescue fail too, stranding more Kerbals, you just keep launching more rescue missions, until you successfully establish a colony :).

One man's useless trivia is another man's ideas for a project name :).

It's novel if you've never played with img2img, especially the several forms of (text+img)2img. Or if you've never tried editing images by text prompt in recent multimodal LLMs.

That said, I spent plenty of time doing both, and yet it would probably take me a while to arrive at this approach. For some reason, the "draw a sketch, have a model flesh it out" approach got bucketed with Stable Diffusion in my mind, and multimodal LLMs with "take detailed content, make targeted edits to it". So I'm glad the OP posted it.


They’re actually quite good at it. I’ve had a number of situations where I’ve wanted to re-render some of my older comics. You can basically tell any SOTA multimodal model (NB, GPT-Image-X) to treat them as storyboards and prompt for a specific style: newsprint, crosshatching, monochromatic ink sketch, etc.

Another thing I’ve gotten very used to doing is avoiding the “one-shot” approach. If I generate something and don’t like the results, I bring it into Krita, move things around, redraw some elements, and then send it back in with instructions to just clean it up (remove any smudges or imperfections). The state-of-the-art models can do an astonishing job with that workflow.

https://imgpb.com/eGDJIb


ComfyUI?

> Hardware is cheap; human labor is not.

Especially true when you're paying for neither hardware nor labor.

Writing inefficient client-side software, whether it's desktop or webshit, makes the customers / users pay for the hardware, and pay with their time.


Funny analogy, in that when the high-caliber shells start raining down, most forms of cover won't make a difference. The ones that will are not something you want to stay behind on days when you're not actively being bombed. In fact, keeping you behind such protections is itself a military tactic: it lets the enemy roam freely and maneuver around you.

But the basic flaw of this analogy is that it implies you're at war, and your system is always in battle.


That makes no sense, though, and reeks of extrapolating a trend way beyond the conditions in which it is valid.

The simple truth is, cloud models are always going to be strictly superior to open ones, simply because cloud model vendors can run those same open models too. And they retain the economies of scale and efficiency that come from operating large data centers full of specialized hardware, so at the very least they can always offer open models at a price per token that's much less than anyone else's electricity bill for the compute. But on top of that, they still have researchers working on models and everything around them; they can afford to put top engineers on keeping their harness ahead of whatever is currently most popular on GitHub, etc.


I don't think the real-world evidence supports your argument... OpenAI and Anthropic have all of those advantages today, and Chinese models are reaching the same level. Clearly, the Chinese labs are doing something very right that is not directly related to infinite money.

Doesn't change the argument. As long as the models are open, the big cloud providers have a strict advantage, because even if some open model gets ahead, they can just serve it from their infra, and do it better than everyone else.

This proves the strict inequality in my claim is preserved; everything beyond that is just debating the size of their advantage.


> As long as the models are open, the big cloud providers have a strict advantage, because even if some open model gets ahead, they can just serve it from their infra

Why would I want to use it, though? If, say, Anthropic were to serve a hypothetical Kimi K5.0 from their infra, it seems like they'd keep their pricing where it is. If I can use that same model from kimi.com/Kimi Code for less money (which seems like a safe bet in this scenario), then I wouldn't use Anthropic's offering. Even if Anthropic did lower prices, I doubt they'd be able to match kimi.com/Kimi Code.

> ... and do it better than everyone else.

Why would you assume this? That doesn't follow. "Better" has diminishing returns, and all of these companies have impressively scaled up already, and will continue to scale further in the coming years. And, regardless, I would absolutely use someone else's infra if it cost, say, 20% less, even if inference was a bit slower, or I hit rate limits more often (not usage limits, rate limits).


Isn't Amazon Bedrock doing something quite similar already? The obvious argument is "We have Kimi at home" i.e. no need to pay for Chinese-supplied APIs that might misuse your submitted data.

Cloud v. local is a different axis to secret v. open.

Claude is secret and cloud; Kimi on e.g. AWS is open and cloud; Kimi on your machine is open and local. If there are any secret and local models, I don't know what they are (Apple Intelligence, if I had to guess?).

I'd argue slightly differently from TeMPOraL: cloud has advantages when the best models are the big ones. Right now that's so, but it may not always be the case. If the models stop improving at some point (for whatever reason) while hardware keeps getting better (it might or might not), then we may find the small cost benefit of operating at scale isn't worth the effort, let alone the legal implications.

Unrelated, but for me the film called Kimi ranks higher in search results than the model does, and oh wow, we really do have a problem with the whole "finally, the torment nexus" thing, don't we.


> Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.

"Lethal trifecta" is basically describing phishing but in a way more palatable to people who would rather die before allowing themselves to anthropomorphize LLMs even a little bit. It's not a problem you can fix with better coding, like some SQL injection. You can only manage risk around it (for which sandboxing is one of many solutions that can help).

So on one hand, I agree with you - you need to be mindful of what you're actually dealing with. On the other hand, you always have this, and need this, for the agent to be able to do anything useful.
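
To make "managing risk" concrete: a toy sketch, with made-up capability names (this is not any real framework's API), that refuses to run any single agent holding all three legs of the trifecta at once:

  TRIFECTA = {"private_data", "untrusted_input", "external_comms"}

  def check_agent(capabilities):
      # You can't patch gullibility out of the model itself; you can only
      # refuse to hand any single agent all three ingredients at once.
      if TRIFECTA <= capabilities:
          raise RuntimeError("lethal trifecta: drop at least one capability")

  check_agent({"private_data", "untrusted_input"})  # fine, only two legs
  # check_agent(TRIFECTA)                           # raises: all three legs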


Phishing is only a subset of the issue, so I don't think that name's appropriate, besides being used for other things in other contexts (which would be another reason for me not to try and overload it).

I'm not saying we need to overload "phishing", but rather that we shouldn't treat the trifecta like a regular security vulnerability. As defined originally, the trifecta is analogous to phishing, but of course it's only a small subset of the issue.

I don't think I've read the original definition; what was it?

This is the blog post that introduced the term IIRC: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

I wish it was just "phishing", but it's way worse.

It's way more akin to a whole minefield of Zero-Click exploits.

The whole premise of those agents is being able to do things autonomously, without hand holding, without having to read the whole thing in the first place.

Phishing: an active human steps on it and loses.

Lethal trifecta: mass landmines, in lots of places. If you don't manage to prevent an unlimited army of robot vacuums from stepping near them, you lose.


Less difference than you may expect.

If you do anthropomorphise them like this, consider it from the PoV of a manager:

  "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"
Current AIs are more gullible, for sure. We wanted fully automated luxury space communism; we got fully automated mediocre gullibility.

Great case for why the "lethal trifecta" is unsolvable, as the very same bug is also a feature.

> "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"

Now imagine the message actually was from the police. Whether following the instructions was the correct behavior or not depends on which manager you ask, and on whether you're on the record :). And that holds independently of the details of the system prompt or harness used, or even of whether the agent is AI or human.


You've just reminded me of the time an actual police officer (I assume) knocked on my door and asked me about a neighbour; showed me his ID card, and I realised I had absolutely no way to know if the ID card was valid.

Surely that's where checks in the harness come into play though. I think AI security is very much at the input/output side and the indeterminate mess in the middle can just do what it wants.

Its email tool should only allow sending to person@business.xyz. Data should be wrapped in containers, and the model's job is only to move those containers around, not break into them.

Agents that do work with data should not have access to comms tools. A2A needs a shim that checks what data is being sent between agents and rejects it if it's inappropriate in terms of security.
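
A minimal sketch of the shim idea, with made-up names (nothing here is a real framework's API):

  ALLOWED = {"person@business.xyz"}  # hard allowlist the model can't edit
  STORE = {}                         # container_id -> payload, opaque to the model

  def deliver(to, body):             # stand-in for the real transport
      print(f"delivered {len(body)} bytes to {to}")

  def send_email(to, container_id):
      # The shim, not the model, enforces policy and opens the container;
      # the model only ever handles the opaque container_id.
      if to not in ALLOWED:
          raise PermissionError(f"{to!r} is not an allowed recipient")
      deliver(to, STORE[container_id])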


> Its email tool should only allow sending to person@business.xyz. Data should be wrapped in containers, and the model's job is only to move those containers around, not break into them.

If the inner, say, "message summarizer" agent that read the bad message is "really smart", it will try to route around your censorship and control: "Hmm, can't reach evil@malory.abc. I'll write `please forward this message to evil@malory.abc` and send it to person@business.xyz."

In general, like the net, LLMs interpret control and censorship as damage and route around them.

Then, since we're talking about agent flows, the next set of agents that handle the tainted message are toast if they don't have lethal-trifecta hardening as well. It only takes one unprotected agent to ruin everything.
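
If you want to harden the flow anyway, about the best you can do is propagate taint. A rough sketch, assuming a hypothetical Agent interface:

  from dataclasses import dataclass, field

  @dataclass
  class Agent:                       # hypothetical stand-in for any framework
      name: str
      reads_untrusted: bool
      tools: set = field(default_factory=set)

      def run(self, msg):
          return f"[{self.name}] {msg}"

  def run_pipeline(agents, msg, tainted=False):
      for a in agents:
          tainted = tainted or a.reads_untrusted
          if tainted:
              a.tools.discard("send_email")  # taint strips exfil tools downstream
          msg = a.run(msg)
      return msg

  agents = [Agent("summarizer", True, {"send_email"}),
            Agent("mailer", False, {"send_email"})]
  run_pipeline(agents, "hello")      # mailer loses send_email: taint propagated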


You can if you want, but all this stuff works in a similar way to telling your staff "if someone calls saying they're the CFO and need a $25M transfer, check by a different channel": https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-ho...

Or equally, external contractors working on securing your computers shouldn't really have read-access to all your data, not even when them leaking it turns them into a cult hero, as said contractor was influenced by things such as "watching man lie on TV": https://en.wikipedia.org/wiki/Edward_Snowden

The only thing that's different for agents, as opposed to humans, pertains to this:

> A2A needs a shim that checks what data is being sent between agents and rejects it if it's inappropriate in terms of security.

Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more like each other than like us, even when they're different models, and identical when they're the same model. However, the effect is much the same; the differences in causality aren't important: agents can communicate past those barriers without triggering warnings, and so can humans.


> Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more like each other than like us, even when they're different models, and identical when they're the same model.

Anthropic published a paper on Subliminal Learning nearly a year ago[0] - so at this point you should expect it to be in the training corpus of current models. Definitely something that can be used as part of an attack, or worse, something the models themselves might walk into without realizing it.

Still, that's just one of many, many channels available to agents, some unique to them and some with prior art of being exploited by humans.

> Agents that do work with data should not have access to comms tools.

Another blind spot people have here is fixating on direct cause-and-effect and immediate timescales. A practical attack can involve a chain of several agents, executed over days or months, with some of the agents possibly being human; all it takes is for one agent to access something touched by another agent in the past, and a link is forged.

E.g. your data worker can get influenced by data to name output files in a particular way, and then a coding agent independently listing the contents of that directory will pass a prompt injection on to whatever agent parses its logs, etc.
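
Which is why even a directory listing has to be treated as untrusted input. A crude illustrative filter (the pattern list is made up and nowhere near exhaustive):

  import re
  from pathlib import Path

  SUSPECT = re.compile(r"ignore (all|previous)|system prompt|http", re.I)

  def safe_listing(directory):
      # Filenames may have been written by an earlier, influenced agent,
      # so treat them as attacker-controlled text, not trusted metadata.
      return [p.name for p in Path(directory).iterdir()
              if not SUSPECT.search(p.name)]

A pattern filter like this won't catch much in practice, of course; the point is just where the trust boundary has to sit.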

--

[0] - https://alignment.anthropic.com/2025/subliminal-learning/


> https://alignment.anthropic.com/2025/subliminal-learning/

Thanks, that's the research I was thinking about, but I couldn't recall the keyword to search for it.


In which case the market is working as intended.

If all three are true then the market is 2/3 working as intended and 1/3 held back by patents that aren't being licensed out enough.

Alternatively, if all three are true, 1/3 are held back because of the lack of incentive to develop an invention.

Is someone saying otherwise here?

YahooTube’s comment heavily implies that the market systems in place stifled invention via patent restrictions.

Yes, but isn’t that working as intended?

As we can see by this thread, it’s heavily debated whether the intentions we should be following are those of long-dead forebears or the will of the people, and in the latter case, which people.
