Hacker News

In doing some DevOps-y type tasks recently (ansible, packer, docker, baking images with guestfish), I've found it very frustrating how confidently ChatGPT will tell me to use flags that don't exist, or hallucinate completely non-existent functions or behaviours. And then when I spend time trying what it suggests, only to hit a wall and come back like "wtf mate," it breezily goes "oh yes, you're right, good job figuring that out! You're so close now! Your next step is to do X and Y," and then serves up the same detailed tutorial as before, with the flag or whatever it was that it had wrong subtly changed.

It definitely makes me feel like I'm dealing with an overenthusiastic intern who is throwing stuff over the wall without checking their work, and like maybe having a second bot sitting in front of the first one going ARE YOU SURE ABOUT THAT could really improve things.
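That "second bot" idea can be sketched in a few lines: draft an answer, ask a critic model whether it's sure, and redraft if the critic objects. Everything here is hypothetical scaffolding — `ask` and `critique` are stand-ins for whatever LLM client you'd actually call, and the prompts are illustrative:

```python
from typing import Callable

def answer_with_critic(
    question: str,
    ask: Callable[[str], str],       # hypothetical call to the answering model
    critique: Callable[[str], str],  # hypothetical call to the "ARE YOU SURE" model
    max_rounds: int = 2,
) -> str:
    """Draft an answer, have a critic challenge it, redraft on objection."""
    draft = ask(question)
    for _ in range(max_rounds):
        verdict = critique(
            f"Question: {question}\nAnswer: {draft}\n"
            "Does this answer invent flags or APIs? Reply OK or explain the problem."
        )
        if verdict.strip().upper().startswith("OK"):
            return draft
        # Feed the objection back so the next draft can route around it.
        draft = ask(f"{question}\nA reviewer objected: {verdict}\nTry again.")
    return draft
```

It's just a sketch of the loop structure; whether it helps depends entirely on the critic being better at spotting invented flags than the first model was at avoiding them.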



You can't get more info out of an LLM than it actually holds. As Anthropic pointed out, if an LLM knows a name but has no other info about it, it starts hallucinating. The same probably happens here: the LLM knows there must be a flag but can't remember all of them. A short reminder in the prompt will likely help (or web search, for GPT). Just my $0.02.


It certainly feels like you can get more out of it just by challenging it; then it happily finds other paths to what you want. So maybe internally it needs a second voice encouraging it to think harder about alternatives upfront.


The fact that you can't get more info from an LLM than it holds is actually a pithy description of this whole challenge.


I did a stint in DevOps and I found every model to be like this for all of the infra-as-code languages. Anything YAML-based was especially bad.

Even Amazon’s own offering completely made things up about Amazon’s own formats.

I’d be curious as to why that is. It seems like there would be enough training data, and for Amazon in particular it seems like they could make a validation tool the model could use.
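The validation-tool idea is easy to wire up as a loop: generate a config, run it through a checker, and feed any error back into the prompt. A minimal sketch, with `generate` as a hypothetical LLM call and JSON parsing standing in for a real validator (for CloudFormation you'd reach for actual tooling like cfn-lint):

```python
import json
from typing import Callable, Optional

def generate_validated(
    prompt: str,
    generate: Callable[[str], str],            # hypothetical LLM call
    validate: Callable[[str], Optional[str]],  # returns an error message, or None if OK
    max_tries: int = 3,
) -> str:
    """Generate, validate, and retry with the error message folded into the prompt."""
    for _ in range(max_tries):
        out = generate(prompt)
        err = validate(out)
        if err is None:
            return out
        prompt = f"{prompt}\nYour last attempt failed validation: {err}\nFix it."
    raise ValueError("no valid output after retries")

def json_validator(text: str) -> Optional[str]:
    """Stand-in validator: just checks the output parses as JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return str(e)
```

The interesting part is that the validator is deterministic, so the model can't talk its way past it the way it talks its way past a user.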


Maybe I'm excessively anthropomorphizing, but it does feel a bit analogous to my own thought process, like "I need feature XYZ, and based on other tools I'm more familiar with it should be an --xyz flag, so let me google for that and see if I'm right or if I instead find a four-year-old wontfix on Github where someone asked for what I need and got denied."

Except... the model is missing that final step; instead it just belches out its hypothesis, all dressed up in chirpy, confident-sounding language, certain that I'm moments away from having everything working just perfectly.


Cursor has a neat feature where you can upload custom docs and then reference them with @Docs. I find this prevents hallucinations, as does using a reasoning model.


I've enjoyed watching Claude run commands with incorrect flags, hit the error, and then adapt.


I've also found LLMs to perform poorly at DevOps tasks. Perhaps there's a lack of training data. On the bright side, this hints at better job security for platform engineers.


100%. This has happened to me often enough that I've wished I could just inject the man page docs into it to at least act as a sanity check.
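Injecting the man page is pretty mechanical — dump it with `man -P cat` and splice it in front of the question. A rough sketch (assumes `man` is installed and the page fits in the context window; the prompt wording is made up):

```python
import subprocess

def fetch_man_page(tool: str) -> str:
    """Dump a man page as plain text; -P cat prints it instead of paging."""
    return subprocess.run(
        ["man", "-P", "cat", tool],
        capture_output=True, text=True, check=True,
    ).stdout

def build_grounded_prompt(tool: str, docs: str, question: str) -> str:
    """Splice real documentation ahead of the question as a sanity check."""
    return (
        f"Documentation for {tool}:\n{docs}\n\n"
        f"Using only flags that appear in the documentation above, {question}"
    )
```

So `build_grounded_prompt("tar", fetch_man_page("tar"), "how do I extract a .tar.gz?")` gives the model the real flag list to work from instead of its fuzzy recollection of one.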


Spot on.



