Hacker News

In doing some DevOps-y type tasks recently (ansible, packer, docker, baking images with guestfish), I've found it very frustrating how confidently ChatGPT will tell me to use flags that don't exist, or hallucinate completely non-existent functions or behaviours. And then when I spend time trying what it suggests, only to hit a wall and come back like "wtf mate," it breezily goes "oh yes, you're right, good job figuring that out! You're so close now! Your next step is to do X and Y," and then serves up the same detailed tutorial as before, with the flag or whatever it was that it had wrong subtly changed.

It definitely makes me feel like I'm dealing with an overenthusiastic intern who is throwing stuff over the wall without checking their work, and like maybe having a second bot sitting in front of the first one going ARE YOU SURE ABOUT THAT could really improve things.
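That "second bot" idea can be sketched in a few lines: draft an answer, ask a critic model whether it's sure, and redraft if the critic objects. Everything here is hypothetical scaffolding — `ask` and `critique` are stand-ins for whatever LLM client you'd actually call, and the prompts are illustrative:

```python
from typing import Callable

def answer_with_critic(
    question: str,
    ask: Callable[[str], str],       # hypothetical call to the answering model
    critique: Callable[[str], str],  # hypothetical call to the "ARE YOU SURE" model
    max_rounds: int = 2,
) -> str:
    """Draft an answer, have a critic challenge it, redraft on objection."""
    draft = ask(question)
    for _ in range(max_rounds):
        verdict = critique(
            f"Question: {question}\nAnswer: {draft}\n"
            "Does this answer invent flags or APIs? Reply OK or explain the problem."
        )
        if verdict.strip().upper().startswith("OK"):
            return draft
        # Feed the objection back so the next draft can route around it.
        draft = ask(f"{question}\nA reviewer objected: {verdict}\nTry again.")
    return draft
```

It's just a sketch of the loop structure; whether it helps depends entirely on the critic being better at spotting invented flags than the first model was at avoiding them.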



You can't get more info out of an LLM than it actually holds. As Anthropic pointed out, if an LLM knows a name but has no other info about it, it starts hallucinating. The same probably happens here: the LLM knows there must be a flag but can't remember all of them. A short reminder in the prompt will likely help (or web search, for GPT). Just my $0.02.


It certainly feels like you can get more out of it just by challenging it; then it happily finds other paths to what you want. So maybe internally it needs a second voice encouraging it to think harder about alternatives upfront.


The fact that you can't get more info from an LLM than it holds is actually a pithy description of this whole challenge.


I did a stint in DevOps and I found every model to be like this for all of the infra-as-code languages. Anything YAML-based was especially bad.

Even Amazon’s own offering completely made things up about Amazon’s own formats.

I’d be curious as to why that is. It seems like there would be enough training data, and for Amazon in particular it seems like they could make a validation tool the model could use.
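The validation-tool idea is easy to wire up as a loop: generate a config, run it through a checker, and feed any error back into the prompt. A minimal sketch, with `generate` as a hypothetical LLM call and JSON parsing standing in for a real validator (for CloudFormation you'd reach for actual tooling like cfn-lint):

```python
import json
from typing import Callable, Optional

def generate_validated(
    prompt: str,
    generate: Callable[[str], str],            # hypothetical LLM call
    validate: Callable[[str], Optional[str]],  # returns an error message, or None if OK
    max_tries: int = 3,
) -> str:
    """Generate, validate, and retry with the error message folded into the prompt."""
    for _ in range(max_tries):
        out = generate(prompt)
        err = validate(out)
        if err is None:
            return out
        prompt = f"{prompt}\nYour last attempt failed validation: {err}\nFix it."
    raise ValueError("no valid output after retries")

def json_validator(text: str) -> Optional[str]:
    """Stand-in validator: just checks the output parses as JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return str(e)
```

The interesting part is that the validator is deterministic, so the model can't talk its way past it the way it talks its way past a user.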


Maybe I'm excessively anthropomorphizing, but it does feel a bit analogous to my own thought process, like "I need feature XYZ, and based on other tools I'm more familiar with it should be an --xyz flag, so let me google for that and see if I'm right or if I instead find a four-year-old wontfix on Github where someone asked for what I need and got denied."

Except... the model is missing that final step; instead it just belches out its hypothesis, all dressed up in chirpy, confident-sounding language, certain that I'm moments away from having everything working just perfectly.


Cursor has a neat feature where you can upload custom docs and then reference them with @Docs. I find this prevents hallucinations, as does using a reasoning model.


I've enjoyed watching Claude run commands with incorrect flags, hit the error, and then adapt.


I've also found LLMs to perform poorly at DevOps tasks. Perhaps there's a lack of training data. On the bright side, this hints at better job security for platform engineers.


100%. This has happened to me often enough that I've wished I could just inject the man page docs into it to at least act as a sanity check.
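Injecting the man page is pretty mechanical — dump it with `man -P cat` and splice it in front of the question. A rough sketch (assumes `man` is installed and the page fits in the context window; the prompt wording is made up):

```python
import subprocess

def fetch_man_page(tool: str) -> str:
    """Dump a man page as plain text; -P cat prints it instead of paging."""
    return subprocess.run(
        ["man", "-P", "cat", tool],
        capture_output=True, text=True, check=True,
    ).stdout

def build_grounded_prompt(tool: str, docs: str, question: str) -> str:
    """Splice real documentation ahead of the question as a sanity check."""
    return (
        f"Documentation for {tool}:\n{docs}\n\n"
        f"Using only flags that appear in the documentation above, {question}"
    )
```

So `build_grounded_prompt("tar", fetch_man_page("tar"), "how do I extract a .tar.gz?")` gives the model the real flag list to work from instead of its fuzzy recollection of one.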


Spot on.



