>This is a pretty simple question to answer. Take two lists and compare them.
This continues a pattern as old as home computing: The author does not understand the task themselves, consequently "holds the computer wrong", and then blames the machine.
No "lists" were being compared. The LLM does not have a "list of TLDs" in its memory that it just refers to when you ask it. If you haven't grokked this very fundamental thing about how these LLMs work, then the problem is really, distinctly, on your end.
That’s the point the author is making: the LLMs don’t have the raw correct information required to accomplish the task, so all they can do is produce a plausible-sounding answer. And even if they did, the way they are architected can still only result in a plausible-sounding answer.
They absolutely could have accomplished the task. The task was, purposefully or ignorantly, posed in a way that is known to be unsuited to an LLM, and then the author concluded "the machine did not complete the task because it sucks."
Not really. This works great in Claude Sonnet 4.1: 'Please could you research a list of valid TLDs and a list of valid HTML5 elements, then cross reference them to produce a list of HTML5 elements which are also valid TLDs. Use search to find URLs to the lists, then use the analysis tool to write a script that downloads the lists, normalises and intersects them.'
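The "download, normalise, intersect" script that prompt asks the model to write could look roughly like this sketch. The IANA URL below is the real published TLD list; there is no single canonical plain-text list of HTML5 elements, so a small hardcoded sample stands in for it here, and the TLD sample is likewise illustrative rather than the full live list:

```python
# Sketch of the "download, normalise, intersect" approach described above.
# A real run would fetch both full lists (TLDs from IANA, elements from
# e.g. the WHATWG spec index); samples are hardcoded here for brevity.

TLD_URL = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"  # real IANA list

# Illustrative sample of HTML5 element names (not exhaustive).
HTML5_ELEMENTS = ["a", "audio", "div", "link", "menu", "nav",
                  "select", "span", "style", "video"]

def normalise(names):
    """Lowercase and strip whitespace so both lists compare cleanly."""
    return {n.strip().lower() for n in names if n.strip() and not n.startswith("#")}

def intersect(tlds, elements):
    """Return the sorted set of element names that are also TLDs."""
    return sorted(normalise(tlds) & normalise(elements))

# Offline demo with a handful of TLD strings (the IANA file, which starts
# with a '#' comment line, currently holds on the order of 1400 entries):
sample_tlds = ["# header", "COM", "ORG", "AUDIO", "LINK", "MENU",
               "NAV", "SELECT", "STYLE", "VIDEO"]
print(intersect(sample_tlds, HTML5_ELEMENTS))
# → ['audio', 'link', 'menu', 'nav', 'select', 'style', 'video']
```

Swapping the hardcoded samples for `urllib.request.urlopen(TLD_URL)` plus a scrape of the element index is the part the prompt delegates to the model's analysis tool.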
> This works great in Claude Sonnet 4.1: 'Please could you research a list of valid TLDs and a list of valid HTML5 elements, then cross reference them to produce a list of HTML5 elements which are also valid TLDs. Use search to find URLs to the lists, then use the analysis tool to write a script that downloads the lists, normalises and intersects them.'
Ok, I only have to:
1. Generally solve the problem for the AI
2. Make a step by step plan for the AI to execute
3. Debug the script I get back and check by hand if it uses reliable sources.
Try doing all of that by hand instead: the difference is about half an hour to an hour of work, plus having to give your attention to such a minor, menial task.
Also, you are literally describing how you are holding it wrong. If you expect the LLM to magically know what you want from it without you yourself having to make the task understandable to the machine, you are standing in front of your dishwasher waiting for it to grow arms and do your dishes in the sink.
>Hand feed them every detail for an extremely simple task like comparing two lists
You believe 57 words are "each and every detail", and that "produce two full, exhaustive lists of items out of your blackbox inner conceptspace/fetch those from the web" are "extremely simple tasks"?
Your ignorance of how complex these problems are misleads you into believing there's nothing to them. You are trying to supply an abstraction to a system that requires a concrete specification. You do not even realize your abstraction is an abstraction. Try learning programming.
> You believe 57 words are "each and every detail", and that "produce two full, exhaustive lists of items out of your blackbox inner conceptspace/fetch those from the web" are "extremely simple tasks"?
Sure they are. I'm not interested in how difficult this is for an LLM. That is not the question. Go out there, get the information. That this is hard for an LLM proves the point: they are surprisingly bad at some simple tasks.
>I'm not interested in how difficult this is for a LLM. This is not the question.
That wasn't my point either. It is a complex problem, full stop. Again, your own inability to look past your personal abstractions ("just do the thing, it's literally one step dude") is what makes it feel simple. Did you ever do that "instruct someone to make coffee" exercise when you started out? What you're doing is saying "just make the coffee", refusing to decompose the problem any further, and then complaining that the other person is bad at following instructions.
How would you solve that problem? You'd probably go to the internet, get the list of TLDs and the list of HTML5 elements, and then compare those lists.
The author compares three commercial large‑language models that have direct internet access, but none of them appear capable of performing this seemingly simple task. I think his conclusion is valid.