The problem here isn't that an LLM hallucinates. The problem is that nobody asked for an AI response, and Meta pushed content to a forum that makes such claims, which could easily mislead or at least confuse people not sophisticated enough to be on the lookout for hallucinations.
Meta should be (and is) in the business of policing third-party spam on their forums that does exactly this. We can infer what must've happened - the model must've been fine-tuned on forum comments, and this would be the likely format for a response to that question. This sort of thing should've been caught by a wrapper/guard model, and will probably make a good case to add to such a model's instructions/training.
(btw: is it "an LLM" or "a LLM"? I guess I should ask an LLM which it prefers to be called)
The distinction between using “a” vs “an” is based on the sound that immediately follows the article rather than the letter. If it’s followed by a vowel sound, use “an”, and if it’s followed by a consonant sound, use “a”.
Because “LLM” is pronounced “el el em”, the first syllable sound is “eh”—a vowel.
The same letter may need a different article (“a” or “an”) depending on how the word is pronounced. For example “an LLM” vs “a layer”.
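For what it's worth, the rule above can be roughly sketched in Python. This is only a heuristic under stated assumptions, not a real phoneme lookup: the function name, the letter set, and the two tiny exception lists are all made up for illustration, all-caps words are assumed to be initialisms read letter by letter, and dialect differences are ignored.

```python
# Letters whose spoken names start with a vowel sound ("ef", "el", "em", ...).
# Assumption: "H" is read as "aitch"; some dialects say "haitch".
VOWEL_SOUND_LETTERS = set("AEFHILMNORSX")

# Tiny hand-picked exception lists; illustrative only, far from complete.
SILENT_H_WORDS = {"hour", "honest", "honor", "heir"}
CONSONANT_SOUND_WORDS = {"university", "unit", "user", "european", "one"}


def indefinite_article(word: str) -> str:
    """Guess "a" or "an" from the first sound of the word (heuristic)."""
    if not word:
        raise ValueError("word must not be empty")

    # Assumption: an all-caps word is an initialism spelled out letter by
    # letter ("LLM" -> "el el em"). Acronyms read as words would need
    # their own exception list.
    if word.isalpha() and word.isupper():
        return "an" if word[0] in VOWEL_SOUND_LETTERS else "a"

    lower = word.lower()
    if lower in SILENT_H_WORDS:
        return "an"  # silent "h", so the first sound is a vowel
    if lower in CONSONANT_SOUND_WORDS:
        return "a"   # starts with a "y" or "w" sound despite the vowel letter

    # Fallback: trust the first letter.
    return "an" if lower[0] in "aeiou" else "a"


for w in ("LLM", "layer", "hour", "university"):
    print(indefinite_article(w), w)
```

The exception lists are where all the real work would be, which is why the letter-based fallback alone isn't enough.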
As someone with a photographic memory as my main learning tool (because it's the most "performant" one I have), I strongly feel that somebody should make a really long list of the most commonly used words which should use "an", and conversely, a list of similar length of words which should use "a" (both as two columns of text). I'm not really eager to "run" the check (mentioned under the link) in my head every time I need to choose an article. The biggest problem, and the reason behind some of the mistakes I sometimes make in choosing the right article, is that I don't see some of the less common words which should use "an" (despite starting with a letter which suggests otherwise) often enough.
And when I'm reading a text for any purpose other than memorizing article choices (so, 99.999% of cases), they don't get enough of my attention to be remembered. It's the meaning of the other words that gets it, and a big part of that will be remembered, not the articles used before them.
Being able to look at such a list every few days for, say, a month would definitely help me remember most of these cases.
"Almost" can make a big difference when writing stuff where formal style is expected. For instant messaging, yeah, probably one doesn't have to care.
When writing, you put articles before a letter, but they're based on what phoneme they precede. Therefore, when purely classifying letters, there are many more combinations than 26. In the hundreds or low thousands, possibly. Or more.
That's the difference, and for me it's frankly easier to memorize a big look-up table of the most commonly used words (and which articles should precede them), because this doesn't require any effort from me, than to run an "algorithm" translating to phonemes every time I write something. In a quick reading (not reading aloud, or even mentally mimicking reading aloud), a wrong article won't necessarily get picked up easily. It's sheer laziness, I guess.
Such a table can't exist. Pronunciation varies, so your choice of articles adds character to your text in much the same way your accent does for spoken words.
Take "herb," for example. In some dialects, the "h" is vocalized, while in others, it's silent. Both "an herb" and "a herb" are valid. Your choice in your writing conveys identity. An author who opts for "a herb" helps paint a vague picture of the individual behind the words, perhaps someone from England.
You could make your own personal table, but it would be for you and only you.
Also, although there is a concrete rule, it's not something we're thinking about as we talk--using the wrong article just feels wrong. Most of us aren't consciously "running an algorithm," as you put it; the correct article just comes out.
Most people will find that they develop the same skill with writing over time. The subset of people who have trouble developing that skill and learn best by memorizing a table of words is going to be quite small. I would never write "a LLM" in the same way that I would never say "an history" out loud.
That's a really good comment! Thanks. By the way, pronunciation of articles also varies. With some variants "a" would get much more universal (but also making it harder to notice when it's used wrong), while with others it would be almost impossible to use "a" where "an" should be used without exposing yourself to a major tongue-twister.
And yeah, I'm aware that the subset of people mentioned is quite small. On this subject, it's that memorization requires close to no effort for me, and is close to instantaneous and long-lasting (as long as I run through it several times and the things learned aren't ending up being completely unused), while developing the intuitive feel of the right article, as you rightly put it, takes time (however, can also be close to effortless to some)... and lots of writing.