After reading the blog post, it seems like there are two issues:
1. This type of question (return a desired emoji) requires a high degree of "accuracy" on a single token. Contrast that with more typical LLM tasks, which tend to emphasize a more holistic "correctness" across multiple output tokens.
2. The (mode of the) token probability distribution converges to a "hole" in the vocabulary, but the model is designed to "snap to" the token nearest the hole, so it returns the wrong emoji. Normally this isn't a problem, since token embeddings are constructed so that things near the hole have similar semantic meanings and so perform equivalently in most sentences. But this is where Issue 1 rears its head: exact single-token accuracy is the performance metric here, so something "similar" to a seahorse emoji is as bad as something totally unrelated (a toy sketch of this "snapping" follows the list).
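Here is a minimal numpy sketch of that "snap to the nearest token" effect. The four-token vocabulary, the vectors, and the blended "seahorse" hidden state are all made up for illustration; nothing here reflects a real model's tokenizer or unembedding matrix.

```python
# Toy sketch (not any real model's vocabulary): greedy decoding always
# "snaps" the predicted direction onto the nearest existing token, even
# when nothing in the vocabulary is actually close to what was intended.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unembedding rows for a few emoji-like tokens (made-up vectors).
vocab = {
    "🐠 tropical_fish": rng.normal(size=8),
    "🐉 dragon":        rng.normal(size=8),
    "🦐 shrimp":        rng.normal(size=8),
    "🏇 horse_racing":  rng.normal(size=8),
}
names = list(vocab)
W = np.stack([vocab[n] for n in names])   # (V, d) unembedding matrix

# Pretend the hidden state points at "seahorse": a blend of fish-ness and
# horse-ness that matches no single row of W -- a "hole" in the vocabulary.
h = 0.5 * vocab["🐠 tropical_fish"] + 0.5 * vocab["🏇 horse_racing"]

logits = W @ h                            # one logit per token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over the vocabulary

best = int(np.argmax(probs))
print(f"decoded token: {names[best]}  (p={probs[best]:.2f})")
# The argmax is a real token near the hole, so the model confidently emits
# the wrong emoji; a metric that demands the exact token scores this the
# same as a completely unrelated answer.
```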
These two core issues are particularly problematic because production models are fine-tuned to be "self-reflective", so the reasoning chain causes the model to keep retrying the task even though the problem is ultimately in the tokenizer/token embeddings. Some models are capable of converging to the "correct" answer, which is to emit a sequence of tokens that reads as "none exists"; whether they get there is probably heavily influenced by the prompt ("is there a seahorse emoji?" vs. "show me the seahorse emoji").
I think the real way to reason about this is via the topology (/homology) of the underlying embedding space; it seems that our current tools assume a Cauchy-complete token space. In reality, some points in that space simply have no token defined. Intuitively that seems rare for natural spoken/written language (an undefined token is a semantic meaning without a word, and people tend to just make up new words when they need them), but in the world of "hard languages" (coding, math, pictograms/emojis) these topological holes are actually meaningful! A coding language might have a genuinely undefined token even though the surrounding region is semantically close to other tokens in the corpus. Moreover, the topology near these holes can be badly misleading (everything looks perfectly continuous right up until you fall in), which makes them close to the worst corner case for the iterative gradient-descent algorithms we use to build NNs. It seems like we need a richer set of constructs for representing language tokens than Banach spaces; a super thought-provoking area of work for sure!
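To make the "hole" intuition a bit more concrete, here is a purely hypothetical probe, assuming you can export some model's token-embedding matrix as a numpy array: measure how far a concept vector sits from its nearest token, relative to how far tokens typically sit from their own nearest neighbours. An unusually large gap is, loosely, evidence of a hole. This is a sketch of the intuition, not an established technique, and all names below are mine.

```python
# Illustrative sketch only: `emb` is assumed to be a (V, d) token-embedding
# matrix exported from some model; the probe itself is hypothetical.
import numpy as np

def nearest_cosine_gap(emb: np.ndarray, v: np.ndarray) -> float:
    """Cosine distance from concept vector v to its nearest token embedding."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    v = v / np.linalg.norm(v)
    return float(1.0 - np.max(e @ v))

def typical_gap(emb: np.ndarray, sample: int = 1000, seed: int = 0) -> float:
    """Median nearest-neighbour cosine distance among the tokens themselves."""
    rng = np.random.default_rng(seed)
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    idx = rng.choice(len(e), size=min(sample, len(e)), replace=False)
    sims = e[idx] @ e.T
    sims[np.arange(len(idx)), idx] = -np.inf   # exclude self-similarity
    return float(np.median(1.0 - np.max(sims, axis=1)))

# Usage idea: build a concept vector (e.g. a mean of "seahorse"-related token
# embeddings) and flag it if its gap dwarfs the typical gap:
#
#   if nearest_cosine_gap(emb, concept) > 3 * typical_gap(emb):
#       print("concept may sit over a 'hole' in the token space")
```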