
I understand your skepticism, and I acknowledge that the concept of semantic similarity is indeed an approximation. However, it is an approximation that has proven highly useful in a wide range of practical applications.

Semantic similarity methods are based on the idea that the meaning of a word can be inferred from its context, a concept known as distributional semantics. In essence, words that occur in similar contexts tend to have similar meanings. This is not just a heuristic; it's a well-established principle in linguistics known as the distributional hypothesis.
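
To make the distributional hypothesis concrete, here's a toy sketch in Python (the corpus, window size, and word choices are invented purely for illustration): build co-occurrence vectors from small context windows and compare them with cosine similarity. Words that appear in similar contexts end up with similar vectors.

    from collections import Counter
    import math

    # Toy corpus -- purely illustrative.
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "the cat chased the dog",
        "stocks fell on the news",
        "stocks rose after the news",
    ]

    def cooccurrence_vectors(sentences, window=2):
        # For each word, count which other words appear within `window` tokens of it.
        vectors = {}
        for sentence in sentences:
            tokens = sentence.split()
            for i, word in enumerate(tokens):
                ctx = vectors.setdefault(word, Counter())
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        ctx[tokens[j]] += 1
        return vectors

    def cosine(a, b):
        # Cosine similarity between two sparse count vectors.
        dot = sum(a[k] * b[k] for k in a if k in b)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    vecs = cooccurrence_vectors(corpus)
    print(cosine(vecs["cat"], vecs["dog"]))      # similar contexts -> high similarity
    print(cosine(vecs["cat"], vecs["stocks"]))   # different contexts -> low similarity

Real systems use far larger corpora and learned (rather than counted) vectors, but the underlying signal is the same: context predicts meaning.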

Large language models are trained on vast amounts of text and learn to predict the next word in a sequence given the previous words. Through this process, they come to represent words as high-dimensional vectors (word embeddings) that capture many aspects of their meaning, including their semantic similarity to other words.
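
As a practical illustration, here's a minimal sketch using an off-the-shelf embedding model (the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint are just my example choices, not anything specific to the models discussed above). Sentences with similar meanings map to nearby vectors:

    # Requires: pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    # Off-the-shelf embedding model; the checkpoint name is an example choice.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    sentences = [
        "The doctor examined the patient.",
        "A physician checked the sick person.",
        "The stock market closed higher today.",
    ]

    # Each sentence becomes a dense vector; nearby vectors ~ similar meaning.
    embeddings = model.encode(sentences, convert_to_tensor=True)

    print(util.cos_sim(embeddings[0], embeddings[1]).item())  # paraphrases -> high similarity
    print(util.cos_sim(embeddings[0], embeddings[2]).item())  # unrelated -> low similarity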

These models can generate coherent text, answer questions, translate languages, and perform other language tasks with a high degree of accuracy. None of that would be possible if they were capturing only syntax and not semantics.

The 'why' is that these models learn from the statistical regularities in their training data, which encode both syntactic and semantic information. The 'how' is deep learning architectures like the transformer, which let the models capture complex patterns and relationships in the data.
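
If it helps, here's a bare-bones numpy sketch of the scaled dot-product attention at the core of a transformer (the shapes and random inputs are made up for illustration). Each token's new representation is a weighted mix of every token's value vector, which is what lets the model pick up relationships across the whole sequence:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # how strongly each token attends to every other
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
        return weights @ V                              # weighted mix of value vectors

    # Toy example: 4 tokens with 8-dimensional representations (random for illustration).
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

    out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
    print(out.shape)  # (4, 8): each token's output mixes information from all tokens

A real transformer stacks many of these attention layers (with multiple heads, residual connections, and feed-forward blocks), but this is the basic operation that lets it model both syntactic and semantic structure.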

I hope this provides a more detailed explanation of my argument. I'm not trying to 'pull rank', but simply explaining the basis for my claims. I understand this is a complex topic, and I appreciate your challenging questions as they help clarify and deepen the discussion.


