What we're looking at doing is:
- Creating, storing, and updating an embedding of a schema that people query against
- Creating an embedding of the user's input
- Computing cosine similarity between the user input embedding and the embedding of each column in the schema, then sorting by relevancy (it's a score from -1.0 to 1.0)
- Using the top n most "relevant" columns instead of passing the full schema
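The retrieval step above can be sketched roughly like this. This is a minimal illustration, not our actual implementation; the function names and the `{column_name: embedding}` shape of the input are assumptions for the example.

```python
import math

def cosine_similarity(a, b):
    # Score in [-1.0, 1.0]; higher means the vectors point in more similar directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_n_columns(query_embedding, column_embeddings, n):
    # column_embeddings: {column_name: embedding vector}, precomputed and stored.
    scored = sorted(
        ((name, cosine_similarity(query_embedding, emb))
         for name, emb in column_embeddings.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Only these top-n columns get passed to the LLM instead of the full schema.
    return scored[:n]
```

In practice the column embeddings would come from whatever embedding model you use for the user input, so both sides live in the same vector space.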
So far, there are pros and cons. On the pros side, it's really fast and generally makes us more accurate on very large schemas, since those can get truncated today. We've seen that in some cases it can also help reduce LLM hallucinations. On the cons side, it's another layer of probabilistic behavior and still has a chance of "missing" a relevant column. We can't really say for sure whether it's better overall in our test environment, so we're going to test it in production and flag it off if it's yielding worse results.