At very high scale, graphs see less usage, or clustering is layered on top of the graphs.
Graphs can be complex to build and rebalance. Graph-like data structures, where one node points out to another node somewhere else in memory, aren't that cache friendly.
Add to that, people almost always want to *filter* vector search results. And this is a huge blind spot for both consumers and providers; it's where the ugly performance surprises come from. Filtered HNSW isn't straightforward: it requires you to just keep traversing the graph looking for results that satisfy your filter.
HNSW came out of a benchmark regime where we just indexed some vectors and tried to maximize recall at a given query latency. It doesn't account for the filtering / indexing almost everyone wants.
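To make that traversal cost concrete, here's a minimal sketch of filtered best-first graph search. This is a toy (a 1-D line "graph" and a trivial distance function, all names mine, nothing like a real HNSW with layers), but it shows why a selective filter forces the search to visit many more nodes than an unfiltered query:

```python
import heapq

def filtered_greedy_search(graph, dist, entry, k, predicate, max_visits=1000):
    """Best-first traversal that keeps expanding until k nodes satisfying
    `predicate` are found, or the visit budget runs out."""
    visited = {entry}
    frontier = [(dist(entry), entry)]  # min-heap ordered by distance to query
    results, visits = [], 0
    while frontier and len(results) < k and visits < max_visits:
        d, node = heapq.heappop(frontier)
        visits += 1
        if predicate(node):
            results.append((d, node))
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
    return results, visits

# Toy 1-D "graph": node i links to i-1 and i+1; the query sits at node 50.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 100] for i in range(100)}
dist = lambda n: abs(n - 50)

# Unfiltered: the first 3 pops already satisfy the (trivial) predicate.
_, visits_all = filtered_greedy_search(graph, dist, 50, 3, lambda n: True)
# Selective filter (1 node in 10 passes): the traversal must go much deeper.
hits, visits_filtered = filtered_greedy_search(graph, dist, 50, 3,
                                               lambda n: n % 10 == 0)
```

With the trivial predicate the search stops after 3 visits; with the selective one it has to pop its way through every non-matching node in between. That blow-up is exactly the performance surprise described above.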
Turbopuffer, for example, doesn't use graphs at all, it uses SPFresh. And they recently got 200ms latency on 100B vectors.
There is an entire section of the post about that. I believe it's more the illusion of a problem, created by product design issues, than a real challenge, since far results that match the filter are totally useless.
Doesn't this depend on your data to a large extent? In a very dense graph "far" results (in terms of the effort spent searching) that match the filters might actually be quite similar?
The "far" here means "with vectors having a very low cosine similarity / very high distance". So in vector use cases where you want near vectors matching a given set of filters, far vectors matching the filters are useless. That's why Redis Vector Sets have another "EF" (effort) parameter just for filters: if not enough results have been collected so far, you decide how much effort to spend. If you want to scan the whole graph, that's fine, but by default Redis will do the sane thing and stop early once the remaining vectors are already far anyway.
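As a rough illustration of those two knobs, here's a toy sketch (my own code, not Redis's actual implementation; `filter_ef` and `max_dist` are stand-ins for the filter-effort and "stop when already far" behavior described above):

```python
import heapq

def filtered_search_early_stop(graph, dist, entry, k, predicate,
                               filter_ef=500, max_dist=None):
    """Best-first search with two knobs for the filtered case:
    `filter_ef` caps how many nodes we inspect while hunting for filter
    matches, and `max_dist` stops the search once the closest remaining
    candidate is already far from the query."""
    visited = {entry}
    frontier = [(dist(entry), entry)]  # min-heap ordered by distance
    results, visits = [], 0
    while frontier and len(results) < k and visits < filter_ef:
        d, node = heapq.heappop(frontier)
        if max_dist is not None and d > max_dist:
            break  # everything left is at least this far: stop early
        visits += 1
        if predicate(node):
            results.append((d, node))
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
    return results

# Same toy 1-D graph as before: node i links to i-1 and i+1, query at 50.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 100] for i in range(100)}
dist = lambda n: abs(n - 50)

# Matches exist at nodes 40 and 60, but both are "far" (> max_dist), so
# the search returns only node 50 instead of scanning the whole graph.
hits = filtered_search_early_stop(graph, dist, 50, 3,
                                  lambda n: n % 10 == 0, max_dist=5)
```

The early stop sacrifices the far-but-matching results on purpose, which is exactly the "sane default" argument: those results were useless anyway.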
I'm facing the problem you describe daily. It's especially bad because it's very difficult for me to predict if the set of filters will reduce the dataset by ~1% (in which case following the original vector index is fine) or by 99.99% (in which case you just want to brute force the remaining vectors).
Tried a million different things, but haven't heard of Turbopuffer yet. Any references on how they perform with such additional filters?
Lucene and ES implement a shortcut for filters that are restrictive enough. Since the engine is already optimized for figuring out which documents fall into your filter set, you first determine that set's cardinality. You then traverse the HNSW normally, and if you have visited more nodes than the filter set's cardinality, you switch to brute-forcing distance comparisons over the filter set. So the worst case is 2x your filter set's size in vector distance operations. Quite neat.
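A toy sketch of that fallback strategy (my own code, not Lucene's or ES's actual implementation; the `graph_search` callable stands in for a budget-limited HNSW traversal):

```python
import math

def l2(a, b):
    """Plain Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force(vectors, filter_ids, query, k):
    """Worst-case fallback: score only the vectors that pass the filter."""
    return sorted(filter_ids, key=lambda i: l2(query, vectors[i]))[:k]

def search_with_fallback(vectors, filter_ids, query, k, graph_search):
    """Try the graph index with a visit budget equal to the filter set's
    cardinality; if it can't collect k matches within that budget, switch
    to brute force. Total distance computations stay under roughly
    2x |filter set|: the budgeted traversal plus the brute-force pass."""
    hits = graph_search(query, k, set(filter_ids),
                        max_visits=len(filter_ids))
    if len(hits) < k:
        hits = brute_force(vectors, filter_ids, query, k)
    return hits

# Demo: a very restrictive filter (3 docs out of 1000) and a graph search
# that exhausts its budget without finding enough matches.
vectors = [[float(i)] for i in range(1000)]
filter_ids = [3, 500, 997]
failed_graph_search = lambda q, k, ids, max_visits: []  # budget exhausted
hits = search_with_fallback(vectors, filter_ids, [499.0], 2,
                            failed_graph_search)
```

The point of the budget is that you never pay more for the graph traversal than the brute-force scan would have cost in the first place.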
Oh that's nice! Any references on this shortcut? How do you activate that behavior? I was playing around with ES, but the only suggestion I found was to use `count` on filters before deciding (manually) which path to take.
Not really. The thing Vespa solved was taking existing ANN methods and fixing a disk/RAM tradeoff (and some other niceties). That's nowhere close to adequate when:
1. As softwaredoug mentioned, you might want to filter results, potentially with a high filtration rate.
2. ANN isn't good enough. Suppose you need bounded accuracy with meaningfully sublinear time on a high-dimensional dataset. You're hosed.
Point (1) is just a repeat of a general observation that composition of nice data structures doesn't usually give you a nice data structure, even if it technically works. Creating a thing that does what you want without costing both arms and the president's leg requires actually understanding DS&A and applying it in your solution from the ground up.
Point (2) might seem irrelevant (after all, people are "building" stuff with RAG and whatnot nowadays aren't they?), but it's crucial to a lot of applications. Imagine, e.g., that there exists one correct result in your database. The guarantees provided by SOTA ANN solutions (on high-dimensional data) have a steep compute/correctness tradeoff, giving you an abysmal chance of finding your document without searching an eye-watering fraction of your database. I usually work around that with the relaxation that the result needs to be the best one but that its information can be fuzzy (admitting solutions which merge a bunch of low-dimensional queries, corrected via some symmetry in a later step), but if you actually need good KNN results then you're kind of hosed.
Just curious what the state of the art around filtered vector search results is? I took a quick look at the SPFresh paper and didn't see it specifically address filtering.
https://turbopuffer.com/docs/vector