Understanding HNSW + filtering
Problem
Hi, I would like to understand how the current implementation handles HNSW + filtering. Imagine you have a table: [code block] And that you even have an index on `category`: [code block] And then you want to run a query like: [code block] Doing this efficiently is not straightforward: ideally we want to run the expensive HNSW ANN search on the already pruned subset (https://qdrant.tech/articles/filtrable-hnsw/). Can pgvector do this, or is there a plan to enable such an optimization in the future? (In this case the condition is simple enough that you might be able to use table partitioning, but that's not always the case.)
Solution: Understanding HNSW + filtering
@Palmik PostgreSQL supports conditional indexing through [partial indexes][1], which let you define an index like: [code block]
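As a rough sketch of what such a partial index could look like: the table name `items`, the three-dimensional vectors, the index name, and the category value `'books'` are assumptions for illustration and do not come from the original question. Only the `category` and `embedding` column names appear in the thread.

```sql
-- Hypothetical schema: `items`, vector(3) and 'books' are illustrative assumptions.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id bigserial PRIMARY KEY,
    category text NOT NULL,
    embedding vector(3)
);

-- Partial HNSW index that covers only the rows of a single category.
CREATE INDEX items_books_embedding_idx
    ON items USING hnsw (embedding vector_l2_ops)
    WHERE (category = 'books');

-- The planner can consider the partial index only when the query's
-- WHERE clause implies the index predicate.
SELECT id
FROM items
WHERE category = 'books'
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;
```

Each category needs its own `CREATE INDEX ... WHERE` statement, which is why this approach gets unwieldy as the number of categories grows.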
However, you would have to do this for every single category in the database. To look up from both indexes at the same time, pgvector would have to add support for bitmap scans in `hnsw`.
That said, picking an indexing strategy may depend on the actual contents of your data. For example, if your `category` filter eliminates most rows (e.g. only a handful of vectors remain to compare), using the index on `embedding` may not make sense. Or, depending on your use case, you may want to perform the ANN search first and then filter the results by category.
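The two alternatives above might be sketched as follows. This assumes a hypothetical table `items(id, category text, embedding vector(3))`; the index names, category values, and the candidate count of 100 are illustrative, and whether the planner actually picks each index depends on your data and statistics.

```sql
-- Assumes a hypothetical table: items(id, category text, embedding vector(3)).

-- Option A: the filter is very selective. A plain B-tree index on `category`
-- lets PostgreSQL fetch the few matching rows and sort them by exact distance.
CREATE INDEX items_category_idx ON items (category);

SELECT id
FROM items
WHERE category = 'rare-category'
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;

-- Option B: ANN search first, then filter. Over-fetch candidates from an
-- unfiltered HNSW index and apply the category filter afterwards.
CREATE INDEX items_embedding_idx ON items USING hnsw (embedding vector_l2_ops);

-- An HNSW scan considers at most ef_search candidates, so raise it to
-- match the inner LIMIT.
SET hnsw.ef_search = 100;

WITH nearest AS MATERIALIZED (
    SELECT id, category, embedding <-> '[3,1,2]' AS distance
    FROM items
    ORDER BY distance
    LIMIT 100
)
SELECT id, distance
FROM nearest
WHERE category = 'books'
ORDER BY distance
LIMIT 5;
```

Option B can return fewer than five rows when few of the over-fetched candidates belong to the requested category, so the inner `LIMIT` is a trade-off between recall and cost.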
[1]: https://www.postgresql.org/docs/current/indexes-partial.html
Validation
Resolved in pgvector/pgvector GitHub issue #259. Community reactions: 2 upvotes.
Submitted by
Alex Chen