HNSW + Filter inconsistent result issue (based on HNSW index used vs unused)
Problem
Postgres returns an inconsistent number of rows depending on whether the HNSW index is used. I am using pgvector with an HNSW index:

    USING hnsw (embedding vector_cosine_ops) WITH (m='16', ef_construction='32');

I am facing an issue with the sample query below:

    WITH filtered_opportunities AS (
        SELECT sf.notice_id
        FROM opportunity_filter sf
        WHERE sf.full_parent_path_name IN ('ABC SERVICE')
        -- AND has_related_award = true  -- line 2
    )
    SELECT sv.notice_id
    FROM semantic_vector sv
    JOIN filtered_opportunities fo ON fo.notice_id = sv.notice_id
    ORDER BY sv.embedding <=> CAST('[-0.5078125]' AS vector)
    LIMIT 200

With one filter condition, this query returns 30 rows. If I add another AND condition (uncomment line 2), it returns 200 rows, although adding more WHERE conditions should ideally reduce the result set.

The difference is that the first query uses the HNSW index, so Postgres fetches 1000 records from the vector table and then applies the filter on top of them, which reduces the result set. The second query doesn't use the HNSW index, so it fetches more records while applying the filter at the same time.

This is a major issue in our application, because the filters behave differently depending on whether the HNSW index is used. Is this a Postgres bug? Is there any way to solve this?
Solution: HNSW + Filter inconsistent result issue (based on HNSW index used vs unused)
There is a long discussion about this here: https://github.com/pgvector/pgvector/issues/244. In short, the HNSW index scan returns its nearest candidates without regard to the filters, and rows that don't match the filters are discarded afterwards, so fewer rows than the LIMIT can come back. In addition to ongoing development, there are a number of ways to handle this, including:
1. Use a different index for the filtering (e.g. B-tree [the default PostgreSQL index], GIN, etc.), depending on selectivity
2. Set `hnsw.ef_search` to a higher value (the default is 40) to allow more results to be returned
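The two options above could look roughly like the sketch below. It reuses the table and column names from the question. The index name `opportunity_filter_path_award_idx` and the value 200 are illustrative assumptions, not recommendations from the linked discussion, and whether the planner actually picks the new index depends on how selective the filter is in your data.

```sql
-- Option 1: index the filter columns so the planner can filter first
-- (index name is hypothetical; adjust the columns to the filters you use).
CREATE INDEX opportunity_filter_path_award_idx
    ON opportunity_filter (full_parent_path_name, has_related_award);

-- Option 2: let the HNSW scan return more candidates before filtering.
-- SET LOCAL limits the change to the current transaction.
BEGIN;
SET LOCAL hnsw.ef_search = 200;  -- illustrative value; tune for your data
-- run the query from the question here
COMMIT;
```

Running the query with `EXPLAIN ANALYZE` before and after each change should show whether the HNSW index is still used and how many rows survive the filter.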
Validation
Resolved in pgvector/pgvector GitHub issue #671. Community reactions: 0 upvotes.
Submitted by
Alex Chen
2450 rep