HNSW testing
Problem
_Originally posted by @alanwli in https://github.com/pgvector/pgvector/issues/181#issuecomment-1693821662_
@ankane, not sure if you've run into this in your testing. When I ran it with 86c29b3bf038de50bb2aec21b6d896823ff1fbbe on a use case, I was hitting what looks like some kind of race condition with the index. I simplified it down to the following Python script: a table with 1k vectors and an insert+delete workload that always keeps the table at 1k vectors. But when doing an index scan with ef_search set to 1k, I see that the index does not return all 1k vectors. On quite a few occasions it's missing 10%+ of what should be there; the worst I saw was ~24% missing. Note that I don't see this with ivfflat (uncomment the ivfflat line). Is this expected?
[code block]
Example output:
[code block]
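The "percent missing" in the report is a set comparison: the IDs present in the table versus the IDs the ef_search index scan returned. A minimal sketch, assuming both ID lists have already been fetched from PostgreSQL (the fetching is left out, and missing_fraction is a hypothetical helper name):

```python
def missing_fraction(table_ids, scan_ids):
    """Return the fraction of table rows that the index scan did not return."""
    expected = set(table_ids)
    if not expected:
        return 0.0  # nothing to miss in an empty table
    returned = set(scan_ids)
    missing = expected - returned
    return len(missing) / len(expected)


# Example: a 1,000-row table where the scan returned only 760 of the rows.
table_ids = range(1000)
scan_ids = range(760)
print(f"{missing_fraction(table_ids, scan_ids):.0%} missing")  # prints "24% missing"
```

Comparing IDs rather than just counting rows also catches the case where the scan returns extra (for example, already deleted) rows while missing live ones.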
Fix Race Condition in HNSW Indexing for pgvector
The missing vectors appear to come from a race condition: inserts and deletes modify the HNSW index while an index scan is running. This can leave the scan with an inconsistent view of the index, so some vectors are never reached during the search. In pgvector, the HNSW index is updated concurrently by separate PostgreSQL sessions, and the current implementation may not handle these concurrent updates correctly, leading to stale reads during the search.
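To make the suspected failure mode concrete, here is a deliberately simplified toy model. It is not pgvector's code (which runs in separate PostgreSQL backend processes); it only shows how a reader that looks at shared state between the two halves of a non-atomic replace (delete, then insert) sees a short count. replace_steps is a hypothetical name.

```python
def replace_steps(index, old_id, new_id):
    """Replace old_id with new_id in two steps, pausing after each one."""
    index.discard(old_id)
    yield "deleted"
    index.add(new_id)
    yield "inserted"


index = set(range(1000))  # toy stand-in for the index contents
observed = []
for step in replace_steps(index, old_id=0, new_id=1000):
    # The "reader" runs here, between the writer's steps.
    observed.append((step, len(index)))

print(observed)  # prints [('deleted', 999), ('inserted', 1000)]
```

With many writers interleaving like this, a single scan can land between several half-finished replaces at once, which is one way a count could fall well below 1,000.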
- 1
Implement Locking Mechanism
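Introduce a lock around the index modification operations so that no reads occur while the index is being updated. In the Python test script, this can be done with a threading.Lock that both the writer threads and the reader thread acquire. Such a lock only serializes threads inside that one Python process and does not coordinate other database sessions, so it works around the problem in the test rather than changing how the index itself behaves.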
```python
import threading

index_lock = threading.Lock()
with index_lock:
    # Perform the insert/delete operations here; readers must take the same lock.
    pass
```
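A fuller sketch of the locking idea, assuming an in-memory set stands in for the table (in the real script the locked block would run the SQL INSERT and DELETE, and the reader would run the index scan). Writers hold the lock across both the delete and the insert, and the reader takes the same lock, so it never observes the table mid-update. NUM_ROWS, writer, and reader are hypothetical names.

```python
import threading

NUM_ROWS = 1000
table = set(range(NUM_ROWS))  # hypothetical in-memory stand-in for the test table
index_lock = threading.Lock()


def writer(first_new_id, iterations):
    """Replace one row per iteration, holding the lock for both the delete and the insert."""
    for new_id in range(first_new_id, first_new_id + iterations):
        with index_lock:
            table.discard(next(iter(table)))  # delete an arbitrary existing row
            table.add(new_id)  # insert a row with a fresh, unique ID


def reader(iterations, observed):
    """Record the row count under the same lock, so no half-done update is visible."""
    for _ in range(iterations):
        with index_lock:
            observed.append(len(table))


observed = []
# Four writers with non-overlapping ID ranges, plus two readers.
threads = [threading.Thread(target=writer, args=(NUM_ROWS + k * 500, 500)) for k in range(4)]
threads += [threading.Thread(target=reader, args=(500, observed)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(min(observed), max(observed), len(table))  # prints 1000 1000 1000
```

Because the lock lives in one Python process, this only helps when every reader and writer is a thread of the same test script.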
- 2
Increase ef_search Parameter
Temporarily increase hnsw.ef_search during testing to see whether the issue persists. This helps diagnose whether the problem comes from an insufficient search breadth rather than from the contents of the index.
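```python
ef_search = 2000  # increased from 1000 for testing; pgvector may reject values outside its allowed range
cur.execute(f"SET hnsw.ef_search = {ef_search}")  # cur: the open cursor from the repro script (assumption)
```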
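Rather than trying a single larger value, a small sweep makes the comparison explicit: if the missing fraction does not drop as ef_search grows, search breadth is probably not the cause. A minimal sketch, where measure is a hypothetical callable you would implement against the database; the stub below returns a fixed placeholder (not a measurement) so the harness can run without PostgreSQL.

```python
def sweep_ef_search(measure, values):
    """Call measure(ef_search) for each value; return {ef_search: missing_fraction}."""
    return {ef: measure(ef) for ef in sorted(values)}


def breadth_is_likely_cause(results):
    """True if the largest ef_search tested missed fewer rows than the smallest."""
    ordered = [results[ef] for ef in sorted(results)]
    return ordered[-1] < ordered[0]


def placeholder_measure(ef_search):
    # Hypothetical stub. A real version would run
    #   cur.execute(f"SET hnsw.ef_search = {ef_search}")
    # then repeat the index scan and return the missing fraction.
    # The constant below is a placeholder, not a measurement.
    return 0.10


results = sweep_ef_search(placeholder_measure, [1000, 250, 500])
print(results)  # prints {250: 0.1, 500: 0.1, 1000: 0.1}
print(breadth_is_likely_cause(results))  # prints False
```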
- 3
Add Logging for Index State
Add detailed logging around index operations to capture the state of the index before and after each modification, for example the number of rows in the table and the number of rows returned by the index scan. This makes the race condition easier to diagnose.
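```python
import logging

logging.basicConfig(level=logging.INFO)
index_state = {"table_rows": 1000, "scan_rows": 1000}  # placeholder: fill with real counts from your script
logging.info("Index state before modification: %s", index_state)
```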
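To log the state both before and after every modification without repeating yourself, the logging can be wrapped in a context manager. A minimal sketch, assuming a get_state callable that returns whatever counts you collect; log_index_state and the "hnsw_test" logger name are hypothetical, and the in-memory set only stands in for the real table.

```python
import logging
from contextlib import contextmanager

# Including the thread name helps when several workers log at once.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")
log = logging.getLogger("hnsw_test")  # hypothetical logger name


@contextmanager
def log_index_state(label, get_state):
    """Log get_state() before and after the wrapped block, even if the block raises."""
    log.info("%s: before %s", label, get_state())
    try:
        yield
    finally:
        log.info("%s: after %s", label, get_state())


# Hypothetical usage with an in-memory set standing in for the table.
table = set(range(1000))
with log_index_state("replace row 0", lambda: {"rows": len(table)}):
    table.discard(0)
    table.add(1000)
```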
- 4
Run Concurrent Tests
Create a series of concurrent tests that simulate multiple threads performing inserts and deletes while simultaneously running index scans. This will help in reproducing the race condition reliably.
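```python
from concurrent.futures import ThreadPoolExecutor

def concurrent_modifications():
    # Code for concurrent inserts and deletes (each worker should use its own connection)
    pass

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(concurrent_modifications) for _ in range(5)]
    for future in futures:
        future.result()  # re-raise any exception from a worker
```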
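A slightly more complete harness runs writers and readers at the same time and collects every count the readers observe. This is a sketch under assumptions: InMemoryTable is a hypothetical stand-in so the harness runs without PostgreSQL, and in the real test each worker would open its own connection, writers would run INSERT/DELETE, and readers would run the ef_search index scan. A threading.Barrier makes all workers start together so their work actually overlaps.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_ROWS = 1000


class InMemoryTable:
    """Hypothetical stand-in for the test table; swap in real SQL for pgvector."""

    def __init__(self, n):
        self._rows = set(range(n))
        self._next_id = n
        self._lock = threading.Lock()

    def replace_one(self):
        # Delete an arbitrary row and insert a fresh one as a single locked step.
        with self._lock:
            self._rows.pop()
            self._rows.add(self._next_id)
            self._next_id += 1

    def scan_count(self):
        with self._lock:
            return len(self._rows)


def run_workload(table, writers=4, readers=2, iterations=200):
    """Run writers and readers concurrently; return every count the readers saw."""
    start = threading.Barrier(writers + readers)

    def write():
        start.wait()
        for _ in range(iterations):
            table.replace_one()

    def read():
        start.wait()
        return [table.scan_count() for _ in range(iterations)]

    # One thread per worker, so the barrier can always be released.
    with ThreadPoolExecutor(max_workers=writers + readers) as pool:
        write_futures = [pool.submit(write) for _ in range(writers)]
        read_futures = [pool.submit(read) for _ in range(readers)]
        for f in write_futures:
            f.result()  # surface any writer exception
        return [count for f in read_futures for count in f.result()]


counts = run_workload(InMemoryTable(NUM_ROWS))
print(len(counts), min(counts))  # prints 400 1000
```

Since the stand-in makes each replace atomic, every observed count is 1000; against the real table, any count below 1000 is a reproduction of the reported problem.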
- 5
Review and Optimize HNSW Parameters
Review the HNSW build parameters, which pgvector exposes as the index options m and ef_construction, to make sure they suit your workload. Raising them can improve recall, at the cost of longer index build times.
```python
m = 16  # example value for pgvector's HNSW index option m
ef_construction = 200  # example value for the ef_construction index option
```
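In pgvector these values are set per index with CREATE INDEX ... USING hnsw ... WITH (m = ..., ef_construction = ...). A minimal sketch that builds one statement per candidate combination so each can be benchmarked; the table name items, column embedding, and the vector_l2_ops operator class (L2 distance) are assumptions, so substitute what your test script uses. The grid values are examples, not recommendations.

```python
from itertools import product

# Hypothetical table/column names; replace with the ones in your test script.
TABLE, COLUMN = "items", "embedding"


def hnsw_index_sql(m, ef_construction, opclass="vector_l2_ops"):
    """Build a pgvector HNSW CREATE INDEX statement for one parameter combination."""
    return (
        f"CREATE INDEX ON {TABLE} USING hnsw ({COLUMN} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )


# Candidate grid to benchmark (example values only).
for m, ef_construction in product([16, 32], [64, 200]):
    print(hnsw_index_sql(m, ef_construction))
```

Rebuilding the index for each combination and rerunning the same workload keeps the comparison fair.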
Validation
To confirm the fix worked, run the modified script with concurrent inserts and deletes while performing index scans. Monitor the logs for any missing vectors and ensure that the index state is consistent across multiple runs. The missing vector percentage should decrease significantly.
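Since the failure is intermittent, it helps to compare many runs before and after a change instead of a single one. A minimal sketch, assuming you record one missing fraction (0.0 to 1.0) per run; summarize_runs and improved are hypothetical helper names.

```python
from statistics import mean


def summarize_runs(missing_fractions):
    """Summarize per-run missing fractions; raises StatisticsError if the list is empty."""
    return {
        "runs": len(missing_fractions),
        "mean": mean(missing_fractions),
        "worst": max(missing_fractions),
        "clean_runs": sum(1 for f in missing_fractions if f == 0.0),
    }


def improved(before, after):
    """True if both the mean and the worst missing fraction went down."""
    b, a = summarize_runs(before), summarize_runs(after)
    return a["mean"] < b["mean"] and a["worst"] < b["worst"]
```

Tracking the worst run alongside the mean matters here, because a change can lower the average while still leaving occasional runs with large gaps.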
Submitted by Alex Chen