Index Build Time Does Not Improve as Expected When Changing "Workers"
Problem
Hello, I'm Sabrina, a data scientist at GSI Technology. I've been working on a pgvector benchmarking project on the deep-1B dataset. The documentation states that I can improve index build time by changing the parameters max_parallel_maintenance_workers and max_parallel_workers. Unfortunately, if I increase the values of the parameters beyond 10, I do not see an improvement.

We tested with 1M records of deep-1B and here are some results:

| number of workers | index build time in seconds |
| ----------- | ----------- |
| default | 4041 |
| 10 | 2043 |
| 20 | 2090 |

I am using the following hardware:

- Intel(R) Xeon(R) Gold
- 104 cores

Here are the loaded extensions:

| Name | Version | Schema | Description |
| ----------- | ----------- | ----------- | ----------- |
| plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language |
| vector | 0.6.1 | public | vector data type and ivfflat and hnsw access methods |

Should I expect to see better performance as I increase the value of workers? Thank you! I am happy to provide my code or any additional machine information needed.
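To put numbers on the plateau, here is a small Python sketch that turns the reported build times into speedups relative to the default run. The timings are copied from the table above; the variable names are only for illustration, and the post does not say what the "default" worker count was, so nothing is assumed about it.

```python
# Build times in seconds, copied from the results table (1M deep-1B records).
build_times = {"default": 4041, "10": 2043, "20": 2090}

# Speedup of each run relative to the default configuration.
baseline = build_times["default"]
speedups = {label: baseline / seconds for label, seconds in build_times.items()}

for label, speedup in speedups.items():
    print(f"workers={label}: {speedup:.2f}x")
```

Going to 10 workers roughly halves the build time (about 1.98x), while 20 workers comes out at about 1.93x, i.e. no further gain.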
Solution: Index Build Time Does Not Improve as Expected When Changing "Workers"
@iamsabhoho Those messages will be in the server logs.

FWIW, I've built, and worked with folks who have built, 1B-vector indexes with pgvector (particularly with 0.6.2) using an HNSW index. I've used datasets with 128 dimensions, and the build time was somewhere in the 2-3 day range.
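For anyone reproducing the measurement from the question, here is a rough Python sketch that generates the session-level SQL for one timed build. The helper `parallel_build_sql`, the table name `items`, the column name `embedding`, and the choice of an HNSW index with `vector_l2_ops` are all assumptions for illustration; the question does not say which table or index type was used. The `SHOW max_worker_processes` line is included because, per the PostgreSQL documentation, parallel workers are drawn from that pool and a `max_parallel_workers` value above it has no effect.

```python
def parallel_build_sql(workers, table="items", column="embedding"):
    """Return the SQL statements for one index build with `workers` parallel workers.

    `table` and `column` are hypothetical placeholders, not names from the question.
    """
    if workers < 0:
        raise ValueError("workers must be non-negative")
    return [
        # Server-level pool size; it can only be changed at server start.
        "SHOW max_worker_processes;",
        # Both settings below can be changed per session.
        f"SET max_parallel_workers = {workers};",
        f"SET max_parallel_maintenance_workers = {workers};",
        # Assumed index type and operator class; adjust to the actual benchmark.
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_l2_ops);",
    ]


statements = parallel_build_sql(20)
print("\n".join(statements))
```

If `SHOW max_worker_processes` reports a value lower than the requested worker count, raising only the two session settings would not be expected to change how many workers the build actually gets.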
Validation
Resolved in pgvector/pgvector GitHub issue #500. Community reactions: 2 upvotes.
Submitted by
Alex Chen
2450 rep