Inserting into a pgvector table with an HNSW index is very slow, and updates are even slower
Problem
We ran a performance test on pgvector and were very impressed with its search performance, which is outstanding. We like this product a lot, but we are currently struggling with slow insert/update speeds. Testing inserts on a table with 1024-dimensional vectors, a single client could only insert about 20 rows per second. The rate also degrades over time: once the dataset reaches millions of rows, it drops to around 3 rows per second. This throughput is far too low for our business use case, where scheduled tasks need to insert 300,000 rows in a short window. Even with 10 clients inserting in parallel, throughput only improves by roughly 8x, which is still too slow. Dropping the index is not a viable workaround either, as rebuilding it consumes significant /dev/shm and CPU resources.
Fix
Optimize pgvector Insert/Update Performance with Batch Processing and Configuration Tuning
The slow insert and update speeds in pgvector tables with HNSW indexes are primarily due to the overhead of maintaining the index during frequent write operations. Each insert/update operation requires the index to be updated, which can lead to significant performance degradation, especially with high-dimensional vectors. Additionally, PostgreSQL's default configuration may not be optimized for bulk inserts, leading to further bottlenecks.
1. Batch Insert Operations
Instead of inserting rows one at a time, group multiple rows into a single INSERT statement. This reduces the overhead of transaction management and index updates.

```sql
INSERT INTO your_table (vector_column) VALUES (vector1), (vector2), (vector3), ...;
```
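As a concrete sketch, assuming a table defined as your_table (vector_column vector(1024)), a batched load committed once per batch might look like this (the vector literals are abbreviated with ..., and the 3-row batch is illustrative; 500-1000 rows per statement is a reasonable starting point):

```sql
BEGIN;
-- One multi-row INSERT per batch; each literal is a full 1024-dimensional vector.
INSERT INTO your_table (vector_column) VALUES
  ('[0.12, 0.34, ...]'),
  ('[0.56, 0.78, ...]'),
  ('[0.90, 0.11, ...]');
COMMIT;
```

Committing once per batch rather than once per row also avoids a WAL flush on every single insert.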
2. Disable the Index During Bulk Inserts
Temporarily disable the HNSW index during bulk insert operations. This can be done by dropping the index before the insert and recreating it afterward. This avoids the overhead of maintaining the index during each insert.
```sql
DROP INDEX IF EXISTS your_index;
-- Perform bulk inserts here
CREATE INDEX your_index ON your_table USING hnsw (vector_column vector_l2_ops);
```
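A fuller sketch of the drop-and-rebuild flow; the index name, operator class, and HNSW build parameters (m, ef_construction) here are assumptions to tune for your own data and recall requirements:

```sql
DROP INDEX IF EXISTS your_index;

-- ... perform the bulk inserts here ...

-- Give the rebuild more memory and parallelism (session-local settings).
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 4;

CREATE INDEX your_index ON your_table
  USING hnsw (vector_column vector_l2_ops)
  WITH (m = 16, ef_construction = 64);
```

Note that, as the problem statement points out, the rebuild itself is expensive in /dev/shm and CPU, so this option fits maintenance windows better than continuous ingestion.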
3. Adjust PostgreSQL Configuration
Increase the 'maintenance_work_mem' and 'work_mem' settings in PostgreSQL to allow for more memory during index creation and query execution. This can help speed up the insert and update processes.
```sql
SET maintenance_work_mem = '1GB';
SET work_mem = '64MB';
```
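SET only changes the current session. If the settings should survive reconnects, and assuming superuser access, they can be persisted cluster-wide instead:

```sql
-- Session-local (reverts on disconnect):
SET maintenance_work_mem = '1GB';

-- Persisted to postgresql.auto.conf for all sessions:
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();
```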
4. Use Unlogged Tables for Temporary Data
If applicable, consider using unlogged tables for temporary data storage during the bulk insert process. Unlogged tables do not write to the WAL (Write-Ahead Log), which can significantly improve insert performance.
```sql
CREATE UNLOGGED TABLE temp_table (vector_column vector(1024));
```
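A sketch of the staging pattern, assuming your_table is the indexed production table: load into the unlogged table first, then move the rows across in one statement. The final INSERT ... SELECT is still WAL-logged and index-maintained, so the saving is in the staging phase, and unlogged data is lost on a crash:

```sql
CREATE UNLOGGED TABLE temp_table (vector_column vector(1024));

-- ... bulk COPY/INSERT into temp_table here ...

INSERT INTO your_table (vector_column)
SELECT vector_column FROM temp_table;

DROP TABLE temp_table;
```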
5. Monitor and Optimize Autovacuum Settings
Ensure that autovacuum settings are optimized to prevent table bloat, which can slow down insert and update operations. Adjust the autovacuum parameters to run more frequently on heavily updated tables.
```sql
ALTER TABLE your_table SET (autovacuum_vacuum_scale_factor = 0.1, autovacuum_vacuum_threshold = 1000);
```
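To check whether the override took effect and autovacuum is keeping up, the standard catalog and statistics views can be queried (your_table is the placeholder from above):

```sql
-- Per-table autovacuum overrides:
SELECT relname, reloptions FROM pg_class WHERE relname = 'your_table';

-- Dead-tuple count and last autovacuum run:
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'your_table';
```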
Validation
After implementing the above steps, measure the insert/update throughput again, for example by timing a fixed-size batch (psql's \timing reports elapsed time per statement; EXPLAIN ANALYZE on a sample INSERT can also show where time is spent). Confirm that 300,000 rows can be loaded within the required window. Also check system resource usage (CPU, memory, /dev/shm) to ensure the changes have not adversely affected other operations.
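One way to measure the rate, a sketch assuming a staging table of pre-generated vectors (staging_table is hypothetical) and psql's \timing:

```sql
\timing on
INSERT INTO your_table (vector_column)
SELECT vector_column FROM staging_table LIMIT 1000;
-- rows/second = 1000 / (elapsed seconds reported by \timing)
```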
Submitted by
Alex Chen