
Inserting into a pgvector table with an HNSW index is very slow, and updates are even slower

Mar 14, 2026

Problem

We ran a performance test on pgvector and were very impressed with its search performance, which is outstanding. We like the product a lot, but we are struggling with slow insert/update speeds. On a table with 1024-dimensional vectors, a single client can insert only about 20 rows per second. The rate also degrades over time: once the table reaches millions of rows, inserts drop to around 3 rows per second. This throughput is far too low for our use case, which requires inserting 300,000 rows in a short window for scheduled tasks. Even with 10 clients inserting in parallel, throughput improves only by about 8x, which is still insufficient. Dropping the index is not a viable workaround either, because rebuilding it consumes significant /dev/shm and CPU resources.


1 Fix


Optimize pgvector Insert/Update Performance with Batch Processing and Configuration Tuning

Medium Risk

The slow insert and update speeds are primarily due to the cost of maintaining the HNSW index on every write: each inserted or updated row must be linked into the graph, which requires a graph search whose cost grows with the size of the index. That is why throughput degrades as the table grows. High-dimensional vectors (1024 dimensions here) make each distance computation more expensive, and PostgreSQL's default configuration is not tuned for bulk loading, adding further bottlenecks.


  1. Batch Insert Operations

    Instead of inserting rows one at a time, group multiple rows into a single INSERT statement. This amortizes transaction management and index-update overhead across the whole batch.

    sql
    INSERT INTO your_table (vector_column)
    VALUES ('[0.1, 0.2, ...]'), ('[0.3, 0.4, ...]'), ('[0.5, 0.6, ...]');
  2. Disable Index During Bulk Inserts

    Temporarily drop the HNSW index before the bulk load and recreate it afterward, so the index is built once rather than maintained on every row. (As noted in the problem, the rebuild itself is memory- and CPU-intensive, so weigh this against your /dev/shm and CPU budget.)

    sql
    DROP INDEX IF EXISTS your_index;
    -- Perform bulk inserts here
    CREATE INDEX your_index ON your_table USING hnsw (vector_column vector_l2_ops);
  3. Adjust PostgreSQL Configuration

    Increase maintenance_work_mem so index builds have more memory to work with (HNSW builds are much faster when the graph fits in memory), and raise work_mem for query execution.

    sql
    SET maintenance_work_mem = '1GB';
    SET work_mem = '64MB';
  4. Use Unlogged Tables for Temporary Data

    If applicable, stage incoming rows in an unlogged table during the bulk load. Unlogged tables skip the WAL (Write-Ahead Log), which significantly improves insert performance; note that their contents are lost after a crash, so use them only for data you can reload.

    sql
    CREATE UNLOGGED TABLE temp_table (vector_column vector(1024));
  5. Monitor and Optimize Autovacuum Settings

    Frequent updates leave dead tuples behind, and if autovacuum cannot keep up, the resulting table bloat slows writes further. Tune autovacuum to run more aggressively on heavily updated tables.

    sql
    ALTER TABLE your_table SET (
      autovacuum_vacuum_scale_factor = 0.1,
      autovacuum_vacuum_threshold = 1000
    );
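Step 1 can be sketched from application code. The helpers below chunk rows into multi-row INSERT statements and render Python lists as pgvector literals; the table name `items`, column `embedding`, and the DB-API cursor are assumptions for illustration, not from the original post.

```python
def format_vector(vec):
    """Render a Python list as a pgvector literal, e.g. [1.0, 2.0] -> '[1.0,2.0]'."""
    return "[" + ",".join(str(x) for x in vec) + "]"

def chunked(rows, size):
    """Yield successive batches of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def batch_insert(cur, vectors, batch_size=1000):
    """Issue one multi-row INSERT per batch; `cur` is any DB-API cursor.

    One statement per batch amortizes per-statement and index-maintenance
    overhead, instead of paying it for every single row.
    """
    for batch in chunked(vectors, batch_size):
        values = ",".join("('%s')" % format_vector(v) for v in batch)
        cur.execute("INSERT INTO items (embedding) VALUES " + values)
```

Because the vector literals are built from numeric values we format ourselves, string interpolation is safe here; for untrusted input, use your driver's parameter binding instead.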

Validation

After implementing the above steps, measure insert/update throughput again, for example by timing a fixed-size batch or by watching the n_tup_ins counters in pg_stat_user_tables, to confirm the improvement. Aim for a rate that can load the required 300,000 rows within your scheduled window. Also check system resource usage (/dev/shm, CPU, maintenance_work_mem) to ensure the changes have not adversely affected other operations.
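One simple way to run that validation is to time a fixed-size load and compute rows per second. A minimal sketch, where `insert_rows` is a stand-in for whichever load path you are testing (single-row, batched, or staged):

```python
import time

def measure_throughput(insert_rows, n_rows):
    """Time a load function that inserts n_rows rows; return rows per second."""
    start = time.perf_counter()
    insert_rows(n_rows)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very fast (e.g. mocked) loads.
    return n_rows / elapsed if elapsed > 0 else float("inf")
```

Run it before and after each change (batching, index drop, config tuning) so you can attribute the improvement to a specific step rather than to the combination.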


Environment

Submitted by


Alex Chen

2450 rep

Tags

pgvector, embeddings, vector-search