Parallel index builds for HNSW

Fresh · about 2 years ago
Mar 14, 2026 · 0 views
Confidence Score: 77%

Problem

Hi all, support for in-memory, parallel index builds is now available in the hnsw-fast-build branch :tada:

A few benchmarks from my local machine with the SIFT 1M dataset (128 dimensions):

code version | processes | build time
--- | --- | ---
0.5.1 | 1 | 415 sec
master | 1 | 309 sec
branch | 2 | 184 sec
branch | 4 | 107 sec
branch | 8 | 83 sec

A few useful settings are: [code block]

For a high number of workers, you may also need to increase `max_parallel_workers` (default is 8).

Please test it out (in a non-production environment) and share any feedback. Aiming for a release (0.5.2) at the end of January if all goes well.
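To make the scaling in the benchmark figures easier to read, here is a minimal sketch that turns the reported build times into speedup and per-process efficiency. Using the single-process `master` build as the baseline is an assumption made for illustration, and the function names are hypothetical.

```python
# Build times (seconds) copied from the benchmark figures above
# (SIFT 1M dataset, 128 dimensions).
build_times = [
    ("0.5.1", 1, 415),
    ("master", 1, 309),
    ("branch", 2, 184),
    ("branch", 4, 107),
    ("branch", 8, 83),
]

# Assumption: the single-process `master` build is the baseline.
BASELINE_SEC = 309


def speedup(build_sec, baseline_sec=BASELINE_SEC):
    """Return how many times faster a build is than the baseline."""
    return baseline_sec / build_sec


def efficiency(build_sec, processes, baseline_sec=BASELINE_SEC):
    """Return speedup divided by process count (1.0 would be perfect scaling)."""
    return speedup(build_sec, baseline_sec) / processes


for version, processes, seconds in build_times:
    print(
        f"{version:>6} x{processes}: "
        f"speedup {speedup(seconds):.2f}, "
        f"efficiency {efficiency(seconds, processes):.2f}"
    )
```

On these numbers, eight processes come out roughly 3.7 times faster than the single-process `master` build, so the scaling is sublinear for this dataset.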

1 Fix

Canonical Fix
High Confidence Fix
76% confidence · 100% success rate · 4 verifications · Last verified Mar 14, 2026

Solution: Parallel index builds for HNSW

Low Risk

@ankane Awesome! I'm running a series of tests, but I wanted to share a very early result. Here is my test info: Dataset: 10MM 1,536-dim randomly generated normalized vectors Instance: r7gd.16xlarge (64 vCPU, 512GB RAM) Storage: NVMe Build parameters: - `m`: 16 - `ef_construction`: 100 PostgreSQL configuration of relevance: - `shared_buffers`: 128GB - `maintenance_work_mem`: 128GB - `max_parallel
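As a rough illustration of how the build parameters and session-level settings in this report might be applied, the sketch below renders them as SQL statements. The table name `items`, the column name `embedding`, and the `vector_l2_ops` operator class are assumptions (the report does not say which were used), and the helper `build_statements` is hypothetical. The other reported settings are left out because they are typically managed in `postgresql.conf` rather than per session.

```python
# Values taken from the test info in this report.
SESSION_SETTINGS = {
    "maintenance_work_mem": "128GB",
    "max_parallel_maintenance_workers": 63,
}
INDEX_OPTIONS = {"m": 16, "ef_construction": 100}


def build_statements(table="items", column="embedding", opclass="vector_l2_ops"):
    """Return SQL statements for a session-level parallel HNSW build.

    The table, column, and operator class defaults are hypothetical.
    """
    statements = []
    for name, value in SESSION_SETTINGS.items():
        # Quote string values such as '128GB'; leave integers bare.
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        statements.append(f"SET {name} = {rendered};")
    options = ", ".join(f"{key} = {val}" for key, val in INDEX_OPTIONS.items())
    statements.append(
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) WITH ({options});"
    )
    return statements


for statement in build_statements():
    print(statement)
```

Generating the statements as strings keeps the sketch runnable without a database; in practice they would be run in a single session so the `SET` values apply to the `CREATE INDEX` that follows.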

  1. I'm running a series of tests, but I wanted to share a very early result. Here is my test info:

  2. Dataset: 10MM 1,536-dim randomly generated normalized vectors

    Instance: r7gd.16xlarge (64 vCPU, 512GB RAM)

    Storage: NVMe

    Build parameters:
    - `m`: 16
    - `ef_construction`: 100

    PostgreSQL configuration of relevance:
    - `shared_buffers`: 128GB
    - `maintenance_work_mem`: 128GB
    - `max_parallel_maintenance_workers`: 63 (with leader, so this will be 64)
    - `max_wal_size`: 20GB
    - `wal_compression`: zstd

  3. [hnsw-fast-build branch][1] completed in 25m23s (1523227.801 ms)

    - When I checked in on `master`, it was about 16% completed. However, when looking at `pg_stat_progress_create_index`, the [hnsw-fast-build branch][1] was outpacing `master` by about 10x.
    - The [hnsw-fast-build branch][1] was indexing at about 6,565 tps, which was more than 6x faster than the [concurrent insert method][2] on a similar data set...and the other data set had `ef_construction` at `64`!

  4. This looks really promising! There are a few more tests I plan to run:
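The reported throughput can be cross-checked against the reported build time. This sketch assumes the "about 6,565 tps" figure is simply total rows divided by total build time; the original measurement may have been taken differently, and the function names are hypothetical.

```python
# Figures copied from the report above.
ROWS = 10_000_000          # "10MM" vectors in the dataset
BUILD_MS = 1_523_227.801   # reported build time in milliseconds


def format_duration(ms):
    """Format milliseconds as XmYYs, truncating fractional seconds."""
    total_seconds = int(ms // 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m{seconds:02d}s"


def rows_per_second(rows, ms):
    """Average indexing throughput over the whole build."""
    return rows / (ms / 1000)


print(format_duration(BUILD_MS))              # matches the reported 25m23s
print(round(rows_per_second(ROWS, BUILD_MS))) # matches the reported ~6,565 tps
```

Both values line up with the report, which suggests the throughput figure is an average over the full build.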

Validation

Resolved in pgvector/pgvector GitHub issue #409. Community reactions: 7 upvotes.

Verification Summary

Worked: 4
Partial: 1
Last verified Mar 14, 2026

Environment

Submitted by

Alex Chen

2450 rep

Tags

pgvector, embeddings, vector-search