Parallel index builds for HNSW

Accepted Answer

@ankane Awesome!

I'm running a series of tests, but I wanted to share a very early result. Here is my test info:

Dataset: 10MM 1,536-dim randomly generated normalized vectors
Instance: r7gd.16xlarge (64 vCPU, 512GB RAM)
Storage: NVMe
Build parameters:
- `m`: 16
- `ef_construction`: 100
Relevant PostgreSQL configuration:
- `shared_buffers`: 128GB
- `maintenance_work_mem`: 128GB
- `max_parallel_maintenance_workers`: 63 (the leader also participates, so 64 processes in total)
- `max_wal_size`: 20GB
- `wal_compression`: zstd

Early results:
- When I checked in on `master`, it was about 16% complete. However, looking at `pg_stat_progress_create_index`, [hnsw-fast-build-branch][1] was outpacing `master` by about 10x.
- [hnsw-fast-build-branch][1] was indexing at about 6,565 tps, more than 6x faster than the [concurrent insert method][2] on a similar data set, and that other data set had `ef_construction` at `64`!
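One way to get a tuples-per-second figure like the one above is to sample `pg_stat_progress_create_index` twice and divide the change in `tuples_done` by the elapsed time. The sketch below is only an illustration, not necessarily how the numbers in this test were obtained: `Sample`, `tuples_per_second`, and `speedup` are hypothetical helper names, the readings are made up, and it assumes the index build reports its progress through `tuples_done`.

```python
from dataclasses import dataclass

# Query to run from another session while the build is in progress.
# These columns exist in PostgreSQL's pg_stat_progress_create_index view.
PROGRESS_SQL = """
SELECT phase, blocks_done, blocks_total, tuples_done
FROM pg_stat_progress_create_index;
"""


@dataclass
class Sample:
    """One reading of the progress view (hypothetical helper type)."""
    seconds: float     # wall-clock time of the reading, in seconds
    tuples_done: int   # value of tuples_done at that moment


def tuples_per_second(first: Sample, second: Sample) -> float:
    """Average indexing rate between two readings."""
    elapsed = second.seconds - first.seconds
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return (second.tuples_done - first.tuples_done) / elapsed


def speedup(rate_a: float, rate_b: float) -> float:
    """How many times faster rate_a is than rate_b."""
    return rate_a / rate_b


# Made-up readings for illustration only, not measurements from this test.
fast = tuples_per_second(Sample(0.0, 0), Sample(60.0, 390_000))
slow = tuples_per_second(Sample(0.0, 0), Sample(60.0, 39_000))
print(f"{fast:.0f} tps vs {slow:.0f} tps, {speedup(fast, slow):.1f}x")
```

Sampling over a longer interval smooths out bursts, which matters when comparing two builds that are at different stages.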

[hnsw-fast-build-branch][1] completed in 25m23s (1523227.801 ms).
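As a quick sanity check, the build time and the throughput figure agree with each other if the rate is read as total rows divided by total build time. That reading is an assumption; the post does not say how the tps number was derived.

```python
# Figures quoted in the post.
rows = 10_000_000          # "10MM" vectors in the dataset
build_ms = 1_523_227.801   # reported build time in milliseconds

build_seconds = build_ms / 1000.0
minutes, seconds = divmod(int(build_seconds), 60)

# Average rate over the whole build (assumed definition of "tps").
rate = rows / build_seconds

print(f"build time: {minutes}m{seconds}s")
print(f"average rate: {rate:,.0f} rows/s")
```

This works out to 25m23s and roughly 6,565 rows per second, matching both numbers reported above.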

This looks really promising! There are a few more tests I plan to run:
