Interest in DiskANN
Problem
Hi, at Neon we've been investigating DiskANN, and we believe it would have several advantages compared to HNSW. Last month I prototyped a freestanding DiskANN implementation (not inside Postgres, and not the official DiskANN implementation), and it seemed to perform very well at building the index in parallel compared to libhnsw. Timescale recently published their own DiskANN implementation, which (at least according to them) outperforms pgvector's HNSW. DiskANN also has an extension called FreshDiskANN, which cleanly supports inserts and deletes. I was wondering if there is any interest in adding DiskANN to pgvector. If so, I'd like to begin working on it!
Solution: Interest in DiskANN
My 2¢: Interested? Yes, at least academically.
I personally still think there is room to improve on current HNSW/IVFFLAT support in pgvector before adding another algorithm, esp. one that has more parameters for the user to tune. My personal list, which is influenced both by folks using and evaluating pgvector and by personal experimentation:
- Quantization, esp. product quantization (I do think we also need scalar quantization…)
- Better multi-column filtering techniques (see discussion in https://github.com/pgvector/pgvector/issues/244)
- Support different data types (https://github.com/pgvector/pgvector/tree/hnsw-array lays the groundwork for this -- supporting float2, uint8, etc. will help shrink down some index sizes)
- More parallelism
  - Parallel build for HNSW
  - Parallel query for both IVFFLAT/HNSW (more emphasis on IVFFLAT)
- Incorporating elements of SPANN into IVFFLAT (e.g. overlapping neighborhood searches)
all while maintaining the relative simplicity of pgvector's implementation + usability.
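To make the data-type point in the list above concrete, here is a minimal sketch of scalar quantization from float32 to uint8, showing why a smaller element type shrinks storage. This is only an illustration under stated assumptions, not how pgvector implements (or plans to implement) anything: the helper names `quantize_uint8` and `dequantize`, the per-vector min/max scaling, and the sample vector are all hypothetical.

```python
from array import array


def quantize_uint8(vec):
    """Linearly map floats onto the integer range 0..255.

    Returns (codes, lo, scale) so the values can be approximately restored.
    Hypothetical helper for illustration; not a pgvector API.
    """
    lo, hi = min(vec), max(vec)
    # Fall back to 1.0 for constant vectors to avoid dividing by zero.
    scale = (hi - lo) / 255 or 1.0
    # Clamp to 255 to guard against floating-point overshoot at the top end.
    codes = array("B", (min(255, round((x - lo) / scale)) for x in vec))
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate inverse of quantize_uint8."""
    return [lo + c * scale for c in codes]


# Hypothetical 100-dimensional vector stored as 4-byte floats.
vec = array("f", [0.0, 0.25, 0.5, 0.75, 1.0] * 20)

codes, lo, scale = quantize_uint8(vec)
approx = dequantize(codes, lo, scale)

# Rounding error per element is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(vec, approx))

float_bytes = len(vec) * vec.itemsize
code_bytes = len(codes) * codes.itemsize

print(f"float32: {float_bytes} bytes, uint8: {code_bytes} bytes")
print(f"max reconstruction error: {max_err:.5f}")
```

The trade-off is the usual one: each element takes a quarter of the space, at the cost of a small reconstruction error that may affect recall. Product quantization, also mentioned above, compresses further by encoding groups of dimensions together, and is not shown here.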
Validation
Resolved in pgvector/pgvector GitHub issue #285. Community reactions: 8 upvotes.
Submitted by
Alex Chen