pgvector vs FAISS
Problem
Update: Upgrading to v0.1.1 and building with `PG_CFLAGS=-ffast-math make` reduced the query time to 2.2s! Big speed jump, but still 1.7x slower than the FAISS / Python service.

-----

I imported 792010 rows of 512d image vectors (~5GB), i.e. real data rather than random vectors, and ran a test [1] to find the 4 closest vectors to an exact vector in the dataset.

Search times:
- 1.279357709s - FAISS Python web service (using JSON and IndexFlatL2) (with 791963 vectors [2]).
- 11.381s - Searching (l2_distance) with the pgvector extension (with 792010 rows).

Hardware: [code block]

Importing took 11.381 seconds with the `COPY` command from a CSV file with each row being the vector.

Any ideas why pgvector would be so much slower? The testing environments for the two tools were significantly different, to FAISS's disadvantage, but FAISS was still much quicker.

[1] Not a "scientific" test. I had other programs running on the machine when running this test. Mileage may vary.
[2] The slight difference is because the FAISS vector import filters out duplicate vectors.
1 Fix
Optimize pgvector Configuration for Improved Query Performance
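Both setups are doing an exact nearest-neighbor search here, but not under the same conditions. FAISS's IndexFlatL2 is a brute-force scan over a contiguous in-memory float array with heavily optimized distance kernels. An unindexed pgvector query is also brute force, but it runs as a PostgreSQL sequential scan: every row is read from table pages, the distance is evaluated per row, and the result is sorted. The gap is therefore likely driven by the missing index, the query plan, how much of the table is cached, and compiler optimization of the distance function (the `-ffast-math` build in the update already closed much of it).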
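To make the cost of an unindexed search concrete, here is a minimal pure-Python sketch of an exact top-k scan. It is an illustration of the technique only, not pgvector's or FAISS's actual implementation; the function names and the toy rows are assumptions made for this example.

```python
import heapq
import math


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(rows, query, k=4):
    """Scan every (id, vector) row and keep the k nearest to `query`.

    Ranking by squared distance gives the same order as ranking by
    true L2 distance, so the square root is only taken for the output.
    """
    nearest = heapq.nsmallest(k, rows, key=lambda row: l2_squared(row[1], query))
    return [(row_id, math.sqrt(l2_squared(vec, query))) for row_id, vec in nearest]


# Toy data standing in for the real table (hypothetical values).
rows = [(1, [0.0, 0.0]), (2, [3.0, 4.0]), (3, [1.0, 1.0]), (4, [6.0, 8.0]), (5, [0.0, 1.0])]
result = exact_top_k(rows, [0.0, 0.0], k=3)
print(result)
```

Every query touches all N vectors and all d dimensions, so the work grows as N x d (about 405 million multiply-adds for 792010 rows of 512 dimensions). How fast that loop runs depends almost entirely on memory layout and the distance kernel, which is where the two systems differ.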
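1. Create an Index on the Vector Column
Without an index, PostgreSQL computes the distance to every row. Note that pgvector does not ship a GiST operator class for its vector type, so `USING GIST` will not work; its own index access method is ivfflat (newer releases also offer hnsw). An ivfflat index performs an approximate search, so it trades some recall for speed, whereas FAISS's IndexFlatL2 is exact. Load the data before building the index, and tune `lists` to the table size.
    CREATE INDEX ON your_table USING ivfflat (vector_column vector_l2_ops) WITH (lists = 100);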
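For pgvector's ivfflat index, the README suggests starting values of rows / 1000 lists for tables up to 1M rows, sqrt(rows) above that, and sqrt(lists) probes at query time. The sketch below applies that guidance to the row count from the question. The helper name is hypothetical, `your_table` and `vector_column` are placeholders, and the numbers are starting points to tune, not recommendations.

```python
import math


def ivfflat_starting_params(row_count):
    """Return (lists, probes) starting values for an ivfflat index.

    lists:  rows / 1000 up to 1M rows, sqrt(rows) above that.
    probes: sqrt(lists).
    """
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = max(1, round(math.sqrt(row_count)))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


lists, probes = ivfflat_starting_params(792_010)
print(
    "CREATE INDEX ON your_table USING ivfflat "
    f"(vector_column vector_l2_ops) WITH (lists = {lists});"
)
print(f"SET ivfflat.probes = {probes};")
```

More probes raise recall and query time together; setting probes equal to lists makes the search exact again, at which point the planner has no reason to prefer the index over a plain scan.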
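2. Adjust PostgreSQL Configuration Parameters
`maintenance_work_mem` controls how much memory an index build may use, so raising it mainly speeds up `CREATE INDEX`. `work_mem` bounds per-operation sort memory; a `LIMIT 4` query only keeps a handful of rows while sorting, so raising it is unlikely to change this benchmark much on its own. For repeated full scans, it matters more whether the table fits in `shared_buffers` and the OS page cache.
    SET work_mem = '256MB';
    SET maintenance_work_mem = '512MB';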
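When sizing memory settings it helps to know how large the vector data is once loaded, as opposed to the ~5GB CSV. pgvector documents its storage as 4 bytes per dimension plus 8 bytes per vector; the sketch below applies that to the numbers in the question. The function name is made up for this example, and the estimate ignores row headers, other columns, and page overhead.

```python
def vector_payload_bytes(row_count, dimensions):
    """Approximate bytes used by the vectors alone: (4 * dimensions + 8) per row."""
    return row_count * (4 * dimensions + 8)


size = vector_payload_bytes(792_010, 512)
print(f"{size} bytes = {size / 2**30:.2f} GiB")
```

For 792010 rows of 512 dimensions this comes to roughly 1.5 GiB, so a cache smaller than that means each sequential scan is likely to hit the OS cache or disk rather than PostgreSQL's shared buffers.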
3. Use the Correct Distance Metric
Make sure both systems rank by the same metric. In pgvector the `<->` operator is L2 (Euclidean) distance, which matches FAISS's IndexFlatL2. Note that IndexFlatL2 reports squared L2 distances, so the neighbor ordering will agree but the distance values will not unless you take the square root.
    SELECT * FROM your_table ORDER BY vector_column <-> your_query_vector LIMIT 4;
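When checking that both systems return the same neighbors, two small conversions are usually needed on the Python side. This is a sketch under stated assumptions: the helper names are hypothetical, FAISS's IndexFlatL2 is taken to return squared L2 distances, and pgvector is taken to accept vectors in its bracketed text form.

```python
import math


def faiss_to_l2(squared_distances):
    """Convert squared L2 distances to plain L2, comparable to pgvector's `<->`."""
    # Clamp at zero in case rounding produced a tiny negative value.
    return [math.sqrt(max(d, 0.0)) for d in squared_distances]


def to_pgvector_literal(vector):
    """Format a sequence of numbers as pgvector text input, e.g. [1.0,2.5,3.0]."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


print(faiss_to_l2([0.0, 25.0]))
print(to_pgvector_literal([1, 2.5, 3]))
```

The literal can be passed as a query parameter and cast to `vector` in the SQL, which keeps the full 512-dimension vector out of the statement text.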
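4. Batch Queries for Performance Improvement
If you need neighbors for many query vectors, send them in one statement instead of one round trip per vector. This reduces per-call overhead; it does not make an individual search faster. The example assumes a hypothetical `queries` table holding the query vectors in a `vector_column`.
    SELECT q.id AS query_id, t.*
    FROM queries q
    CROSS JOIN LATERAL (SELECT * FROM your_table ORDER BY vector_column <-> q.vector_column LIMIT 4) t;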
5. Profile Query Performance
Use PostgreSQL's EXPLAIN ANALYZE to profile your queries and identify bottlenecks. This will help you understand where the performance issues lie and how to address them effectively.
    EXPLAIN ANALYZE SELECT * FROM your_table ORDER BY vector_column <-> your_query_vector LIMIT 4;
Validation
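After implementing the above steps, re-run your vector search queries and compare the execution time against the previous benchmarks (11.381s originally, 2.2s after the `-ffast-math` build, 1.28s for FAISS). Use EXPLAIN ANALYZE to confirm that the plan shows an index scan on the vector column rather than a sequential scan. If you use an approximate index, also check that the returned neighbors still match the exact results closely enough for your use case.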
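A minimal sketch of that comparison, assuming you already have a callable that runs one query and the IDs returned by an exact and an approximate search. All names here are hypothetical; the timing helper reports a median over several runs so that one cold-cache run does not dominate the result.

```python
import statistics
import time


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact neighbors that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


def median_seconds(run_query, repeats=5):
    """Call `run_query` several times and return the median wall-clock duration."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical ID lists: the approximate search missed one of four neighbors.
print(recall_at_k([10, 11, 12, 13], [10, 12, 13, 99]))
# Stand-in workload; replace the lambda with a function that executes your query.
print(median_seconds(lambda: sum(range(1000))) >= 0.0)
```

Running both systems on the same machine, with the same query vectors and a warm cache, makes the resulting numbers far easier to compare than the original test.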
Submitted by
Alex Chen
2450 rep