pgvector vs FAISS

Mar 14, 2026
Confidence Score: 52%

Problem

Update: Upgrading to v0.1.1 and building with `PG_CFLAGS=-ffast-math make` reduced the query time to 2.2s! That is a big speed jump, but still 1.7x slower than the FAISS / Python service.

-----

I imported 792010 rows of 512d image vectors (~5GB), i.e. real data rather than random vectors, and ran a test [1] to find the 4 closest vectors to an exact vector in the dataset.

Search times:

- 1.279357709s - FAISS Python web service (using JSON and IndexFlatL2) (with 791963 vectors [2]).
- 11.381s - Searching (l2_distance) with the pgvector extension (with 792010 rows).

Hardware: [code block]

Importing took 11.381 seconds with the `COPY` command from a CSV file with each row being the vector.

Any ideas why pgvector would be so much slower? The testing environments for the two tools were significantly different, to FAISS's disadvantage, but FAISS was still much quicker.

[1] Not a "scientific" test. I had other programs running on the machine when running this test. Mileage may vary.
[2] The slight difference is because FAISS's vector import filters out duplicate vectors.

1 Fix

Optimize pgvector Configuration for Improved Query Performance

Medium Risk

pgvector may be less optimized for high-dimensional vector search than FAISS, which is built specifically for nearest-neighbor search. With no index on the vector column, PostgreSQL has to scan every row and compute its distance to the query vector, so the indexing method, the query execution plan, and the database configuration can all significantly affect performance.
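For reference, both sides of this comparison do the same kind of work when no index is involved: FAISS's IndexFlatL2 is an exhaustive (brute-force) index, and an unindexed pgvector query computes the distance to every row. The following is a minimal pure-Python sketch of that exhaustive search. It uses tiny made-up 2d vectors rather than the 512d image vectors from the problem, and the function names are illustrative, not part of either tool.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(vectors, query, k=4):
    """Return the k closest (distance, index) pairs by scanning every vector."""
    scored = ((l2_distance(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Made-up sample data; the query is an exact member of the dataset,
# as in the test described in the problem.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [1.0, 1.0]]
results = nearest_neighbors(vectors, [0.0, 0.0], k=4)
for distance, index in results:
    print(index, round(distance, 3))
```

The cost grows linearly with both the row count and the dimension, which is why the speed of the inner distance loop (for example, how the extension was compiled) matters so much for an unindexed search.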

Awaiting Verification

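  1. Create an Index on the Vector Column

    Creating an index on the vector column can drastically improve search performance by letting PostgreSQL examine only a subset of the rows. pgvector does not provide a GiST operator class; use one of its own index types, such as `ivfflat` (or `hnsw` in pgvector 0.5.0 and later), with the operator class that matches your distance metric. These indexes perform approximate search, so results can differ from the exact results of FAISS's IndexFlatL2.

    sql
    -- lists = 100 is only a starting value; tune it for your row count
    CREATE INDEX ON your_table USING ivfflat (vector_column vector_l2_ops) WITH (lists = 100);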
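  2. Adjust PostgreSQL Configuration Parameters

    Tune PostgreSQL memory parameters for your workload. `maintenance_work_mem` mainly speeds up index builds, while `work_mem` governs memory for sorts and similar per-query operations, so neither is likely to help an unindexed sequential scan much on its own. The values below are starting points, not recommendations.

    sql
    SET work_mem = '256MB';
    SET maintenance_work_mem = '512MB';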
  3. Use the Correct Distance Metric

    Ensure that the distance metric used in pgvector matches the one used in FAISS. pgvector selects the metric through the operator in the query: `<->` is L2 distance, which corresponds to FAISS's IndexFlatL2. Replace `your_query_vector` with a vector literal or parameter of the same dimension as the column.

    sql
    SELECT * FROM your_table ORDER BY vector_column <-> your_query_vector LIMIT 4;
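  4. Batch Queries for Performance Improvement

    If applicable, consider batching your queries to reduce overhead. Instead of issuing one query per search vector, look up the neighbors of several vectors in a single statement to minimize the number of database round trips. The example assumes a hypothetical `query_vectors` table holding the vectors to search for.

    sql
    SELECT q.id, n.* FROM query_vectors q
    CROSS JOIN LATERAL (SELECT * FROM your_table t ORDER BY t.vector_column <-> q.vector_column LIMIT 4) n;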
  5. Profile Query Performance

    Use PostgreSQL's EXPLAIN ANALYZE to profile your queries and identify bottlenecks. This will help you understand where the performance issues lie and how to address them effectively.

    sql
    EXPLAIN ANALYZE SELECT * FROM your_table ORDER BY vector_column <-> your_query_vector LIMIT 4;
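One caveat when comparing an indexed pgvector query against FAISS's IndexFlatL2: pgvector's index types are approximate, so a faster query may return slightly different neighbors than an exact search. A simple way to quantify this is recall, the fraction of the exact top-k results that the approximate query also returned. The sketch below is a minimal illustration; the row ids are made up, and `recall_at_k` is a hypothetical helper, not part of pgvector or FAISS.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k ids that the approximate search also found."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)


# Made-up ids: what an exact scan returned vs. what an indexed query returned.
exact_ids = [101, 205, 317, 422]
approx_ids = [101, 317, 422, 590]

recall = recall_at_k(exact_ids, approx_ids)
print(f"recall@4 = {recall:.2f}")
```

In practice the exact ids could come from the same query run without the index, and the approximate ids from the indexed run.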

Validation

After implementing the above steps, re-run your vector search queries under the same conditions as before and compare the execution time against the previous benchmarks. The goal is a query time closer to that of FAISS. Use EXPLAIN ANALYZE to confirm that the plan has changed as expected, for example an Index Scan on the vector column instead of a Seq Scan.
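To make "closer to FAISS" concrete, it can help to express each measurement as a slowdown factor relative to the FAISS baseline. The sketch below uses only the timings reported in the problem statement; the helper name `slowdown` is illustrative.

```python
def slowdown(candidate_seconds, baseline_seconds):
    """How many times slower the candidate is than the baseline."""
    return candidate_seconds / baseline_seconds


# Timings taken from the problem statement (single, non-scientific runs).
FAISS_SECONDS = 1.279357709
PGVECTOR_INITIAL_SECONDS = 11.381
PGVECTOR_FAST_MATH_SECONDS = 2.2

initial = slowdown(PGVECTOR_INITIAL_SECONDS, FAISS_SECONDS)
after_rebuild = slowdown(PGVECTOR_FAST_MATH_SECONDS, FAISS_SECONDS)

print(f"initial: {initial:.1f}x slower")              # initial: 8.9x slower
print(f"after rebuild: {after_rebuild:.1f}x slower")  # after rebuild: 1.7x slower
```

The second figure matches the "1.7x slower" stated in the update. Since these come from single runs on a busy machine, repeating each measurement several times would give a more reliable ratio.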

Submitted by

Alex Chen

2450 rep

Tags

pgvector, embeddings, vector-search