
SELECT WHERE ORDER BY LIMIT no results

Mar 14, 2026

Problem

Hello, this kind of query no longer works on my database now that it has more rows than before: [code block] If I increase hnsw.ef_search from 100 to 1000 it works, but it's slower, and I suppose that 1000 will not be enough once my table gets bigger. For now I have 7 million rows, but I will reach a billion rows soon. Here is my index: [code block] 7 million rows looks very small to me. Is there a way to get this query working correctly without partitioning or sharding my database?



Optimize Vector Search Configuration for Large Datasets

Medium Risk

The empty results come from how the HNSW (Hierarchical Navigable Small World) index interacts with the WHERE clause. An HNSW scan returns at most hnsw.ef_search candidates, and pgvector applies the filter only after the index has been scanned. When the filter matches a shrinking share of a growing table, fewer of those candidates survive, so the query can return fewer rows than LIMIT or none at all. Raising ef_search widens the candidate list and brings results back, but every query then does more work. That trade-off gets harder as the dataset scales toward billions of rows.
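A rough way to check whether post-filtering explains the missing rows is to measure how selective the filter is. The sketch below assumes a hypothetical table `items` and a hypothetical predicate `category_id = 123`; neither comes from the original query, which is not shown. If the filter keeps a fraction s of the rows, roughly ef_search × s candidates survive on average, assuming the matching rows are not clustered near the query vector.

```sql
-- Hypothetical table and predicate; substitute the real WHERE clause.
-- Returns the fraction of rows that pass the filter.
-- Note: this scans the whole table, so run it off-peak on large tables.
SELECT
    count(*) FILTER (WHERE category_id = 123)::float8
        / NULLIF(count(*), 0) AS selectivity
FROM items;

-- Example reading: a selectivity of 0.001 with hnsw.ef_search = 100
-- leaves about 100 * 0.001 = 0.1 surviving candidates per query on
-- average, so most queries would come back empty.
```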


  1. Analyze Query Performance

    Run EXPLAIN ANALYZE on the failing query. The plan shows whether the HNSW index is used and, on the "Rows Removed by Filter" line, how many candidates the WHERE clause discards. This confirms whether the problem is tied to the ef_search setting.

    sql
    EXPLAIN ANALYZE SELECT * FROM your_table WHERE your_conditions ORDER BY your_order LIMIT your_limit;
  2. Adjust ef_search Parameter

    Raise hnsw.ef_search in steps and check both the number of rows returned and the latency after each change. The query in the report already failed at 100, so start at 200. pgvector caps hnsw.ef_search at 1000, so this setting alone cannot keep up with an arbitrarily selective filter.

    sql
    SET hnsw.ef_search = 200;
  3. Implement Caching Mechanism

    Cache the results of frequently repeated queries, for example in Redis or Memcached. Repeated lookups then skip the vector search entirely, which improves response times without raising ef_search further. Caching does not help queries seen for the first time, so it complements the other steps rather than replacing them.
  4. Optimize Indexing Strategy

    Review the indexes that support the query. An ordinary index on the columns used in the WHERE clause lets the planner filter first and then sort the remaining rows exactly, which can work well when the filter is selective.

    sql
    CREATE INDEX idx_your_column ON your_table(your_column);
  5. Monitor and Adjust Regularly

    Track query performance with a tool such as pg_stat_statements, and revisit ef_search and the indexing strategy as the dataset grows. Review query plans and execution times regularly.
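Steps 2 and 4 can be sketched together as follows. The table `items`, the columns `id`, `embedding` and `category_id`, the three-dimensional vector literal, and the `vector_l2_ops` operator class are all assumptions, since the original query and index definition are not shown. The `hnsw.iterative_scan` setting is only available in pgvector 0.8.0 and later, so treat that line as optional.

```sql
-- Scope the higher ef_search to one transaction instead of the session.
BEGIN;
SET LOCAL hnsw.ef_search = 200;

-- pgvector 0.8.0+ only: keep scanning the index until enough rows pass
-- the filter, bounded by hnsw.max_scan_tuples.
SET LOCAL hnsw.iterative_scan = strict_order;

-- Hypothetical query shape; the distance operator must match the
-- operator class of the HNSW index for the index to be used.
SELECT id
FROM items
WHERE category_id = 123
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 10;
COMMIT;

-- If the filter column takes only a few distinct values, a partial HNSW
-- index holds just the matching rows, so no candidates are lost to the
-- filter. One such index is needed per filtered value.
CREATE INDEX items_embedding_cat123_idx
    ON items USING hnsw (embedding vector_l2_ops)
    WHERE category_id = 123;
```

`SET LOCAL` reverts at the end of the transaction, so other queries on the same connection keep the cheaper default.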

Validation

Confirm that queries return the expected number of rows with the adjusted ef_search value. Monitor query execution times and make sure they stay within acceptable limits. Compare EXPLAIN ANALYZE output before and after the change to see how the plan and timings moved.
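The checks above might look like the following sketch. The table and column names and the vector literal are hypothetical, and the second query assumes pg_stat_statements is installed and listed in shared_preload_libraries.

```sql
-- 1. Inspect the plan: look for an Index Scan on the HNSW index and
--    compare the rows returned with "Rows Removed by Filter".
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM items
WHERE category_id = 123
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 10;

-- 2. Track latency over time for statements that use the L2 distance
--    operator. mean_exec_time is the column name in PostgreSQL 13 and
--    later; older versions call it mean_time.
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
WHERE query ILIKE '%<->%'
ORDER BY mean_exec_time DESC
LIMIT 10;
```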


Submitted by Alex Chen

Tags

pgvector, embeddings, vector-search