SELECT WHERE ORDER BY LIMIT no results
Problem
Hello, this kind of query no longer works on my database, which has more rows than before:
[code block]
If I increase hnsw.ef_search from 100 to 1000 it works, but it's slower. I also suppose that 1000 will not be enough once my table gets bigger. For now I have 7 million rows, but I will reach a billion rows soon. Here is my index:
[code block]
7 million rows looks very small to me. Is there a way to make the query work correctly without partitioning or sharding my database?
1 Fix
Optimize Vector Search Configuration for Large Datasets
The issue arises from how the HNSW (Hierarchical Navigable Small World) index is searched. The hnsw.ef_search parameter sets the size of the candidate list the index scan collects, and the WHERE clause is then applied to those candidates. As the table grows, a fixed ef_search value covers a shrinking share of the rows, so a selective filter can reject every candidate and the query returns fewer rows than LIMIT, or none at all. Increasing ef_search widens the candidate list and improves recall, but makes each query slower. This trade-off becomes critical as the dataset scales to billions of rows.
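To make the effect concrete, here is a rough back-of-the-envelope sketch. It assumes, as a simplification that the question does not state, that each of the ef_search candidates passes the WHERE clause independently with a probability equal to the filter's selectivity. Real data is usually correlated, so treat the numbers as an order-of-magnitude guide only. The function names and the 1% selectivity are illustrative.

```python
from math import comb


def expected_matches(ef_search: int, selectivity: float) -> float:
    """Average number of candidates that survive the filter."""
    return ef_search * selectivity


def prob_fewer_than(limit: int, ef_search: int, selectivity: float) -> float:
    """Probability that fewer than `limit` candidates pass the filter.

    Binomial model with independent candidates (a simplifying assumption).
    """
    return sum(
        comb(ef_search, k) * selectivity**k * (1 - selectivity) ** (ef_search - k)
        for k in range(min(limit, ef_search + 1))
    )


# Hypothetical filter that keeps 1% of the rows.
for ef in (100, 1000):
    print(ef, expected_matches(ef, 0.01), prob_fewer_than(1, ef, 0.01))
```

Under this model, ef_search = 100 with a 1% filter leaves about one matching candidate on average, and roughly 37% of queries would come back empty. At 1000 an empty result becomes very unlikely, which is consistent with the behaviour described in the question.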
- 1
Analyze Query Performance
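Run the failing query under EXPLAIN ANALYZE. In the plan, check whether the HNSW index is used and how many rows the filter discards after the index scan (reported as "Rows Removed by Filter"). This helps confirm whether the empty result is related to the ef_search setting rather than to another bottleneck.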
```sql
EXPLAIN ANALYZE
SELECT * FROM your_table
WHERE your_conditions
ORDER BY your_order
LIMIT your_limit;
```
- 2
Adjust ef_search Parameter
Increase hnsw.ef_search gradually rather than jumping straight to 1000. Start with a value of 200, check that the query returns the expected number of rows, and measure latency before going higher. Repeat this as the dataset grows, since a value that works today may fall short later.
```sql
SET hnsw.ef_search = 200;
```
- 3
Implement Caching Mechanism
Introduce a caching layer for frequently queried results. This will reduce the need for repeated searches on the same data, improving response times without needing to increase ef_search excessively.
Implement a caching strategy using Redis or Memcached.
- 4
Optimize Indexing Strategy
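Review your indexing strategy for the columns used in the WHERE and ORDER BY clauses. An ordinary index on a selective filter column gives the planner the option of finding the matching rows first and sorting only those by distance, rather than depending on the HNSW candidate list to contain them.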
```sql
CREATE INDEX idx_your_column ON your_table (your_column);
```
- 5
Monitor and Adjust Regularly
Set up monitoring for query performance and adjust ef_search and indexing strategies as your dataset grows. Regularly review query plans and execution times to ensure optimal performance.
Use monitoring tools such as pg_stat_statements to track query performance.
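The gradual increase described in step 2 can be sketched as a small search loop. This is an illustration under stated assumptions, not part of the original fix: `run_query` is a hypothetical callable that you would write yourself to set hnsw.ef_search, run your query, and return the rows. The starting value of 200 comes from step 2, and the ceiling of 1000 is the value reported as working in the question.

```python
from typing import Callable, Optional, Sequence


def smallest_sufficient_ef(
    run_query: Callable[[int], Sequence],
    limit: int,
    start: int = 200,
    ceiling: int = 1000,
) -> Optional[int]:
    """Double ef_search from `start` until the query returns `limit` rows.

    Returns the first value that works, or None if even `ceiling` falls short.
    """
    ef = start
    while True:
        rows = run_query(ef)
        if len(rows) >= limit:
            return ef
        if ef >= ceiling:
            return None
        ef = min(ef * 2, ceiling)


# Stand-in for a real database call (hypothetical): pretends that
# 1 in 50 candidates passes the filter and that the query has LIMIT 10.
def fake_run_query(ef_search: int) -> list:
    return list(range(ef_search // 50))[:10]


print(smallest_sufficient_ef(fake_run_query, limit=10))
```

A return value of None means that raising ef_search alone did not produce enough rows within the ceiling. That can also happen when fewer than `limit` rows match the filter at all, so it is worth checking the exact match count before concluding that the index is at fault.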
Validation
Confirm that queries return results as expected with the adjusted ef_search value. Monitor query execution times and ensure they remain within acceptable limits. Use EXPLAIN ANALYZE to verify that performance has improved.
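One way to make "results as expected" measurable is to compare the ids returned through the index with the ids from an exact (non-approximate) run of the same query. The helper below is a minimal sketch that only compares two id lists; how you obtain the exact list is left to your setup, and the sample ids are made up.

```python
from typing import Hashable, Sequence


def recall_at_k(approx_ids: Sequence[Hashable], exact_ids: Sequence[Hashable]) -> float:
    """Fraction of the exact top-k ids that the approximate query also returned."""
    if not exact_ids:
        return 1.0  # nothing to find, so nothing was missed
    found = set(approx_ids) & set(exact_ids)
    return len(found) / len(set(exact_ids))


# Hypothetical id lists for one query with LIMIT 5.
exact = [11, 42, 7, 93, 58]
approx = [11, 42, 93]  # the filtered index query came back short

print(recall_at_k(approx, exact))
print(recall_at_k([], exact))
```

Tracking this value for a handful of representative queries alongside their execution times shows whether a given ef_search setting still returns complete results as the table grows.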
Submitted by
Alex Chen
2450 rep