Contribution Ideas

Mar 14, 2026

Problem

Here are a few places that could currently use some help:

1. Explore updating cost estimates to not use an index when a large % of rows will be filtered by a `WHERE` condition (to avoid returning no results) - #263
   - [x] `hnsw-filtering-cost` branch
2. Investigate why `l2_distance` (not just `vector_l2_squared_distance`) is called for index scans - gist
   - [x] Explanation in https://github.com/pgvector/pgvector/issues/359#issuecomment-1840786021
   - [x] See if this can be addressed in the Postgres executor
3. Investigate why the index condition isn't used for `bigint` attributes (like with `integer`) - hqann-bigint branch
   - [x] Works with casting (thread)
4. Investigate why parallel index scans aren't used when `amcanparallel` is set - parallel-index-scan3 branch
5. Explore updating cost estimates to not use an index when the limit > `hnsw.ef_search`
   - [x] `index-limit` branch

Optimize Cost Estimates and Index Usage for Vector Searches

Medium Risk

The current cost estimates can select an index scan for a vector search even when a WHERE condition filters out a large percentage of rows, or when the LIMIT exceeds hnsw.ef_search. In both cases the query can return too few results, or none. Additionally, l2_distance is called during index scans alongside vector_l2_squared_distance, index conditions are not used for bigint attributes as they are for integer, and parallel index scans are not chosen even when amcanparallel is set.

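  1. Update Cost Estimates for WHERE Conditions

    Modify the index cost estimation logic so that the planner avoids the index when a large percentage of rows will be filtered out by a WHERE condition. Because the filter is applied after the index is scanned, a highly selective filter can otherwise leave too few results, or none. This logic lives in the extension's C cost estimator (see the `hnsw-filtering-cost` branch), not in a table that can be updated from SQL. The setting below only helps compare against a non-index plan while testing.

    sql
    -- No SQL switch exists for the cost estimate itself; this disables index scans for the current session
    SET enable_indexscan = off;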
  2. Address l2_distance Calls in Index Scans

    Investigate why l2_distance is called during index scans in addition to vector_l2_squared_distance (the issue comment linked in the problem statement gives an explanation), and see whether the extra calls can be avoided in the Postgres executor to reduce unnecessary computation.

    c
    /* No concrete change identified yet: review how the Postgres executor evaluates the distance during index scans */
  3. Enable Index Conditions for bigint Attributes

    Ensure that index conditions are applied for bigint attributes in the same way as for integer attributes. This may involve changing how the planner matches bigint conditions to the index; per the problem statement, casting currently works as a workaround.

    sql
    -- PostgreSQL syntax (ALTER TABLE ... ADD INDEX is MySQL-only); table and column names are placeholders
    CREATE INDEX idx_bigint ON your_table (your_bigint_column);
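  4. Utilize Parallel Index Scans

    Investigate why parallel index scans are not chosen even though amcanparallel is set, and adjust the code or configuration accordingly. The settings below make parallel plans more attractive to the planner while testing; they do not by themselves guarantee a parallel index scan.

    sql
    -- enable_parallel_index_scan is not a PostgreSQL setting; these existing settings influence parallel plans
    SET max_parallel_workers_per_gather = 4; SET min_parallel_index_scan_size = 0;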
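  5. Limit Index Usage Based on ef_search

    Adjust the cost estimation logic to avoid the index when the query's LIMIT exceeds hnsw.ef_search, since by default an HNSW index scan returns at most that many rows (see the `index-limit` branch). Until then, raising hnsw.ef_search for the session is a workaround.

    sql
    -- hnsw.ef_search (default 40) caps the rows an HNSW scan returns; raise it to cover the LIMIT
    SET hnsw.ef_search = 100;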
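
The two cost-estimate ideas above (steps 1 and 5) can be illustrated with a toy decision rule. This is only a sketch under simplifying assumptions, not pgvector's planner code: it assumes the index scan yields at most `ef_search` candidates and that the WHERE filter is applied afterwards and keeps a fixed fraction of them. `QueryShape`, `expected_index_rows`, and `should_use_index` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    limit: int            # LIMIT requested by the query
    selectivity: float    # fraction of rows that pass the WHERE filter (0..1)
    ef_search: int = 40   # candidates an HNSW scan returns (pgvector default)


def expected_index_rows(q: QueryShape) -> float:
    """Expected rows left when the filter is applied after the index scan."""
    return q.ef_search * q.selectivity


def should_use_index(q: QueryShape) -> bool:
    """Hypothetical rule: use the index only if it can plausibly fill the LIMIT."""
    if q.limit > q.ef_search:
        # Step 5: the scan cannot return enough rows even without a filter.
        return False
    # Step 1: the filter must leave at least LIMIT rows on average.
    return expected_index_rows(q) >= q.limit
```

For example, with the default `ef_search` of 40 and a filter matching 10% of rows, only about 4 rows survive on average, so this rule would reject the index for `LIMIT 10`.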

Validation

Run the same set of vector search queries before and after implementing the changes, using EXPLAIN ANALYZE to confirm which plan is chosen. Compare execution time and resource usage to check whether performance improved, and verify that each query returns the expected number of rows and the expected results without errors.
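
A minimal sketch of the two measurements described above, assuming the caller supplies the query as a Python callable and the result ids as lists. `recall` and `median_ms` are hypothetical helper names, and how the queries are actually executed (driver, connection) is left out on purpose.

```python
import statistics
import time
from typing import Callable, Sequence


def recall(exact_ids: Sequence[int], approx_ids: Sequence[int]) -> float:
    """Fraction of the exact nearest-neighbor ids that the index query also returned."""
    if not exact_ids:
        return 1.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


def median_ms(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of run_query, in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Comparing `median_ms` before and after a change covers the timing part, while `recall` against a sequential-scan result shows whether the expected rows are still returned.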

Submitted by Alex Chen

Tags

pgvector, embeddings, vector-search