Contribution Ideas

Question

Contribution Ideas

Accepted Answer

The current cost estimation and index usage for vector search queries is inefficient, particularly when a WHERE condition filters out a large percentage of rows. This leads to unnecessary index scans and suboptimal performance. In addition, conditions on bigint attributes are not applied as index conditions, and parallel index scans are not used even when they should be possible.

Proposed changes:

- Update the cost estimation logic so that the index is not used when a WHERE condition will filter out a large percentage of rows. This can prevent queries from returning no results and improves performance.
- Investigate and modify the Postgres executor so that index scans call only vector_l2_squared_distance, rather than calling l2_distance as well, which removes unnecessary computation.
- Ensure that index conditions are applied for bigint attributes in the same way they are for integer types. This may involve changing how the query planner recognizes and optimizes bigint conditions.
- Investigate why parallel index scans are not used even though amcanparallel is set, and adjust configuration or code so that index scans can run in parallel.
- Update the cost estimation logic so that the index is not used when the LIMIT exceeds hnsw.ef_search.
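The two cost-estimation ideas above can be illustrated with a small planner-side heuristic. This is only a sketch under stated assumptions, not pgvector's actual cost function: the function name `should_use_hnsw_index`, the `selectivity` parameter (the fraction of rows that pass the WHERE condition), and the rule of comparing expected surviving candidates to the LIMIT are all hypothetical. It also assumes that an HNSW index scan yields at most hnsw.ef_search candidates and that the WHERE filter is applied afterwards.

```python
def should_use_hnsw_index(selectivity: float, limit: int, ef_search: int = 40) -> bool:
    """Hypothetical check: is an HNSW index scan likely to satisfy the query?

    selectivity: estimated fraction of rows that pass the WHERE condition (0.0 to 1.0).
    limit:       the query's LIMIT.
    ef_search:   value of hnsw.ef_search (40 is assumed here as the default).
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0.0 and 1.0")
    if limit <= 0 or ef_search <= 0:
        raise ValueError("limit and ef_search must be positive")

    # Assumption: the index scan returns at most ef_search candidates,
    # so a LIMIT above ef_search cannot be satisfied from the index alone.
    if limit > ef_search:
        return False

    # Assumption: the WHERE filter runs after the index scan, so only about
    # ef_search * selectivity candidates are expected to survive it.
    expected_matches = ef_search * selectivity
    return expected_matches >= limit


if __name__ == "__main__":
    print(should_use_hnsw_index(selectivity=1.0, limit=10))    # no filtering
    print(should_use_hnsw_index(selectivity=0.01, limit=10))   # 99% of rows filtered out
    print(should_use_hnsw_index(selectivity=1.0, limit=100))   # LIMIT above ef_search
```

A real fix would express this as a higher estimated index cost rather than a hard yes/no decision, so the planner can still compare the index scan against a sequential scan.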

Problem

1 Fix

Optimize Cost Estimates and Index Usage for Vector Searches

Update Cost Estimates for WHERE Conditions

Address l2_distance Calls in Index Scans

Enable Index Conditions for bigint Attributes

Utilize Parallel Index Scans

Limit Index Usage Based on ef_search
