
IVFFLAT QPS too low

Asked by Fresh, over 1 year ago (Mar 14, 2026). Confidence score: 86%

Problem

I am using IVFFlat for 1200-dimensional embeddings of `vector` type, over 20 million rows. The slow query takes a user-provided vector and finds the top 100 matching vectors. The index has 4,200 lists and I use 10 probes. The query takes 30 seconds when it's cold and 100 ms if it's a repeat query. I confirmed with EXPLAIN that the index is being used.

The issue is very slow I/O: all of the query time is spent reading in the blocks that are buffer misses during the `Index Scan` operation, at a throughput of about 6 MB/s. The hardware configuration is an r5.2xlarge on AWS RDS. Here is an example `EXPLAIN (ANALYZE, BUFFERS)` result:

[code block]

Given that the I/O should be hitting SSD, I am puzzled by the extremely low throughput. What am I doing wrong? Should I be using HNSW? Is it a big performance issue that most of my rows are stored as TOAST?
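As a sanity check on the numbers in the question, a short back-of-envelope sketch (the 4-byte float32 component size matches how pgvector stores `vector` values; the 8-byte per-vector header is an assumption):

```python
# Back-of-envelope check on the reported scan cost. pgvector stores each
# vector as 4-byte float32 components plus a small header; the 8-byte
# per-vector overhead used here is an assumption.

DIMS = 1200
ROWS = 20_000_000
LISTS = 4200
PROBES = 10
OBSERVED_BPS = 6e6              # ~6 MB/s of mostly-random reads

bytes_per_vector = DIMS * 4 + 8                   # 4,808 bytes
vectors_scanned = ROWS / LISTS * PROBES           # ~47,600 vectors per query
working_set = vectors_scanned * bytes_per_vector  # ~229 MB touched when cold
cold_seconds = working_set / OBSERVED_BPS         # ~38 s

print(round(vectors_scanned), round(working_set / 1e6), round(cold_seconds))
# → 47619 229 38
```

At ~6 MB/s, reading a ~229 MB working set cold takes tens of seconds, in the same ballpark as the observed 30 s, so the arithmetic is consistent with slow random I/O rather than a broken index.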


Canonical Fix · 84% confidence · 100% success rate · 3 verifications · Last verified Mar 14, 2026

Solution: IVFFLAT QPS too low

Low Risk


@bantmen An r5.2xlarge has 64 GB of RAM, and if you're using the PostgreSQL defaults, this means 16 GB is allocated for shared buffers.

An IVFFlat index for this dataset would be ~89 GiB in size (excluding the size of the data in the table itself), so your entire index won't fit into memory. Additionally, the data access pattern for IVFFlat can be oversimplified as "random": using your description above, out of 4,200 centers you're trying to find the 10 closest to a query vector. Amongst those centroids, you're looking at ~48K vectors, which is ~220 MB of data. Given each query could lead to a completely different set of 10 out of the 4,200 centers, you could end up in a situation where you're swapping data in and out of memory fairly often.

Could HNSW help? In this case, you're likely keeping the top layers of your graph in memory, and while the lower layers may involve more fetches to disk, the total is likely to be lower. For example, if you're using `hnsw.ef_search` at the default setting (40), you'll likely scan far fewer vectors (without seeing empirical data around your embedding model, I can't give an estimated convergence). I can't comment on whether you'll meet your recall target without more information.
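The ~89 GiB figure in the answer can be reproduced with a quick sketch (the 8-byte per-vector overhead is an assumption; real index tuples carry additional per-tuple overhead, so this is a lower bound):

```python
# Why the index can't stay cached: total IVFFlat index footprint vs. the
# default shared_buffers on an r5.2xlarge (64 GB RAM, ~25% for shared
# buffers). The 8-byte per-vector overhead is an assumption.

DIMS = 1200
ROWS = 20_000_000
RAM_GIB = 64

index_gib = ROWS * (DIMS * 4 + 8) / 2**30    # ~89.6 GiB of vector data
shared_buffers_gib = RAM_GIB * 0.25          # 16 GiB by the common 25% rule
cacheable = shared_buffers_gib / index_gib   # fraction of index that fits

print(round(index_gib, 1), f"{cacheable:.0%}")
# → 89.6 18%
```

With only ~18% of the index fitting in shared buffers and each query touching a near-random 10 of the 4,200 lists, most cold probes miss the cache, which matches the buffer-miss pattern in the question's `EXPLAIN (ANALYZE, BUFFERS)` output.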

Validation

Resolved in pgvector/pgvector GitHub issue #661. Community reactions: 2 upvotes.
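For anyone exploring the HNSW route discussed above, a minimal migration sketch, assuming a hypothetical `items` table with an `embedding vector(1200)` column (the `m`/`ef_construction` values shown are pgvector's defaults, not tuned recommendations, and the opclass should match whichever distance operator your query uses):

```sql
-- Build an HNSW index instead of IVFFlat; 1200 dimensions is within
-- pgvector's 2000-dimension limit for indexing the vector type.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
    WITH (m = 16, ef_construction = 64);

-- Per-session search breadth; 40 is the default mentioned in the answer.
-- Raise it to trade query time for recall.
SET hnsw.ef_search = 40;
```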


Submitted by: Alex Chen (2,450 rep)

Tags: pgvector, embeddings, vector-search