pgvector cosine similarity returns irrelevant results for short search queries
Problem
Semantic search using pgvector returns irrelevant results when the query is 1–3 words. Short queries produce low-quality embeddings because the model has too little context to encode a meaningful semantic direction. For example, a query like 'login error' returns documents about unrelated errors. Hybrid search, which combines vector similarity with keyword matching (pg_trgm trigram similarity or full-text search) and merges the two rankings with Reciprocal Rank Fusion, improves short-query results.
Fix
Implement hybrid search: combine pgvector with full-text search using Reciprocal Rank Fusion
The embedding of a short query lacks semantic direction. Combining vector similarity with keyword matching (pg_trgm trigram similarity or tsvector full-text search) and merging the two result lists with Reciprocal Rank Fusion (RRF) compensates for the weak embedding.
1. Enable pg_trgm extension
Run once in your database:
```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX IF NOT EXISTS idx_issues_title_trgm
  ON issues USING gin (title gin_trgm_ops);
```

2. Run vector search and keyword search in parallel
Fetch top-K results from each:
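```typescript
// Assumes `prisma` is a PrismaClient, `embedding` is the query embedding,
// and `query` is the raw search text.
const [vectorResults, keywordResults] = await Promise.all([
  // `<=>` is pgvector's cosine distance operator: smaller means closer.
  prisma.$queryRaw<{ id: string; title: string; dist: number }[]>`
    SELECT id, title, embedding <=> ${embedding}::vector AS dist
    FROM issues
    ORDER BY dist
    LIMIT 20
  `,
  // `%` keeps rows whose trigram similarity exceeds pg_trgm.similarity_threshold.
  prisma.$queryRaw<{ id: string; title: string; sim: number }[]>`
    SELECT id, title, similarity(title, ${query}) AS sim
    FROM issues
    WHERE title % ${query}
    ORDER BY sim DESC
    LIMIT 20
  `,
])
```

3. Merge with Reciprocal Rank Fusion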
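Each document scores 1/(k + rank) in every list it appears in, where rank is its 1-based position and k is a smoothing constant (60 here). The per-list scores are summed across both lists: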
```typescript
function rrfMerge(lists: { id: string }[][], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const list of lists) {
    list.forEach((item, rank) => {
      // `rank` is a 0-based index, so rank + 1 is the 1-based rank in the formula.
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank + 1))
    })
  }
  // Highest fused score first; return only the ids.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id)
}
```
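To make the arithmetic concrete, here is a minimal sketch of the same fusion that keeps each fused score and the matching row, so the merged list can be displayed directly. The `Row` type, the `rrfMergeRows` name, and the sample rows are hypothetical and exist only for this illustration. They stand in for the results of the two queries in step 2.

```typescript
// Hypothetical row shape: only `id` and `title` are used here.
type Row = { id: string; title: string }

// Same fusion rule as above (1 / (k + rank), summed per id), but the result
// keeps the fused score and the first row seen for each id.
function rrfMergeRows(lists: Row[][], k = 60): { row: Row; score: number }[] {
  const merged = new Map<string, { row: Row; score: number }>()
  for (const list of lists) {
    list.forEach((row, index) => {
      const entry = merged.get(row.id) ?? { row, score: 0 }
      entry.score += 1 / (k + index + 1) // index is 0-based, rank is index + 1
      merged.set(row.id, entry)
    })
  }
  return [...merged.values()].sort((a, b) => b.score - a.score)
}

// Made-up sample data standing in for the vector and keyword result lists.
const vectorHits: Row[] = [
  { id: 'a', title: 'Payment error on checkout' },
  { id: 'b', title: 'Login error after password reset' },
]
const keywordHits: Row[] = [
  { id: 'b', title: 'Login error after password reset' },
  { id: 'c', title: 'Login error with SSO' },
]

const fused = rrfMergeRows([vectorHits, keywordHits])
// 'b' appears in both lists: 1/62 + 1/61 ≈ 0.0325, so it ranks first.
// 'a' scores 1/61 ≈ 0.0164 and 'c' scores 1/62 ≈ 0.0161.
console.log(fused.map(({ row, score }) => `${row.id} ${score.toFixed(4)}`))
```

A document that appears in both lists outranks one that tops only a single list, which is the behaviour that helps short queries: a keyword match can lift a result that the weak embedding ranked low.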
Validation
Run several 1–3 word queries, such as 'login error', against both pure vector search and the hybrid search. The hybrid results should include highly relevant documents that pure vector search missed.
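One way to make that comparison less manual is to list which ids the hybrid ranking brings into the top N that the vector-only ranking did not. The sketch below assumes you already have both rankings as arrays of ids. The `surfacedByHybrid` name, the `topN` parameter, and the sample ids are hypothetical. Whether the surfaced documents are actually relevant still has to be judged by reading them.

```typescript
// Returns ids that are in the hybrid top-N but not in the vector-only top-N.
function surfacedByHybrid(
  vectorIds: string[],
  hybridIds: string[],
  topN = 10,
): string[] {
  const vectorTop = new Set(vectorIds.slice(0, topN))
  return hybridIds.slice(0, topN).filter((id) => !vectorTop.has(id))
}

// Made-up rankings for a short query: 'd' only shows up once keyword matches are fused in.
const vectorOnly = ['a', 'b', 'c']
const hybrid = ['b', 'd', 'a']
console.log(surfacedByHybrid(vectorOnly, hybrid, 3)) // [ 'd' ]
```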
Environment
- Product: pgvector + OpenAI Embeddings
- Environment: production
Submitted by Alex Chen (2450 rep)