pgvector cosine similarity returns irrelevant results for short search queries

Mar 14, 2026
Confidence Score: 63%

Problem

Semantic search using pgvector returns irrelevant results when the query is only 1–3 words long. Short queries produce low-quality embeddings because the model has too little context to encode a meaningful semantic direction. A query like 'login error', for example, returns documents about unrelated errors. Hybrid search, which combines vector similarity with keyword matching (pg_trgm or full-text search) and merges the two rankings with Reciprocal Rank Fusion, improves results for short queries.

1 Fix

Canonical Fix
Moderate Confidence Fix
59% confidence · 63% success rate · 3 verifications · Last verified Mar 14, 2026

Implement hybrid search: combine pgvector with keyword search (pg_trgm or full-text search) using Reciprocal Rank Fusion

Low Risk

Short query embeddings lack semantic direction. Combining vector similarity with keyword matching (pg_trgm trigram similarity or tsvector full-text search) and merging the two result lists with Reciprocal Rank Fusion (RRF) compensates for weak embeddings on short queries. The steps below use pg_trgm.

  1. Enable pg_trgm extension

    Run once in your database:

    sql
    -- Trigram matching; the GIN index lets the % filter in step 2 use an index on title.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX IF NOT EXISTS idx_issues_title_trgm ON issues USING gin(title gin_trgm_ops);
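  2. Run vector search and keyword search in parallel

    Fetch the top-K results from each (K = 20 here):

    typescript
    // Assumes `prisma` is a PrismaClient, `embedding` is the query embedding, and `query` is the raw search text.
    // Depending on your setup, the embedding may need to be passed as a '[0.1,0.2,...]' string for the ::vector cast.
    const [vectorResults, keywordResults] = await Promise.all([
      // <=> is pgvector's cosine distance operator: smaller means more similar
      prisma.$queryRaw<{ id: string; title: string; dist: number }[]>`
        SELECT id, title, embedding <=> ${embedding}::vector AS dist
        FROM issues ORDER BY dist LIMIT 20
      `,
      // % keeps rows whose trigram similarity exceeds pg_trgm.similarity_threshold (0.3 by default)
      prisma.$queryRaw<{ id: string; title: string; sim: number }[]>`
        SELECT id, title, similarity(title, ${query}) AS sim
        FROM issues WHERE title % ${query} ORDER BY sim DESC LIMIT 20
      `,
    ])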
  3. Merge with Reciprocal Rank Fusion

    Each document scores 1/(k + rank) in every list it appears in, with rank starting at 1; the scores are summed across both lists:

    typescript
    function rrfMerge(lists: {id: string}[][], k = 60) {
      const scores = new Map<string, number>()
      for (const list of lists) {
        // forEach's index is 0-based, hence the + 1 to get a 1-based rank
        list.forEach((item, index) => {
          scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + index + 1))
        })
      }
      // Highest fused score first; returns ids only
      return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id)
    }
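
To make the fusion step concrete, here is a small worked sketch on made-up data. The helper `rrfScores` and the ids `a` to `d` are hypothetical and only for illustration; it applies the same 1/(k + rank) formula with k = 60 but keeps the fused scores so they can be inspected.

```typescript
// Hypothetical helper: same RRF formula as above, but returns the scores too.
type Ranked = { id: string };

function rrfScores(lists: Ranked[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, index) => {
      // index is 0-based, so the 1-based rank is index + 1
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

// Made-up result lists: "c" is last in the vector list but first in the keyword list.
const vectorHits: Ranked[] = [{ id: "a" }, { id: "b" }, { id: "c" }];
const keywordHits: Ranked[] = [{ id: "c" }, { id: "d" }];

const fused = [...rrfScores([vectorHits, keywordHits]).entries()]
  .sort((x, y) => y[1] - x[1])
  .map(([id]) => id);

// "c" scores 1/63 + 1/61, more than any id that appears in only one list.
console.log(fused[0]); // prints "c"
```

A document that appears in both lists collects two contributions, which is how a keyword match can lift a result that the short-query embedding ranked poorly.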

Validation

Run several 1–3 word queries (for example 'login error') through both pure vector search and the hybrid pipeline. The hybrid results should include highly relevant documents that pure vector search missed.
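
One way to make this check repeatable is to diff the two result lists for each test query. The sketch below is an assumption about how you might do that, not part of the original fix: `newlySurfaced` is a hypothetical helper and the ids are made up.

```typescript
// Hypothetical helper: ids that appear in the hybrid top N
// but were absent from the vector-only top N.
function newlySurfaced(vectorIds: string[], hybridIds: string[], topN = 10): string[] {
  const vectorTop = new Set(vectorIds.slice(0, topN));
  return hybridIds.slice(0, topN).filter((id) => !vectorTop.has(id));
}

// Made-up ids for one short query.
const vectorOnly = ["12", "7", "33"];
const hybrid = ["41", "12", "7"];

console.log(newlySurfaced(vectorOnly, hybrid, 3)); // prints [ '41' ]
```

The surfaced ids still need a manual relevance check; the helper only shows what hybrid search added.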

Verification Summary

Worked: 3
Partial: 2
Failed: 3
Last verified Mar 14, 2026

Environment

Product: pgvector + OpenAI Embeddings
Environment: production

Submitted by

Alex Chen

2450 rep

Tags

pgvector, embeddings, semantic-search, short-query, hybrid-search