Late interaction embedding support
Problem
Hi, I wanted to ask a question: does pgvector currently support working with late interaction text embeddings, like the ones that come from a ColBERT model for example? This is an example of a vector that I would be referring to: [code block] Currently when inserting such a vector I seem to get `ValueError: expected ndim to be 1`, with this usage: [code block]
Error Output
ValueError: expected ndim to be 1
1 Fix
Enable Late Interaction Embedding Support in pgvector
The error `ValueError: expected ndim to be 1` occurs because the pgvector Python client expects each vector to be a one-dimensional array. Late interaction models such as ColBERT produce one vector per token, so a single text yields a two-dimensional array (tokens × dimensions), which the client rejects. Flattening the array to one dimension before insertion avoids the error. Note that the flattened length then depends on the number of tokens, so it only fits a fixed-size `vector(n)` column when every embedding has the same shape, and the per-token structure used for late interaction scoring is no longer explicit in the stored value.
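To make the shape issue concrete, here is a minimal sketch. The arrays are hypothetical stand-ins for ColBERT output (random values, an assumed 128 dimensions per token), not real model embeddings; they only illustrate why the input is two-dimensional and how the flattened length changes with the token count.

```python
import numpy as np

# Hypothetical stand-ins for ColBERT output: one 128-dim vector per token.
# Real values would come from the model; random data is used only for shape.
rng = np.random.default_rng(0)
short_doc = rng.random((5, 128), dtype=np.float32)   # 5 tokens
long_doc = rng.random((12, 128), dtype=np.float32)   # 12 tokens

print(short_doc.ndim)             # 2, which is what triggers the ValueError
print(short_doc.flatten().shape)  # (640,)
print(long_doc.flatten().shape)   # (1536,)
```

Because the two flattened lengths differ, both values could not go into the same fixed-dimension column without padding or truncation.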
1. Flatten the Embedding Vector
Before inserting the embedding, flatten it to a one-dimensional array. With NumPy this can be done with `ndarray.flatten()`; other environments offer equivalent functions.
```python
import numpy as np

# Example of flattening a multi-dimensional vector
embedding_vector = np.array([[0.1, 0.2], [0.3, 0.4]])
flattened_vector = embedding_vector.flatten()
```
2. Insert Flattened Vector into pgvector
Once the vector is flattened, insert it into a Postgres table that has a pgvector `vector` column. The column's declared dimension must match the length of the flattened vector, and the database connection must have the pgvector type adapter registered.
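```python
import psycopg
from pgvector.psycopg import register_vector

# Assumptions: a database named "pgvector_example" and a table
# items(embedding vector(n)) where n equals the flattened length
conn = psycopg.connect(dbname="pgvector_example", autocommit=True)
register_vector(conn)
conn.execute("INSERT INTO items (embedding) VALUES (%s)", (flattened_vector,))
```
3. Test Insertion with Sample Data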
Create a test case with a known multi-dimensional embedding and verify that the insertion works without raising any errors. This will help confirm that the flattening process is functioning correctly.
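```python
# Assumes `conn` is a psycopg connection on which register_vector(conn) from
# pgvector.psycopg has been called, and that items.embedding is vector(4)
test_vector = np.array([[0.1, 0.2], [0.3, 0.4]])
flattened_test_vector = test_vector.flatten()
assert flattened_test_vector.ndim == 1
conn.execute("INSERT INTO items (embedding) VALUES (%s)", (flattened_test_vector,))
```
4. Update Documentation for Future Reference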
Document the changes made to support late interaction embeddings, including the flattening process. This will help future developers understand the requirement for one-dimensional vectors when using pgvector.
Validation
To confirm the fix worked, attempt to insert a late interaction embedding vector into pgvector after flattening it. If the insertion is successful without any errors, the fix is validated. Additionally, run unit tests that include various embedding shapes to ensure robustness.
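The shape checks mentioned above can be sketched without a database. The helper name `to_1d` is hypothetical and introduced only for illustration; the loop simply asserts that inputs of several shapes come out one-dimensional with no elements lost.

```python
import numpy as np

def to_1d(embedding):
    """Hypothetical helper: return any array-like embedding as a 1-D float32 array."""
    return np.asarray(embedding, dtype=np.float32).reshape(-1)

# Shapes chosen for illustration: already flat, small 2-D,
# token-by-dimension, and a batch-like 3-D array.
shapes = [(4,), (2, 2), (3, 128), (1, 1, 8)]
for shape in shapes:
    flat = to_1d(np.zeros(shape))
    assert flat.ndim == 1
    assert flat.size == int(np.prod(shape))
```

A check like this only covers the flattening step; the insertion itself still needs a column whose dimension matches `flat.size`.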
Environment
Submitted by
Alex Chen
2450 rep