Can I help get tinyint or half branches released?

Mar 14, 2026

Problem

Is there more work needed on the `tinyint` or `half` branches before they can be released, or are they ready to go into `0.5.1`? If there is more to do, let me know and I'll see whether it's something I could take care of (e.g., code, docs, tests).


Solution

  1. Sure, I'll add my use cases here. For context, we're doing chem/bio ML type work. Thanks for your thoughts on the below.

  2. Our smaller datasets have about 300 million (300M) molecules in them. For those molecules there are a few different types of vectors we'd like to generate. Some of these vectors are generated via cheminformatics methods (basically a molecular hash function with certain similarity properties) and others are generated via embeddings from various ML models.

  3. The vectors we'd like to generate are:

     1. 300M sparse "count" vectors w/ 1024-2048 dimensions (Morgan Count Fingerprints). Mo…
     2. 300M embedding vectors w/ 128-1024 dimensions where each dimension is a non-zero decimal number (these are essentially the same as the standard embeddings everyone uses for various ML tasks). We would likely be OK giving up some precision to use half-size floats, product quantization, or any other similar technique.
     3. I'll also add that we'd like 300M sparse bit vectors w/ 1024-2048 dimensions (Morgan Bit Fingerprints). For these vectors, we'd like to be able to do ANN searches across them using Tanimoto/Jaccard distance. I recognize these are probably not going to be supported as true A…

  4. For some higher-level context, I'm currently running Postgres via GCP Cloud SQL to store our other molecular data, and it would be nice to integrate the molecular fingerprints/counts and embeddings into Postgres as well instead of needing to bring in another ANN library/service (e.g., faiss, Pinecone). I ran some rough numbers on storage cost: with the current pgvector, I estimate (very hand-wavy, back of the envelope) a storage cost of about $2k-$3k/yr for each set of 300M molecule fingerprint count vectors I store. Cutting that storage cost by 50% or 75% would make it…
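The 50% and 75% savings mentioned above line up with moving from 4-byte floats to 2-byte (half) or 1-byte elements. The rough sketch below estimates raw payload size for 300M vectors. It assumes 2048 dimensions (the upper end of the stated range), dense storage of every element, and a 1-byte element width for a `tinyint`-style type; it ignores row headers, per-vector headers, TOAST, and index storage, so real disk usage will be higher. The function name `payload_bytes` is hypothetical.

```python
# Back-of-the-envelope payload estimate for storing many fixed-length vectors.
# Assumptions: dense storage, element bytes only; Postgres row/vector headers,
# TOAST, and index storage are ignored, so actual disk usage is higher.

NUM_VECTORS = 300_000_000  # ~300M molecules, from the use case above
DIMENSIONS = 2048          # upper end of the 1024-2048 range

# Bytes per element for each candidate representation.
BYTES_PER_ELEMENT = {
    "float32 (current vector type)": 4,
    "float16 (half)": 2,
    "int8 (tinyint, assumed 1 byte)": 1,
}


def payload_bytes(num_vectors: int, dims: int, bytes_per_element: int) -> int:
    """Return the raw element payload in bytes, excluding all overhead."""
    return num_vectors * dims * bytes_per_element


def main() -> None:
    baseline = payload_bytes(NUM_VECTORS, DIMENSIONS, 4)
    for label, width in BYTES_PER_ELEMENT.items():
        size = payload_bytes(NUM_VECTORS, DIMENSIONS, width)
        saving = 1 - size / baseline
        print(f"{label:32s} {size / 1e12:6.2f} TB  ({saving:.0%} smaller than float32)")


if __name__ == "__main__":
    main()
```

Because the count fingerprints are sparse, a sparse representation could change these numbers substantially; the sketch only compares dense element widths.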
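For the embedding use case, the question is how much accuracy is lost when trading precision for space. The standard-library sketch below round-trips a random stand-in embedding through IEEE 754 half precision (the `struct` format `'e'`) and through a simple symmetric per-vector int8 scalar quantization, then compares cosine similarity against the original. The int8 scheme is an illustrative assumption, not necessarily what the `tinyint` branch implements, and product quantization is not shown. All function names are hypothetical.

```python
import math
import random
import struct


def to_float16(values: list[float]) -> list[float]:
    """Round-trip values through IEEE 754 half precision via struct format 'e'."""
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed))


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Assumed scheme: symmetric per-vector scalar quantization to [-127, 127].

    Returns the integer codes and the scale, so that value ~= code * scale.
    """
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Approximately reconstruct the original values from int8 codes."""
    return [c * scale for c in codes]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    random.seed(0)
    # Random stand-in for a 128-dimensional model embedding.
    embedding = [random.uniform(-1.0, 1.0) for _ in range(128)]
    half = to_float16(embedding)
    codes, scale = quantize_int8(embedding)
    approx = dequantize_int8(codes, scale)
    print(f"cosine(original, float16) = {cosine(embedding, half):.6f}")
    print(f"cosine(original, int8)    = {cosine(embedding, approx):.6f}")
```

Running the same comparison on real embeddings from the models in question would give a better sense of whether the precision loss matters for a given search task.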
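For the Morgan bit fingerprints, Tanimoto similarity on binary vectors is the same as Jaccard similarity: the number of shared on-bits divided by the number of bits set in either fingerprint, with distance defined as one minus that. The sketch below stores each fingerprint as a Python int used as a bitset and runs an exact brute-force top-k search. It is not an ANN index; it only pins down the metric. Treating two all-zero fingerprints as identical is an assumption, and all names are hypothetical.

```python
import heapq


def from_on_bits(indices: list[int]) -> int:
    """Build a bitset fingerprint (a Python int) from the indices of its on-bits."""
    fp = 0
    for i in indices:
        fp |= 1 << i
    return fp


def popcount(x: int) -> int:
    """Count set bits; bin() keeps this working on Python versions before 3.10."""
    return bin(x).count("1")


def tanimoto_similarity(a: int, b: int) -> float:
    """Shared on-bits / on-bits in either fingerprint (Tanimoto == Jaccard on bits)."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return popcount(a & b) / union


def tanimoto_distance(a: int, b: int) -> float:
    """Tanimoto/Jaccard distance: 1 - similarity."""
    return 1.0 - tanimoto_similarity(a, b)


def nearest(query: int, fingerprints: list[int], k: int) -> list[tuple[float, int]]:
    """Exact brute-force top-k by Tanimoto distance; returns (distance, index) pairs."""
    scored = ((tanimoto_distance(query, fp), idx) for idx, fp in enumerate(fingerprints))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Toy 1024-bit fingerprints given by their on-bit positions.
    library = [from_on_bits([1, 5, 9]), from_on_bits([1, 5, 700]), from_on_bits([2, 3])]
    query = from_on_bits([1, 5, 9, 1023])
    for dist, idx in nearest(query, library, k=2):
        print(f"molecule {idx}: Tanimoto distance {dist:.3f}")
```

An exact scan like this is linear in the number of fingerprints, which is why an index that supports this metric matters at the 300M scale described above.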

Validation

Resolved in pgvector/pgvector GitHub issue #326. Community reactions: 3 upvotes.

Verification Summary

Worked: 3
Partial: 1
Last verified Mar 14, 2026

Submitted by Alex Chen

Tags

pgvector, embeddings, vector-search