Show HN: I wrote a GPU-less billion-vector DB for molecule search (live demo)
cheese-new.deepmedchem.com
Input a SMILES string (or pick one of the example molecules) and it returns up to 100k molecules closest in 3-D shape or electrostatic similarity, drawn from databases at the 10+ billion scale, typically within 5-10 s.
*Why it might interest HN*
* Entire index lives on disk: no GPU at query time, less than ~10 GB RAM total.
* Built from scratch (no FAISS index / Milvus / Pinecone).
* Index-build cost: one Nvidia T4 (~300 USD) for one 5.5B database.
* Open to anyone: predict ADMET, export results as CSV/SDF.
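For anyone wondering how a disk-resident vector search can stay inside a small RAM budget at all, here is a minimal sketch. It is not the CHEESE implementation (the post does not describe the index internals); it is a plain brute-force scan over a memory-mapped float32 file, and the function name `topk_on_disk`, the file layout, the cosine metric and the chunk size are all assumptions made for illustration. A real billion-scale index would need something smarter than a linear scan.

```python
import os
import tempfile

import numpy as np


def topk_on_disk(path, dim, query, k, chunk_rows=4096):
    """Return (indices, scores) of the k rows most cosine-similar to `query`.

    Hypothetical layout: `path` holds a raw row-major float32 matrix with
    `dim` columns. Only one chunk of rows is resident in RAM at a time.
    """
    vectors = np.memmap(path, dtype=np.float32, mode="r").reshape(-1, dim)
    q = (query / np.linalg.norm(query)).astype(np.float32)

    best_idx = np.empty(0, dtype=np.int64)
    best_score = np.empty(0, dtype=np.float32)

    for start in range(0, vectors.shape[0], chunk_rows):
        # Copy one chunk from the memory map into an ordinary array.
        chunk = np.asarray(vectors[start:start + chunk_rows])
        norms = np.maximum(np.linalg.norm(chunk, axis=1), 1e-12)
        scores = (chunk @ q) / norms

        idx = np.arange(start, start + chunk.shape[0], dtype=np.int64)
        best_idx = np.concatenate([best_idx, idx])
        best_score = np.concatenate([best_score, scores])

        # Keep only the running top k so memory stays bounded.
        if best_score.size > k:
            keep = np.argpartition(-best_score, k - 1)[:k]
            best_idx, best_score = best_idx[keep], best_score[keep]

    order = np.argsort(-best_score)
    return best_idx[order], best_score[order]


def demo():
    """Write random vectors to a temp file and query with one of them."""
    rng = np.random.default_rng(0)
    data = rng.standard_normal((10_000, 32)).astype(np.float32)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.f32")
        data.tofile(path)
        return topk_on_disk(path, 32, data[123], k=5)


if __name__ == "__main__":
    idx, scores = demo()
    print(idx, scores)
```

Peak memory here is roughly one chunk plus the running top-k list, regardless of how large the file is; the cost is that every query reads the whole file, which is exactly what a purpose-built index avoids.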
Full write-up & benchmarks (DUD-E, LIT-PCBA, SVS) in the pre-print: https://chemrxiv.org/engage/chemrxiv/article-details/6725091...
Nice project! A regular on HN, the creator of usearch, built an embedding search for the same dataset and wrote it up; it's a great read.
https://ashvardanian.com/posts/usearch-molecules/