For a client I've "built" a similar setup with Supabase + pgVector, and I give the AI direct SQL access.
Here's the hard part: just last week I indexed 1.2 million documents for one customer's project. They have PDFs with 1,600 pages, PPTX files over 4 GB, plus lots of 2D/3D architecture drawings in proprietary formats.
The difficulties I see:
- Getting the data in (ETL). This takes days and is fragile.
- Keeping RBAC intact.
- Supabase/pgVector needs lots of resources when adding new rows to the index. I wish the resources scaled up/down automatically, instead of me having to monitor usage and switch to the next plan.
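For context on what that index is actually doing: whichever store you use (pgVector, Chroma, etc.), the retrieval step boils down to nearest-neighbor search over embeddings. A minimal pure-Python sketch with toy 3-dimensional vectors and hypothetical document names (no real embedding model involved):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "index": document name -> embedding (hypothetical values).
docs = {
    "floorplan.pdf": [0.9, 0.1, 0.0],
    "specs.pptx":    [0.2, 0.8, 0.1],
    "notes.txt":     [0.1, 0.2, 0.9],
}

def top_k(query, k=2):
    # Brute-force scan; a real store replaces this with an ANN index.
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query), reverse=True)
    return ranked[:k]

print(top_k([1.0, 0.0, 0.0]))  # ['floorplan.pdf', 'specs.pptx']
```

At 1.2M documents the brute-force scan is exactly what you can't afford, which is why index builds (and re-builds on insert) eat the resources mentioned above.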
How could Chroma help me here?
Potentially many ways - but one is that Chroma makes all this pain go away.
We're also working on some ingestion tooling that will make it so you don't have to scale, manage or run those pipelines.
1. I see the core is OSS - any chance of it being pushed up to crates.io? (I see you already have a placeholder.)
2. Is it embeddable, or does it run only as an Axum server?
Do you see all providers converging on a similar alpha, i.e. cheap object storage, NVMe drives, and SSD caching, to solve this?
Cheers and congrats on the launch
Chroma is fully OSS - embedded, single-node, and distributed (data and control plane). AFAIK Lance's distributed version is not OSS.
We do have plans to release the crate (enabling embedded Chroma in Rust), but haven't gotten around to it yet. Hopefully soon!
> Do you see all providers converging on a similar alpha, i.e. cheap object storage, NVMe drives, and SSD caching, to solve this?
It's not only a new pattern in search workloads - it's happening in streaming, KV, OLTP, OLAP, etc. Yeah - it's the future.
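That pattern - cheap object storage as the durable source of truth, with local NVMe/SSD as a read-through cache - can be sketched in a few lines. All names here are hypothetical illustrations, not any vendor's API:

```python
class TieredStore:
    """Toy read-through cache over a slow 'object store' tier."""

    def __init__(self, object_store):
        self.object_store = object_store  # slow, cheap, durable tier (stand-in for S3 etc.)
        self.cache = {}                   # fast local tier (stand-in for NVMe/SSD)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:             # hot path: serve from the local cache
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.object_store[key]    # cold path: fetch from object storage
        self.cache[key] = value           # populate the cache for subsequent reads
        return value

store = TieredStore({"segment-0": b"vectors..."})
store.get("segment-0")  # miss: pulled from the object store
store.get("segment-0")  # hit: served from the local cache
print(store.hits, store.misses)  # 1 1
```

The economics are the point: the cold tier scales capacity nearly for free, while the cache tier keeps hot reads fast - which is why the same design keeps showing up across search, streaming, KV, and OLAP systems.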