HammadB (u/HammadB) - Readit News

HammadB commented on Wal3: A Write-Ahead Log for Chroma, Built on Object Storage trychroma.com/engineering... · Posted by u/jeffchuber

swyx · 6 months ago

> In short, we believe that systems design is about contracts between reality and reality as we see it; reality as we see it is just a fiction. We believe systems exist in two forms: their actual reality and our fictional understanding of them. To us, the programmers, the system's reality is a fiction; we cannot see it, we can only observe it through additional SRE observability and monitoring mechanisms. To the system, this observability is a fiction. Whether due to tooling problems or just measurement error, the fiction presented via metrics, logs, and traces resembles the system itself, but is decidedly not the system itself.

this is very McLuhan/systemantics of you! all abstractions are leaky, but some abstractions let you look at the leaks.

TIL about setsums - one wonders: if `fn setsum([String]) -> Digest` works, then "nested setsums" must also work at very large scales.
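The "nested setsums" question has a tidy answer if a setsum is additive. Below is a minimal Python sketch of a simplified setsum-like digest. It assumes the general recipe of hashing each element, splitting the hash into lanes, and adding lanes modulo fixed constants. The moduli, the use of SHA-256, and the names `item_digest`, `combine`, and `subtract` are hypothetical illustrative choices, not Chroma's exact construction. Because combining digests is just lane-wise modular addition, the digest of a large set equals the combination of the digests of any partition of it into shards, so nesting works without re-hashing anything.

```python
import hashlib
from typing import Iterable, Tuple

# Illustrative per-lane moduli just below 2**32 (assumed constants,
# not necessarily the ones any real setsum implementation uses).
MODULI = (4294967291, 4294967279, 4294967231, 4294967197,
          4294967189, 4294967161, 4294967143, 4294967111)

Digest = Tuple[int, ...]
EMPTY: Digest = (0,) * len(MODULI)


def item_digest(item: str) -> Digest:
    """Hash one element with SHA-256 and split the 32 bytes into eight 32-bit lanes."""
    h = hashlib.sha256(item.encode("utf-8")).digest()
    return tuple(int.from_bytes(h[4 * i:4 * i + 4], "little") % m
                 for i, m in enumerate(MODULI))


def combine(a: Digest, b: Digest) -> Digest:
    """Lane-wise modular addition: the digest of the multiset union."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def subtract(a: Digest, b: Digest) -> Digest:
    """Lane-wise modular subtraction: removes a sub-multiset's contribution."""
    return tuple((x - y) % m for x, y, m in zip(a, b, MODULI))


def setsum(items: Iterable[str]) -> Digest:
    """Order-independent digest of a multiset of strings."""
    acc = EMPTY
    for item in items:
        acc = combine(acc, item_digest(item))
    return acc


if __name__ == "__main__":
    records = [f"record-{i}" for i in range(10_000)]
    # "Nested" digest: digest each shard independently, then add the shard digests.
    shards = [records[i::8] for i in range(8)]
    nested = EMPTY
    for shard in shards:
        nested = combine(nested, setsum(shard))
    print(nested == setsum(records))  # True: order and grouping don't matter
```

Subtraction is the other half of the design: a running digest can drop elements without being recomputed from scratch.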

one thing i missed from this post, which would otherwise score perfect marks for a technology introduction, is benchmarks against the systems you compare yourselves to, like warpstream and friends.

HammadB · 6 months ago

You will always find a copy of McLuhan on the Chroma team's bookshelf!

HammadB commented on Show HN: Chroma Cloud – serverless search database for AI trychroma.com/cloud... · Posted by u/jeffchuber

curl-up · 7 months ago

Can Chroma handle combined structured+vector search? E.g. filtering by `category=X and value>Y` then finding top N matches by vec similarity?

HammadB · 7 months ago

Yes! It can - https://docs.trychroma.com/docs/querying-collections/metadat...
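In Chroma this is expressed by passing a `where` metadata filter alongside the query embedding (see the linked docs for the exact operator syntax). The brute-force sketch below only illustrates the semantics being asked about: apply `category == X and value > Y` first, then rank the survivors by vector similarity and keep the top N. The record layout, cosine similarity, and the function name `filtered_top_n` are hypothetical choices for illustration, not Chroma's implementation.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (id, embedding, metadata).
Record = Tuple[str, List[float], Dict[str, object]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_top_n(records: List[Record], query: Sequence[float],
                   category: str, min_value: float, n: int) -> List[str]:
    """Return ids of the n most similar records with category == X and value > Y."""
    # Step 1: structured filter on metadata.
    candidates = [
        r for r in records
        if r[2].get("category") == category
        and r[2].get("value", float("-inf")) > min_value
    ]
    # Step 2: rank only the survivors by vector similarity and keep the top n.
    best = heapq.nlargest(n, candidates, key=lambda r: cosine(r[1], query))
    return [r[0] for r in best]


if __name__ == "__main__":
    records = [
        ("a", [1.0, 0.0], {"category": "X", "value": 10}),
        ("b", [0.9, 0.1], {"category": "X", "value": 3}),
        ("c", [0.0, 1.0], {"category": "X", "value": 20}),
        ("d", [1.0, 0.0], {"category": "Y", "value": 50}),
    ]
    print(filtered_top_n(records, [1.0, 0.0], category="X", min_value=5, n=2))  # ['a', 'c']
```

A production engine would push the filter into its index rather than scanning every record; this sketch deliberately ignores how and when that happens.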

HammadB commented on Show HN: Chroma Cloud – serverless search database for AI trychroma.com/cloud... · Posted by u/jeffchuber

4ndrewl · 7 months ago

Not sure why this is voted down. The site doesn't really explain what problems of mine it'll solve (if any). I just came away thinking "oh, it's a thing. might have been a useful thing, who knows"

"Sell the sizzle, not the steak" is a real thing for a reason.

HammadB · 7 months ago

That is helpful feedback, thank you. We'll address this.

HammadB commented on Show HN: Chroma Cloud – serverless search database for AI trychroma.com/cloud... · Posted by u/jeffchuber

tristanho · 7 months ago

Doesn't this seem like a bit of an... exact rip-off of turbopuffer?

Down to the style of the webpages and the details of the pricing?

Here's the pricing calculator, for example:

https://share.cleanshot.com/JddPvNj3 https://share.cleanshot.com/9zqx5ypp

As a happy turbopuffer user, not sure why I'd want to use Chroma.

HammadB · 7 months ago

Hi! Hammad here - Chroma’s CTO. First off, I have an immense amount of respect for the Turbopuffer team; they’ve built a solid product.

I understand your point. Chroma Cloud has been quietly live in production for a year, and we have been discussing this architecture publicly for almost two years now. You can see it in this talk I gave at the CMU Database Group: https://youtu.be/E4ot5d79jdA?si=i64ouoyFMevEgm3U. Some details have changed since then, but the core ideas remain the same.

The business model similarities mostly fall out of our architecture being similar, which in turn falls out of our workload constraints being the same. There are only so many ways to deliver a usage-based billing model that is fair, understandable, and predictable. We aimed for a billing model that was all three, and this is what we arrived at.

On aesthetics: that’s always been our aesthetic. I think a lot of developer tools are leaning into the nostalgia of the early PC boom during this AI boom (fun fact: all the icons on our homepage are done by hand!).

On differences: we support optimized regex search rather than full scans, which gives better performance. We also support trigram-based full-text search, which is useful for scenarios that need substring matches. We also support forking, which allows cheap copy-on-write clones of your data; this is great for dataset versioning and for tracking git repos at minimal cost. We've also been building support for generic sparse vectors (in beta), which enables techniques like SPLADE rather than just BM25. You can also run Chroma locally, enabling low-latency local workflows. This is great for AI apps where you need to iterate on a dataset until it passes evals and then push it up to the cloud.
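Trigram indexing is a standard way to answer substring queries without scanning every document. The toy index below is a hypothetical sketch, not Chroma's implementation, and the class and method names are invented for illustration. It shows the candidate-then-verify idea: any document containing the needle must contain every trigram of the needle, so intersecting the trigrams' posting lists yields a small superset of the matches, which is then checked directly. The same pattern underlies trigram-accelerated regex search, where literal fragments of a pattern become required trigrams; that extension is omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy in-memory trigram inverted index for case-sensitive substring search."""

    def __init__(self) -> None:
        self.docs: Dict[int, str] = {}
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, needle: str) -> List[int]:
        grams = trigrams(needle)
        if not grams:
            # Needles shorter than 3 characters have no trigrams: fall back to a full scan.
            candidates = set(self.docs)
        else:
            # Intersect posting lists, smallest first; .get avoids inserting empty keys.
            lists = sorted((self.postings.get(g, set()) for g in grams), key=len)
            candidates = set.intersection(*lists)
        # Verify: sharing all trigrams does not guarantee the trigrams are adjacent.
        return sorted(d for d in candidates if needle in self.docs[d])


if __name__ == "__main__":
    idx = TrigramIndex()
    idx.add(1, "copy-on-write forking")
    idx.add(2, "trigram full-text search")
    idx.add(3, "write-ahead log")
    print(idx.search("write"))  # [1, 3]
```

The verification step matters: a document such as `"abcXbcd"` contains both trigrams of `"abcd"` without containing `"abcd"` itself.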

Chroma is Apache 2.0 open source (https://github.com/chroma-core/chroma) and has a massive developer community behind it. Customers can run embedded, single-node, and distributed Chroma themselves. We've suffered from depending on closed-source database startups and wanted to give developers using Chroma confidence in the longevity of their choice.

Lastly, we are building with AI workloads front and center, and this changes what you build, how you build it, and who you build for in the long term. We think search is changing and that the primary consumer of the search API in AI applications is shifting from human engineers to language models. We are building some exciting things in this direction; more on that soon.

HammadB commented on Reasoning models don't always say what they think anthropic.com/research/re... · Posted by u/meetpateltech

HammadB · a year ago

There is an abundance of discussion on this thread about whether models are intelligent or not.

This binary is an utter waste of time.

Instead, focus on the gradient of intelligence: the set of cognitive skills any given system has and the degree to which it has them.

This engineering approach is more likely to lead to practical utility and progress.

The view of intelligence as binary is incredibly corrosive to this field.