netpaladinx (u/netpaladinx)

netpaladinx commented on EloqKV, a distributed database with Redis compatible API (GPLv2 and AGPLv3) github.com/eloqdata/eloqk... · Posted by u/cloudsql

mo_abdallah0 · 8 days ago

I tested it and achieved around 42K operations per second. It already serves as an in-memory key-value store and a strong alternative to DragonflyDB, thanks to its multi-threaded architecture (each thread handles a separate keyspace, without any lock-based mutual exclusion like KeyDB). It also goes beyond that and offers features like transactional key-value operations and persistence mechanisms.

netpaladinx · 8 days ago

I like the thread-per-core model. EloqKV, DragonflyDB, and ScyllaDB all follow this pattern.
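
For readers unfamiliar with the pattern, here is a minimal sketch of the shard-per-thread idea described above: each worker thread owns one slice of the keyspace, and requests are routed to the owning thread by hashing the key. This is an illustration only, not how EloqKV, DragonflyDB, or ScyllaDB are implemented. The class and method names (`ShardedStore`, `shard_of`) are hypothetical, and Python's `queue.Queue` (which locks internally) stands in for the message passing a real system would do between cores.

```python
import queue
import threading
import zlib


class ShardedStore:
    """Toy shard-per-thread store: each worker thread owns one dict exclusively."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        # One inbox per shard; only the owning worker reads from it.
        self.inboxes = [queue.Queue() for _ in range(num_shards)]
        for inbox in self.inboxes:
            threading.Thread(target=self._run, args=(inbox,), daemon=True).start()

    @staticmethod
    def _run(inbox):
        data = {}  # private to this thread, so the data itself needs no lock
        while True:
            op, key, value, reply = inbox.get()
            if op == "set":
                data[key] = value
                reply.put("OK")
            else:  # "get"
                reply.put(data.get(key))

    def shard_of(self, key):
        # Stable hash so a given key always lands on the same shard.
        return zlib.crc32(key.encode()) % self.num_shards

    def _call(self, op, key, value=None):
        reply = queue.Queue(maxsize=1)
        self.inboxes[self.shard_of(key)].put((op, key, value, reply))
        return reply.get()  # block until the owning thread answers

    def set(self, key, value):
        return self._call("set", key, value)

    def get(self, key):
        return self._call("get", key)


store = ShardedStore(num_shards=4)
store.set("user:1", "alice")
print(store.get("user:1"))
```

Because a key is only ever touched by its owning thread, single-key operations need no mutual exclusion on the data. Multi-key operations that span shards are the hard part, and this sketch does not attempt them.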

netpaladinx commented on EloqKV, a distributed database with Redis compatible API (GPLv2 and AGPLv3) github.com/eloqdata/eloqk... · Posted by u/cloudsql

fluxkernel · 9 days ago

Yes, we found it to be very useful for things that require durability and transactions. Previously we used JuiceFS Community Edition with Redis as the metadata backend. The main issues were 1) potential metadata loss and 2) the memory capacity limit. We tried EloqKV and it seems to work really well. Is anybody using EloqKV in production yet?

netpaladinx · 8 days ago

We’ve been using EloqKV to replace one of our largest Redis nodes (we didn’t want to run Redis Cluster, just a single big node). One pain point we had with Redis was the RDB fork causing latency jitter during persistence. EloqKV handles this much better: the fork-related stalls are gone, and so far it’s been a smooth drop-in replacement for our workload.

netpaladinx commented on Show HN: Chroma Cloud – serverless search database for AI trychroma.com/cloud... · Posted by u/jeffchuber

philip1209 · 9 days ago

Philip here from the Chroma engineering team.

Chroma supports multiple search methods - including vector, full-text, and regex search.

Four quick ways Chroma is different than pgvector: Better indexes, sharding, scaling, and object storage.

Chroma uses SPANN (Scalable Approximate Nearest Neighbor) and SPFresh (a freshness-aware ANN index). These are specialized algorithms not present in pgvector [1].

The core issue with scaling vector database indexes is that they don't handle `WHERE` clauses efficiently like SQL. In SQL you can ask "select * from posts where organization_id=7" and the b-tree gives good performance. But, with vector databases - as the index size grows, not only does it get slower - it gets less accurate. Combining filtering with large indexes results in poor performance and accuracy.

The solution is to have many small indexes, which Chroma calls "Collections". So, instead of having all user data in one table - you shard across collections, which improves performance and accuracy.
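
As a rough illustration of the "many small indexes" idea above, the sketch below keeps one small collection per tenant and searches only inside it, instead of filtering one large shared index. It is not Chroma's API: `PartitionedIndex`, `add`, and `query` are hypothetical names, and the exact brute-force scan stands in for whatever ANN index a real system would build per collection.

```python
import math
from collections import defaultdict


class PartitionedIndex:
    """Toy per-tenant vector store: one small collection per tenant id."""

    def __init__(self):
        self._collections = defaultdict(list)  # tenant_id -> [(doc_id, vector)]

    def add(self, tenant_id, doc_id, vector):
        self._collections[tenant_id].append((doc_id, vector))

    def query(self, tenant_id, vector, k=3):
        # Only this tenant's collection is scanned, so cost scales with the
        # tenant's data rather than with the total number of vectors stored.
        candidates = self._collections.get(tenant_id, [])
        ranked = sorted(candidates, key=lambda item: math.dist(item[1], vector))
        return [doc_id for doc_id, _ in ranked[:k]]


index = PartitionedIndex()
index.add(7, "post-a", [0.0, 0.0])
index.add(7, "post-b", [1.0, 1.0])
index.add(8, "post-c", [0.1, 0.1])  # other tenant, never scanned for tenant 7
print(index.query(7, [0.2, 0.2], k=1))
```

The `WHERE organization_id=7` filter becomes a routing decision made before the search, rather than a predicate applied to the results of a search over everyone's vectors.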

The third issue with using SQL for vectors is that the vectors quickly become a scaling constraint for the database. Writes become slow due to consistency, disk usage becomes mostly vector indexes, and CPU becomes clogged by constantly re-computing indexes. I've been there, and ultimately it hurts overall application performance for end-users. The solution for Chroma Cloud is a distributed system, which allows strong consistency, high-throughput writes, and low-latency reads.

Finally, Chroma is built on object storage - vectors are stored on AWS S3. This allows cold + warm storage tiers, so that you can have minimal storage costs for cold data. This "scale to zero" property is especially important for multi-tenant applications that need to retain data for inactive users.

[1] https://www.youtube.com/watch?v=1QdwYWd3S1g

netpaladinx · 8 days ago

By object storage here, do you mean the recently released S3 Vectors?

netpaladinx commented on Ursa: A leaderless, object storage–based alternative to Kafka streamnative.io/products/... · Posted by u/netpaladinx

netpaladinx · a month ago

Ursa published a blog post saying their leaderless, stateless, object storage–based Kafka replacement can reduce costs by up to 95%. Has anyone here tried Ursa in production? How much cost reduction have you actually seen compared to Kafka or MSK in real workloads?

netpaladinx commented on Ask HN: Why do so many developers dislike C when I find it inspiring? · Posted by u/silentpuck

netpaladinx · a month ago

All your points are valid. I don't think most people "dislike" C. People have options and most choose non-C. From my perspective, when the software or the system itself is already extremely complex, using C just adds more complexity on top. Many people including me choose not to add more.

netpaladinx commented on Tesla plans to launch Robotaxis in San Francisco this weekend reuters.com/business/auto... · Posted by u/mikhael

netpaladinx · a month ago

I remember other robotaxis were already operating a year ago.

netpaladinx commented on How Anthropic teams use Claude Code anthropic.com/news/how-an... · Posted by u/yurivish

netpaladinx · a month ago

Which database is behind Claude Code? Cursor is built on top of object storage. They failed with Yugabyte and Postgres.

netpaladinx commented on Show HN: EloqKV – Scalable distributed ACID key-value database with Redis API eloqdata.com/blog/2024/08... · Posted by u/hubertzhang

fizx · a year ago

Redis's transaction API is terrible, and doesn't work across shards. Any reasonable transactions are done in Lua, and because Lua mostly works well, there's not a lot of pressure to fix transactions.

However, if you're giving Redis access to different tenants, Lua is too dangerous.

I'd love to see a "real" transaction API for Redis.
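
Some background on the "doesn't work across shards" point above: Redis Cluster assigns each key to one of 16384 hash slots using CRC16 of the key, and multi-key operations such as `MULTI`/`EXEC` blocks or Lua scripts require all their keys to live in the same slot. Hash tags (the part of the key inside `{...}`) are the usual workaround. The sketch below reimplements the slot calculation in plain Python for illustration; the function names are my own, and a real client library would normally do this for you.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 variant used by Redis Cluster (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster hash slot (0..16383) for a key, honoring hash tags."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Keys sharing a hash tag land in the same slot, so one transaction can touch both.
print(key_slot("{user:1}:cart") == key_slot("{user:1}:orders"))
# Without a shared tag there is no such guarantee.
print(key_slot("user:1:cart"), key_slot("user:2:cart"))
```

This is why a transaction over arbitrary keys cannot be expressed in a sharded Redis deployment without designing the key names around it.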

netpaladinx · a year ago

Speaking of transactions in Redis, one area that comes to mind is file system metadata. I've seen colleagues/collaborators run large-scale training on a distributed file system with a billion files, which puts a lot of pressure on the metadata layer. They tried a few options, and Redis was one of them. It's fast, and Lua is good enough to support metadata ops. But it cannot scale (or Lua is gone), and it may lose data from time to time, which is annoying. It looks like this durable-transactional combination may be a good fit. I'll wait to see how this unfolds.

netpaladinx commented on Reactive Relational Algebra taylor.town/reactive-rela... · Posted by u/surprisetalk

shae · a year ago

Have you read up on differential data flow? Might be what you want?

netpaladinx · a year ago

Probably not. Dataflow is declarative about data transformations, and "differential" refers to being incremental. But what the link tries to model seems to relate to asynchronous transformations. They're not on the same level.
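
To make "incremental" concrete, here is a toy sketch of the idea: instead of recomputing a grouped count from the full input, the operator consumes a batch of changes (`+1` for an insert, `-1` for a delete) and updates only the affected groups. This only illustrates the incremental part under my own assumptions; it is not the Differential Dataflow API, and `IncrementalCount` is a hypothetical name.

```python
from collections import Counter


class IncrementalCount:
    """Maintains count-per-key from a stream of (key, diff) changes."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, deltas):
        """Apply a batch of changes and return the new count for each touched key."""
        changed = {}
        for key, diff in deltas:
            self.counts[key] += diff
            changed[key] = self.counts[key]
            if self.counts[key] == 0:
                del self.counts[key]  # drop empty groups from the maintained view
        return changed


view = IncrementalCount()
view.apply([("red", +1), ("red", +1), ("blue", +1)])
# A later batch touches only "red"; "blue" is never recomputed.
print(view.apply([("red", -1)]))
```

The work per batch depends on the size of the change, not the size of the input, which is a different concern from whether the transformations run asynchronously.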