alexwebr (u/alexwebr)

alexwebr commented on Kafka at the low end: how bad can it get? broot.ca/kafka-at-the-low... · Posted by u/alexwebr

kod · 7 months ago

If it's a fundamental probability thing with randomized partition selection, put the actual probability of what you're describing in the article.

.25^20 is not a "somewhat unlucky sequence of events"

alexwebr · 7 months ago

(Author here)

Fair enough. I agree .25^20 is basically infinitesimal, and even with a smaller exponent (like .25^3) the odds are not great, so I appreciate you calling this out.

Flipping this around, though, if you have 4 workers total and 3 are busy with jobs (1 idle), your next job has only a 25% chance of hitting the idle worker. This is what I see the most in practice; there is a backlog, and not all workers are busy even though there is a backlog.

alexwebr commented on Kafka at the low end: how bad can it get? broot.ca/kafka-at-the-low... · Posted by u/alexwebr

techcode · 7 months ago

What that post describes (all work going to one/few workers) in practice doesn't really happen if you properly randomize (e.g. just use random UUID) ID of the item/task when inserting it into Kafka.

With that (and sharding based on that ID/value) - all your consumers/workers will get equal amount of messages/tasks.

Both post and seemingly general theme of comments here is trashing choice of Kafka for low volume.

Interestingly both are ignoring other valid reasons/requirements making Kafka perfectly good choice despite low volume - e.g.:

- multiple different consumers/workers consuming same messages at their own pace

- needing to rewind/replay messages

- guarantee that all messages related to specific user (think bank transactions in book example of CQRS) will be handled by one pod/consumer, and in consistent order

- needing to chain async processing

And I'm probably forgetting bunch of other use cases.

And yes, even with good sharding - if you have some tasks/work being small/quick while others being big/long can still lead to non-optimal situations where small/quick is waiting for bigger one to be done.

However - if you have other valid reasons to use Kafka, and it's just this mix of small and big tasks that's making you hesitant... IMHO it's still worth trying Kafka.

Between using bigger buckets (so instead of 1 fetch more items/messages and handle work async/threads/etc), and Kafka automatically redistributing shards/partitions if some workers are slow ... You might be surprised it just works.

And sure - you might need to create more than one topic (e.g. light, medium, heavy) so your light work doesn't need to wait for heavier one.

Finally - I still didn't see anyone mention actual real deal breakers for Kafka.

From the top of my head I recall a big one is no guarantee of item/message being processed only once - even without you manually rewinding/reprocessing it.

It's possible/common to have situations where worker picks up a message from Kafka, processes (wrote/materialized/updated) it and when it's about to commit the kafka offset (effectively mark it as really done) it realizes Kafka already re-partitioned shards and now another pod owns particular partition.

So if you can't model items/messages or the rest of system in a way that can handle such things ... Say with versioning you might be able to just ignore/skip work if you know underlying materialized data/storage already incorporates it, or maybe whole thing is fine with INSERT ON DUPLICATE KEY UPDATE) - then Kafka is probably not the right solution.

alexwebr · 7 months ago

(Author here)

You say: > What that post describes (all work going to one/few workers) in practice doesn't really happen if you properly randomize (e.g. just use random UUID) ID of the item/task when inserting it into Kafka.

I would love to be wrong about this, but I don't _think_ this changes things. When you have few enough messages, you can still get unlucky and randomly choose the "wrong" partitions. To me, it's a fundamental probability thing - if you roll the dice enough times, it all evens out (high enough message volume), but this article is about what happens when you _don't_ roll the dice enough times.

alexwebr commented on Kafka at the low end: how bad can it get? broot.ca/kafka-at-the-low... · Posted by u/alexwebr

jszymborski · 7 months ago

What do people recommend?

Especially for low levels of load, that doesn't require that the dispatcher and consumer are written in the same language.

alexwebr · 7 months ago

(Author here)

RabbitMQ or AWS SQS are probably good choices.