My general opinion, off the cuff, from having worked at both small (hundreds of events per hour) and large (trillions of events per hour) scales for these sorts of problems:
1. Do you really need a queue? (Alternative: periodic polling of a DB)
2. What's your event volume and can it fit on one node for the foreseeable future, or even serverless compute (if not too expensive)? (Alternative: lightweight single-process web service, or several instances, on one node.)
3. If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST APIs, maybe with async and retry semantics)
4. If you really do need a distributed queue, then you may as well use a distributed queue, such as Kafka. Even if you take on the complexity of managing a Kafka cluster, the programming and performance semantics are simpler to reason about than trying to shoehorn a distributed queue onto a SQL DB.
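To make alternative (1) concrete, here is a minimal sketch of DB-as-queue polling in Postgres. Table and column names are made up for illustration:

    -- A minimal job table; pollers claim work with SKIP LOCKED so
    -- concurrent workers neither block on nor double-claim a row.
    CREATE TABLE jobs (
        id         bigserial PRIMARY KEY,
        payload    jsonb NOT NULL,
        claimed_at timestamptz
    );

    UPDATE jobs
    SET claimed_at = now()
    WHERE id = (
        SELECT id FROM jobs
        WHERE claimed_at IS NULL
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id, payload;

Each worker runs the UPDATE on a timer; finished jobs are then deleted or flagged in a follow-up statement.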
I suspect the common issue with small-scale projects is that it's not atypical for the engineers involved to perform a joint optimization of "what will work well for this project" and "what will work well at my next project/job." Particularly in startups, where turnover is high and employer stability is poor, this is the optimal action for the engineers involved.
Unless employees expect that their best rewards come from making their current project as simple and effective as possible, it is highly unlikely that the current project will be as simple as it could be.
What I've found to be even more common than resume driven development has been people believing that they either have or will have "huge scale". But the problem is that their goal posts are off by a few orders of magnitude and they will never, ever have the sort of scale required for these types of tools.
At my current job working for a big corporation, a big reason why we use Kafka for non-Kafka workloads is that getting alternate stacks approved is annoyingly difficult. Someone already went through the pain of getting Kafka on the company network for their big data use case, and enterprise IT will set it up for us. Using something else for queueing would require way more paperwork.
This is something to catch in hiring and performance evaluation. Hire people who don't build things to pad their own CVs; tell them to stop if that failed; fire them if that failed too.
A semantic but important point: Kafka is not a queue, it's a distributed append-only log. I deal with so many people who think it's a super-scalable replacement for an MQ, and it's such the wrong way to think about it.
Yes, but the practical reality is that it can be used exactly the same way as you would use a queue, and you can make it work just as well as any MQ-based system. I know this because I moved a system from RabbitMQ to Kafka for additional scalability requirements and it worked perfectly.
So sure, "technically" it's not a queue, but in reality it's used as a queue by 1000s of companies around the world for huge production workloads which no MQ system can support.
To be fair, any (immutable) data structure that includes the creation timestamp can be a queue. It might not be a good queue, but it can be used as one.
Yup, that's a fair and important distinction: Kafka comes with very little of the ergonomics that make MQ good at what it does. In exchange you get blistering scale, for when blistering scale is the main problem.
> Do you really need a queue? (Alternative: periodic polling of a DB)
In my experience it's not the reads but the writes that are hard to scale up. Reading is cheap and can sometimes be done off a replica. Writing to PostgreSQL at a high sustained rate requires careful tuning and design. A stream of UPDATEs can be very painful, INSERTs aren't cheap, and even batched COPY blocks can be tricky.
Plus of course you can take out the primary even with a read from a replica. It's not a trivial feat, but you can achieve it with the combination of streaming replication and an hours-long read from the replica for massive analytical workloads. For large reads Postgres will create temporary tables as needed, and when the replica falls far enough behind, the cascading effect of replication backpressure will cause the primary to block further writes from getting through...
The scars from that kind of outage will never truly heal.
Postgres' need (modulo running it on ZFS) for full-page writes [0], coupled with devs' apparent need to use UUIDv4 everywhere - along with over-indexing - is a recipe to drag writes down to the floor, yes.
0: https://www.enterprisedb.com/blog/impact-full-page-writes
5. Organize your code so it can work with either a PostgreSQL-based queue or a Kafka-based queue. There should be only one code file that actually knows which of the two you are using.
Then, if you ever need to switch to something more performant, it will be relatively easy.
It's a queue... how bad can you screw this up? My guess is, in most corporate environments, very very badly. Somehow something as complicated as consuming a queue (which isn't very complicated at all) will be done in such a way that it will require many months to change which queue is used in the future.
I want to rewrite some of my setup, we're doing IoT, and I was planning on:
MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)
(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/etc)
this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3
Questions:
1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?
2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?
3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?
4. I guess most Postgres queues are also "simple" and don't provide "fanout" for multiple things (eg I want to take one of my IoT messages, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc)
What would be the recommendation?
Re 1. Look up non-blocking migrations for postgres. You can generally do large schema migrations while only briefly taking exclusive locks. It's a common mistake to perform a blocking migration and lock up your database (e.g. using CREATE INDEX on an existing table instead of CREATE INDEX CONCURRENTLY).
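For illustration, the difference on a hypothetical orders table:

    -- Takes a lock that blocks writes for the whole build:
    CREATE INDEX orders_created_idx ON orders (created_at);

    -- Builds without blocking writes (can't run inside a transaction
    -- block, and leaves an INVALID index to drop if it fails midway):
    CREATE INDEX CONCURRENTLY orders_created_idx ON orders (created_at);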
There are globally shared resources, but for the most part, locks are held on specific rows or tables. Unrelated transactions generally won't block on each other.
Also running a Very High Availability cluster is non-trivial. It can take a minute to fail over to a replica, and a busy database can take a while to replay the WAL after a reboot before it's functional again. Most people are OK with a couple minutes of downtime for the occasional reboot though.
I think this really depends on your scale. Are you doing <100 messages/second? Definitely stick with postgres. Are you doing >100k messages/second? Think about Kafka/redpanda. If you were comfortable with postgres (or you will be since you are building the rest of your project with it), then you want to stick with postgres longer, but if you are barely using it and would struggle to diagnose an issue, then you won't benefit from consolidating.
Postgres will also be more flexible. Kafka can only do partitions and consumer groups, so if your workload doesn't look like that (e.g. out of order processing), you might be fighting Kafka.
MQTT -> Postgres (+ S3 for archive)
> 1. my "fear" would be that if I use the same Postgres for the queue and for my business database...
This is a feature, not a bug. In this way you can pair the handling of the message with the business data changes which result in the same transaction. This isn't quite "exactly-once" handling, but it's really really close!
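A sketch of that pairing, with hypothetical table names; the message claim and the business write commit or roll back together:

    BEGIN;

    -- Claim the next unhandled message...
    UPDATE queue
    SET consumed_at = now()
    WHERE id = (
        SELECT id FROM queue
        WHERE consumed_at IS NULL
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id, payload;

    -- ...apply its business effect in the same transaction...
    UPDATE accounts SET balance = balance + 100 WHERE id = 42;

    -- ...and make both visible atomically.
    COMMIT;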
> 2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming
Generally it's best practice in this case to never delete messages from a SQL "queue", but toggle them in-place to consumed and periodically archive to a long-term storage table. This provides in-context historical data which can be super helpful when you need to write a script to undo or mitigate bad code which resulted in data corruption.
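A sketch of that mark-then-archive lifecycle, again with hypothetical names:

    -- Consume by flipping a flag instead of deleting ($1 = message id):
    UPDATE queue SET consumed_at = now() WHERE id = $1;

    -- Periodically sweep old consumed messages into long-term storage:
    WITH moved AS (
        DELETE FROM queue
        WHERE consumed_at < now() - interval '30 days'
        RETURNING *
    )
    INSERT INTO queue_archive SELECT * FROM moved;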
Alternatively when you need to roll back to a previous state, often this gives you a "poor woman's undo", by restoring a time-stamped backup, copying over messages which arrived since the restoration point, then letting the engine run forwards processing those messages. (This is a simplification of course, not always directly possible, but data recovery is often a matter of mitigations and least-bad choices.)
Basically, saving all your messages provides both efficiency and data recovery optionality.
> 3...
Legit concern, particularly if you're trying to design your service abstraction to match an eventual evolution of the data platform.
> 4. don't provide "fanout" for multiple things
What they do provide is running multiple handlers off one queue, wherein you might have n handlers (each with its own "handled_at" timestamp column in the DB), and different handlers run at different priorities. This doesn't allow for workflows (ie a cleanup step) but does allow different processes to run on the same queue with different privileges or priorities. So the slow process (archive?) could run opportunistically or in batches, where time-sensitive issues (alerts, outlier detection, etc) can always run instantly. Or archiving can be done by a process which lacks access to any user data to algorithmically enforce PCI boundaries. Etc.
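As a sketch, assuming a shared queue table, the per-handler columns might look like:

    -- One timestamp column per handler; each process claims only its own work.
    ALTER TABLE queue ADD COLUMN alerted_at  timestamptz;
    ALTER TABLE queue ADD COLUMN archived_at timestamptz;

    -- Time-sensitive handler: polls often, small batches.
    SELECT id, payload FROM queue
    WHERE alerted_at IS NULL ORDER BY id LIMIT 100;

    -- Archiver: runs opportunistically in big batches, and needs no
    -- access to the user-data tables at all.
    SELECT id, payload FROM queue
    WHERE archived_at IS NULL ORDER BY id LIMIT 10000;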
> I want to rewrite some of my setup, we're doing IoT, and I was planning on
Is this some scripting to automate your home, or are you trying to build some multi-tenant thing that you can sell?
If it's just scripting to automate your home, then you could probably get away with a single server and on-disk/in-memory queuing, maybe even sqlite, etc. Or you could use it as an opportunity to learn those technologies, but you don't really need them in your pipeline.
It's amazing how much performance you can get as long as the problem can fit onto a single node's RAM/SSD.
Re (2) there is a lot of vacuuming, but the table is small, and it's usually very fast and productive.
You can run into issues with scheduled queues (e.g. run this job in 5 minutes) since the tables will be bigger, you need an index, and you will create the garbage in the index at the point you are querying (jobs to run now). This is a spectacularly bad pattern for postgres at high volume.
> Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?
Doesn't PostgreSQL have transactional schema updates as a key feature? AIUI, you shouldn't be having any data loss as a result of such changes. It's also common to use views in order to simplify the management of such updates.
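For example, with a hypothetical events table:

    BEGIN;
    ALTER TABLE events ADD COLUMN source text;
    -- Any failure before COMMIT undoes the schema change atomically;
    -- concurrent writers simply wait on the lock, then proceed.
    COMMIT;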
I'm a Java dev and maybe my projects are about big integrations, but I've always needed queue-like constructs, and polling from a DB was almost always a headache, especially with multiple consumers and publishers.
Sure it can be done, and in many projects we do have cron-jobs on different pods -- not a global k8s cron-job, but legacy cron jobs and it works fine.
Kafka does not YET support a real queue (but I'm sure there's a high-profile KIP to add true queue-like behavior, per consumer group, with individual commits), and does not support server-side filtering.
But consumer groups and partitions have been such a blessing for me, it's very hard to overstate how useful they are with managing stateful apps.
But then a distributed queue is most likely not needed until you hit really humongous scale.
Maybe in the past this was true, or if you’re using an inferior DB. I know first hand that a Postgres table can work great as a queue for many millions of events per day processed by thousands of workers polling for work from it concurrently. With more than a few hundred concurrent pollers you might want a service, or at least a centralized connection pool in front of it though.
I agree with nearly everything except your point (1).
Periodic polling is awkward on both sides: you add arbitrary latency _and_ increase database load proportional to the number of interested clients.
Events, and ideally coalesced events, serve the same purpose as interrupts in a uniprocess (versus distributed) system, even if you don't want a proper queue. This at least lets you know _when_ to poll and lets you set and adjust policy on when / how much your software should give a shit at any given time.
From a database load perspective, Postgres can get you pretty far. The reads triggered by each poll should be trivial index-only scans served right out of RAM. Even a modest Postgres instance should be able to handle thousands per second.
The limiting factor for most workloads will probably be the number of connections, and the read/write mix. When you get into hundreds or thousands of pollers and writing many things to the queue per second Postgres is going to lose its luster for sure.
But in my experience with small/medium companies, a lot of workloads fit very very comfortably into what Postgres can handle easily.
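As an illustration, a partial index on the hypothetical jobs table sketched earlier in the thread keeps that poll read tiny:

    -- Covers only unclaimed work, so it stays small no matter how
    -- large the jobs table grows (given regular vacuuming).
    CREATE INDEX jobs_unclaimed_idx ON jobs (id)
    WHERE claimed_at IS NULL;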
I don't disagree, and I am trying to argue for it myself, and have used Postgres as a "queue" or the backlog of events to be sent (like the outbox pattern). But what if I have 4 services that need to know X happened to customer Y? I feel like it quickly becomes cumbersome with Postgres event delivery to make sure everyone gets the events they need. The posted link tries to address this at least.
The standard approach, which Kafka also uses beneath all the libraries hiding it from you, is:
The publisher has a set of tables (topics and partitions) of events, ordered and with each event having an assigned event sequence number.
Publisher stores no state for consumers in any way.
Instead, each consumer keeps a cursor (a variable holding an event sequence number) indicating how far it has read for each event log table it is reading.
Consumer can then advance (or rewind) its own cursor in whatever way it wishes. The publisher is oblivious to any consumer side state.
This is the fundamental piece of how event log publishing works (as opposed to queues, which are something else entirely; the article talks about both use cases).
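A minimal sketch of that shape in SQL (hypothetical names, and ignoring the sequence-number commit-order race discussed further down the thread):

    -- Publisher side: an append-only log, offsets assigned on insert.
    CREATE TABLE events (
        seq     bigserial PRIMARY KEY,
        payload jsonb NOT NULL
    );

    -- Consumer side: each consumer owns exactly one cursor row.
    CREATE TABLE cursors (
        consumer text PRIMARY KEY,
        last_seq bigint NOT NULL DEFAULT 0
    );

    -- Read forward from the cursor...
    SELECT seq, payload FROM events
    WHERE seq > (SELECT last_seq FROM cursors WHERE consumer = 'billing')
    ORDER BY seq LIMIT 100;

    -- ...then advance it after handling ($1 = last seq processed).
    UPDATE cursors SET last_seq = $1 WHERE consumer = 'billing';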
Call me dumb - I'll take it! But if we really are trying to keep it simple simple...
But event table:
| event_id | event_msg_src | event_msg | event_msg_published | event_receiver_svc1 | event_receiver_svc2 | event_receiver_svc3 |
|----------|---------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| evt01 | svc1 | json_message_format | datetime | TRUE | TRUE | FALSE |
Then you just query from the event_receiver_svcX side, for events published > datetime and event_receiver_svcX = FALSE. Once read, set to TRUE.
To mitigate too many active connections, have a polling/backoff strategy and place a proxy in front of the actual database to proactively throttle where needed.
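Sketched against the table above (assuming it's named event_table):

    -- svc3 fetches what it hasn't processed yet ($1 = its last poll time)...
    SELECT event_id, event_msg
    FROM event_table
    WHERE event_msg_published > $1
      AND event_receiver_svc3 = FALSE;

    -- ...and marks each event handled once done ($1 = event id).
    UPDATE event_table
    SET event_receiver_svc3 = TRUE
    WHERE event_id = $1;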
> If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST APIs, maybe with async and retry semantics)
That sounds distributed to me, even if it wires different tech together to make it happen. Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?
> Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?
To be clear I wasn't talking about DB nodes, I was talking about skipping an explicit queue altogether.
But let's say you were asking about load balancing REST requests to different backend servers:
Yes, in the sense that "load balanced REST microservice with retry logic" is such a common pattern that it is better understood by SWEs and SREs everywhere.
No, in the sense that if you really did just need a distributed queue then your life would be simpler reusing a battle-tested implementation instead of reinventing that wheel.
An internal data lake at a FAANG company. I don't really want to go into more detail for my own privacy, but it was definitely the kind of problem you only have when you have a FAANG-sized footprint.
100%; often batch integration - pulling data from REST endpoint/endpoints in the background, with reasonable frequency like every 5 -15 minutes - is good enough ;)
re 4) If you're there, at the risk of drawing the ire of the "cloud is always too expensive" club, be sure you really really really want to run something like Kafka yourself, and not use a hyperscaler's platform queue/queue-ish system, aka SQS or pubsub or whatever Azure/your platform has.
Kafka has its own foibles and isn't a trivial set-it-and-forget-it thing to run at scale.
>The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)
The Pareto principle is not some guarantee applicable to everything and anything saying that any X will handle 80% of some other thing's use cases with 20% the effort.
One can see how irrelevant its invocation is if we reverse: does Kafka also handle 80% of what Postgres does with 20% the effort? If not, what makes Postgres especially the "Pareto 80%" one in this comparison? Did Vilfredo Pareto have Postgres specifically in mind when forming the principle?
The Pareto principle concerns situations where power-law distributions emerge. Not arbitrary server software comparisons.
Just say Postgres covers a lot of use cases people mindlessly go to shiny new software for that they don't really need, and is more battle-tested, mature, and widely supported.
The Pareto principle is a red herring.
>The Pareto principle is not some guarantee applicable to everything and anything
Yes, obviously. The author doesn't say otherwise. There are obviously many ways of distributing things.
>One can see how irrelevant its invocation is if we reverse: does Kafka also handle 80% of what Postgres does with 20% the effort?
No
>If not, what makes Postgres especially the "Pareto 80%" one in this comparison?
Because it's simpler.
What implies everything can handle 80% of use cases with 20% of effort? It's like saying:
If normal distributions are real, and human height is normally distributed, then why isn't personal wealth? They are just different distributions.
Let me explain then...
> Yes, obviously. The author doesn't say otherwise.
Someone doesn't need to spell something out explicitly to imply it, or to fall to the kind of mistake I described.
While the author might not say otherwise, they do invoke the Pareto principle out of context, as if it's some readily applicable theorem.
>>If not, what makes Postgres especially the "Pareto 80%" one in this comparison?
> Because it's simpler.
Something being simpler than another thing doesn't make it a Pareto "80%" thing.
Yeah, I know you don't say this is always the case explicitly. But, like with the OP, your answer uses this as if it's an argument in favor of something being the "Pareto 80%" thing.
It just makes it simpler. The initial Pareto formulation was about wealth accumulation even, which has nothing to do with simplicity or features, or even with comparing different classes of things (both sides referred to the same thing, people: specifically, 20% of the population owning 80% of the land).
Is the mapping of use cases to software functionality not a power-law distribution? Meaning there are a few use cases that have a disproportionate effect on the desired outcome if provided by the software?
It probably applies better to users of software, e.g. 80% of users use just 20% of the features in Postgres (or MS Word). This probably only works, roughly, when the number of features is very large and the number of users is very large, and it's still very very rough, kinda obviously. (It could well be 80% / 5% in these cases!)
For very simple software, most users use all the features. For very specialized software, there's very few users, and they use all the features.
> The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)
These are different units entirely! Development effort? How is this the Pareto Principle at all?
(To the GP's point, would "ls" cover 80% of the use cases of "cut" with 20% of the effort? Or would MS Word cover 80% of the use cases of postgresql with 20% of the effort? Because the scientific Pareto Principle tells us so?)
Hey, it's really not important, just an idea that with Postgres you can cover a lot of use cases with a lot less effort than configuring/maintaining a Kafka cluster on the side, and that's plausible. It's just that some "nerds" who care about being "technically correct" object to using the term "pareto principle" to sound scientific here, that bit is just nonsense.
I believe the Pareto principle is used more of a figure of speech nowadays, it doesn't mean the 80 to 20 ratio literally. When I hear someone invoking the principle I tend to think of it as: this should solve the problem enough for our case and with less (usually much less) work than a complete solution.
> if we reverse: does Kafka also handle 80% of what Postgres does with 20% the effort?
First, it would be inverse, not reverse.
Second, no it doesn't work that way, that's the point of the Pareto principle in the first place, what is 80% is always 80% and what is 20% is always 20%.
>Second, no it doesn't work that way, that's the point of the Pareto principle in the first place, what is 80% is always 80% and what is 20% is always 20%.
I know, since that's the whole point I was making. That the OP picked an arbitrary side to give the 80%, and that one could just as well pick the other one, and that you need actual arguments (and some kind of actual measurable distribution) to support one or the other being the 80% (that is, merely invoking the Pareto principle is not an argument).
You have to be careful with the approach of using Postgres for everything. The way it locks tables and rows and the serialization levels it guarantees are not immediately obvious to a lot of folks and can become a serious bottleneck for performance-sensitive workloads.
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.
Yes, performance can be a big issue with Postgres. And vertical scaling can really put a damper on things when you have a major traffic hit. Using it in place of Kafka misunderstands one of the great uses of Kafka, which is to help deal with traffic bursts. All of a sudden your Postgres server is overwhelmed where the Kafka server would be fine.
It's worth noting that Oracle has solved this problem. It has horizontal multi-master scalability (not sharded) and a queue subsystem called TxEQ which scales like Kafka does, but it's also got the features of a normal MQ broker. You can dequeue a message into a transaction, update tables in that same transaction, then commit to remove the message from the queue permanently. You can dequeue by predicate, delay messages, use producer/consumer patterns etc. It's quite flexible. The queues can be accessed via SQL stored procs, or client driver APIs, or it implements a Kafka compatible API now too I think.
If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.
Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.
Wouldn't OrioleDB solve that issue though?
Agree but we really have to put a number on baseline traffic and max traffic burst in order to be productive in the discussion. I would argue that the majority of use cases never need to be designed for a max-traffic-number that PG can't handle
Postgres isn't meant to be a guaranteed permanent replacement.
It's a common starting point for a simpler stack which can retain a greater deal of flexibility out of the box and increased velocity.
Starting with Postgres lets the bottlenecks reveal themselves, and then you optimize from there.
Maybe a tweak to Postgres or resources, or consider a jump to Kafka.
It often doesn't.
Definitely, this is also one of the directions Rails is heading[1]: provide a basic setup most people can use out of the box. And if needed you can always plug in more "mature" solutions afterwards.
[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
Also, LISTEN/NOTIFY do not scale, and they introduce locks in areas you aren't expecting - https://news.ycombinator.com/item?id=44490510
I wish Postgres would add a durable queue-like data structure. But trying to make a durable queue that can scale beyond what a simple Redis instance can do starts to run into problems quickly.
This is true of any data storage. You have to understand the concurrency model and assumptions, and know where bottlenecks can happen. Even among relational databases there are significant differences.
Exactly. Or worse, you treat one as a straightforward black-box swap-in replacement for another. If you're looking to scale, you _will_ need to code to the idiosyncrasies of your chosen solution.
But anytime you treat a database, or a queue, like a black box dumpster, problems will ensue.
Of course the other 99% is the remaining 1%.
Postgres doesn't scale into oblivion, but it can take some serious chunks of data once you start batching and making sure every operation only touches a single row with no transactions needed.
When people say "just use postgres" it's because their immediate need is so low that this doesn't matter.
And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)
You would typically want to use the same database instance for your queue as long as you can get away with it because then transaction handling is trivial. As soon as you move the queue somewhere else you need to carefully think about how you'll deal with transactionality.
Yes, I often use PG for queues on the same instance. Most of the time you don't see any negative effects. For a new project with barely any users it doesn't matter.
True, but you have to have a really intensive workload to hit its limits; something in the order of tens of thousands of writes per second; and even then, you can shard to a few instances. So yes, there is a limit - but in practice, not for most systems.
When you're doing hundreds or thousands of transactions to begin with, it doesn't really impact as much out of the gate.
Of course there will be someone who will pull out something that won't work, but such examples can likely be found for anything.
We don't need to fear simplification; it is easy to complicate later when the actual complexities reveal themselves.
How do you implement "unique monotonically-increasing offset number"?
Naive approach with sequence (or serial type which uses sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits, now table contains "122", "124" rows and readers can start to process it. Then transaction "one" commits with its "123" number, but readers already past "124". And transaction "one" might never commit for various reasons (e.g. client just got power cut), so just waiting for "123" forever does not cut it.
Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
https://www.oreilly.com/library/view/designing-data-intensiv...
You can generate distributed monotonic number sequences with a Lamport Clock.
https://en.wikipedia.org/wiki/Lamport_timestamp
The wikipedia entry doesn't describe it as well as that book does.
It's not the end of the puzzle for distributed systems, but it gets you a long way there.
See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock
Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":
https://ia904606.us.archive.org/32/items/distributed-systems...
The "unique monotonically-increasing offset number" use case works just fine. I need a unique sequence number in ascending order doesn't (your problem). Why you need two queue to share the same sequence object is your problem I think.
Another way to speed it up is to grab unique numbers in batches instead of just getting them one at a time. No idea why you want your numbers to be in absolute sequence. That's hard in a distributed system. Probably best to relax that constraint and find some other way to track individual pieces of data. Or even better, find a way so you don't have to track individual rows in a distributed system.
The article describes using a dedicated table for the counter, one row per table, in the same transaction (so parallel writers to the same table wait for each other through a lock on that row).
If you would rather have readers waiting and parallel writers, there is a more complex scheme here: https://blog.sequinstream.com/postgres-sequences-can-commit-...
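A sketch of that counter-row pattern (hypothetical names; it deliberately replaces a Postgres sequence, trading writer concurrency for gap-free, commit-ordered offsets):

    CREATE TABLE event_counters (
        table_name text PRIMARY KEY,
        last_seq   bigint NOT NULL
    );

    BEGIN;
    -- The row lock serializes writers to this log until COMMIT, so
    -- readers never see offset N+1 before offset N exists.
    UPDATE event_counters
    SET last_seq = last_seq + 1
    WHERE table_name = 'events'
    RETURNING last_seq;

    -- Write the event with the offset just returned ($1 here).
    INSERT INTO events (seq, payload) VALUES ($1, '{"k": "v"}');
    COMMIT;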
The article speaks of two use cases, work queue and pub/sub event log. You talk about the former, and the comment you reply to talks about the latter. You need event sequence numbering for the pub/sub event log.
In a sense this is what Kafka IS architecturally: The component that assigns event sequence numbers.
You can fill in a noop for sequence number 123 after a timeout. You also need to be able to kill old transactions so that the transaction which was assigned 123 isn't just chilling out (which would block writing the noop).
Another approach which I used in the past was to assign sequence numbers after committing. Basically a separate process periodically scans the set of un-sequenced rows, applies any application-defined ordering constraints, and writes SNs to them. This can be surprisingly fast, like tens of thousands of rows per second. In my case, the ordering constraints were simple, basically that for a given key, increasing versions get increasing SNs. But I think you could have more complex constraints, although it might get tricky with batch boundaries.
Isn't it a bit of a white whale thing that a union can solve all one's subscriber problems? AFAIK even with Kafka this isn't completely watertight.
My approach is: select max(id), and commit with id=max(id)+1. If commit worked, then all good. If commit failed because of unique index violation, repeat the transaction from the beginning. I think it should work correctly with proper transaction isolation level.
I have this problem in the system I work on - the short nuance-less answer from my experience is that, once your scale gets large enough, you can't prevent ordering issues entirely and you have to build the resilience into the architecture and the framing of the problem. You often end up paying for consistency with latency.
I think you may be talking past each other. In the approach taken in the article and the parent comment, if the event sequence number allocation of the writer races the reader cursor position in the wrong way, events will NEVER BE DELIVERED.
So it is a much more serious issue at stake here than event ordering/consistency.
As it happens, if you use event log tables in SQL "the Kafka way" you actually get a guarantee on event ordering too as a side effect, but that is not the primary goal.
More detailed description of the problem:
https://github.com/vippsas/mssql-changefeed/blob/main/MOTIVA...
Funnily enough, I was just designing a queue exactly this way, thanks for catching this. (ChatGPT, meanwhile, was assuring me the approach was airtight.)
A downside is the lack of tooling client side. For many, using Kafka is worth it simply for the tooling in libraries on the consumer side.
If you just want to write an event handler function, there is a lot of boilerplate to manage around it. (Persisting read cursors, etc.)
Nowhere close to Kafka's maturity in client-side tooling, but it is an approach on top of which a library stack could be built to make this convenient, with the same library toolset supporting many storage engines. (On the server/storage side, Postgres is of course as mature as Kafka...)
With the advent of tools like LLMs in editors, it is now viable to create clients and close these gaps quite easily. It feels like the next low-hanging fruit in many places that aren't client-friendly enough.
I'll soon get to make technology choices for a project (context: we need an MQTT broker) and Kafka is one of the options, but I have zero experience with it. Aside from the obvious red flag that is using something for the first time in a real project, what is it that you dislike about Kafka?
Has this person actually benchmarked kafka? The results they get with their 96 vcpu setup could be achieved with kafka on the 4 vcpu setup. Their results with PG are absurdly slow.
If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.
Is it about what Kafka could get you, or what you need right now?
Kafka is a full-on streaming solution.
Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.
Your name sounds familiar. I think you may be one of the people at RedPanda with whom I’ve corresponded. It’s been a few years though, so maybe not.
A colleague and I (mostly him, but on my advice) worked up a set of patches to accept and emit JSON and YAML in the CLI tool. Our use case at the time was setting things up with a config management system using the already built tool RedPanda provides without dealing with unstructured text.
We got a lot of good use out of RedPanda at that org. We’ve both moved on to a new employer, though, and the “no offering RedPanda as a service” spooked the company away from trying it without paying for the commercial package. Y’all assured a couple of us that our use case didn’t count as that, but upper management and legal opted to go with Kafka just in case.
To the issue of complexity, is Redpanda suitable as a "single node implementation" where a Kafka cluster is not needed due to data volume, but the Kafka message bus pattern is desired?
I may be reading a bit extra, but my main take on this is: "in your app, you probably already have PostgreSQL. You don't need to set up an extra piece of infrastructure to cover your extra use case, just reuse the tool you already have"
It's very common to start adding more and more infra for use cases that, while they could technically be better covered with new stuff, can be served by already existing infrastructure, at least until you have proof that you need to grow it.
But in this case, it is like saying "You don't need a fuel truck. You can transport 9,000 gallons of gasoline between cities by gathering 9,000 1-gallon milk jugs and filling each, then getting 4,500 volunteers to each carry 2 gallons and walk the entire distance on foot."
In this case, you do just need a single fuel truck. That's what it was built for. Avoiding using a design-for-purpose tool to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.
>> If you don't need what kafka offers, don't use it.
> This is literally the point the author is making.
Exactly! I just don't understand why HN invariably always tends to bubble up the most dismissive comments to the top that don't even engage with the actual subject matter of the article!
Depends on how you configure the clients, ask me how I know that using a K8s pod id in a consumer group id is a really bad idea - or how setting batch size to 1 and linger to 0 is a really bad idea - the former blows up disk (all those unique consumer groups cause the backing topic to consume a lot of space, as the topic is by default only compacted) and the latter thrashes request handler CPU time.
This doesn't even make sense. How do you know what the network links or the other bottlenecks are like? There are a grandiose number of assumptions being made here.
The 96 vcpu setup with 24xlarge instance costs about $20k/month on AWS before discounts. And one thing you don’t want in a pub sub system is a single instance taking all the read/writes. You can run a sizeable Kafka cluster for that kind of money in AWS.
I remember doing 900k writes/s (non-replicated) already back on kafka 0.8 with a random physical server with an old fusionio drive (says something about how long ago this was :D).
It's a fair point that if you already have a pgsql setup, and only need a few messages here and there, then pg is fine. But yeah, the 96 vcpu setup is absurd.
Is anyone actually reading the full article, or just reacting to the first unimpressive numbers you can find and then jumping on the first dismissive comment you can find here?
Benchmarking Kafka isn't the point here. The author isn't claiming that Postgres outperforms Kafka. The argument is that Postgres can handle modest messaging workloads well enough for teams that don't want the operational complexity of running Kafka.
Yes, the throughput is astoundingly low for such a powerful CPU but that's precisely the point. Now you know how well or how bad Postgres performs on a beefy machine. You don't always need Kafka-level scale. The takeaway is that Postgres can be a practical choice if you already have it in place.
So rather than dismissing it over the first unimpressive number you find, maybe respond to that actual matter of TFA. Where's the line where Postgres stops being "good enough"? That'll be something nice to talk about.
Then the author should have gone on to discuss not just the implementation they now have to maintain, but also all the client implementations they'll have to keep re-creating for their custom solution. Or they could talk about all the industry standard tools that work with kafka and not their custom implementation.
Or they could have not mentioned kafka at all and just demonstrated their pub/sub implementation with PG. They could have not tried to make it about the buzzword resume driven engineering people vs. common sense folks such as himself.
The problem is benchmarking on the 96 vcpu server, because at that point the author seems to miss the point of Kafka. That's just a waste of money for that performance.
I wonder if OP could have got different results if they implemented a different schema as opposed to mimicking Kafka's setup with the partitions, consumer offsets, etc.
I might well be talking out of my arse but if you're going to implement pub/sub in Postgres, it'd be worth designing around its strengths and going back to basics on event sourcing.
Had the same thoughts, weird it didn't include Kafka numbers.
Never used Kafka myself, but we extensively use Redis queues with some scripts to ensure persistence, and we hit much higher throughputs on equivalent prod machines.
Same for Redis pub/subs, but those are just standard non-persistent pub/subs, so maybe that gives them an edge.
Just checked my single-node Kafka setup, which currently handles 695.27k e/s (average daily) into Elasticsearch without breaking a sweat. Kafka has been the only stable thing in this whole setup.
If you choose wisely, things should suck less overall as you move forward. That's kind of the overall goal, otherwise we'd all still be toggling raw machine code into machines using switches.
This is it right here. My foil is the Elasticsearch replacement because PG has inverted indices. The ergonomics and tunability of these in PG are terrible compared to ES. Yes, it will search, but I wouldn’t want to be involved in constructing or maintaining that search.
For me the killer feature of Kafka was the ability to set the offset independently for each consumer.
In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.
Does Postgres support this functionality for their queues?
Isn't it just a matter of having each consumer use their own offset? I mean if the queue table is sequentially or time-indexed, the consumer just provides a smaller/earlier key to accomplish the offset?
(Maybe I'm missing something here?)
Kafka allows you to have a consumer group… you can have multiple workers processing messages in parallel, and if they all use the same group id, the messages will be sharded across all the workers using that key… so each message will only be handled by one worker using that key, and every message will be given to exactly one worker (with all the usual caveats of guaranteed-processed-exactly-once queues). Other consumers can use different group keys and they will also get every single message exactly once.
So if you want an individual offset, then yes, the consumer could just maintain their own… however, if you want a group’s offset, you have to do something else.
The article basically states unless you need a lot of throughput, you probably don't need Kafka. (my interpretation extends to say) You probably don't need offsets because you don't need multi-threaded support because you don't need multiple threads.
I don't know what kind of native support PG has for queue management, the assumption here is that a basic "kill the task as you see it" is usually good enough and the simplicity of writing and running a script far outweighs the development, infrastructure and devops costs of Kafka.
But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or 5 minutes instead of an hour is a business decision, along with understanding the growth pattern of the workload you happen to have.
Most of my professional life I have considered Postgres folks to be pretty smart… while I by chance happened to go with MySQL and it became the RDBMS I thought in by default.
Heavily learning about Postgres recently has been okay, not much different than learning the tweaks for MSSQL, Oracle, or others. Just have to be willing to slow down a little for a bit and enjoy it instead of expecting to rush through everything.
1. Do you really need a queue? (Alternative: periodic polling of a DB)
2. What's your event volume and can it fit on one node for the foreseeable future, or even serverless compute (if not too expensive)? (Alternative: lightweight single-process web service, or several instances, on one node.)
3. If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST API's, maybe with async semantics and retry semantics)
4. If you really do need a distributed queue, then you may as well use a distributed queue, such as Kafka. Even if you take on the complexity of managing a Kafka cluster, the programming and performance semantics are simpler to reason about than trying to shoehorn a distributed queue onto a SQL DB.
Unless employees expect that their best rewards are from making their current project as simple and effective as possible - it is highly unlikely that the current project will be as simple as it could be.
So sure "technically" it's not a queue, but in reality its used as a queue for 1000s of companies around the world for huge production workloads which no MQ system can support.
In my experience it’s not the reads, but the writes that are hard to scale up. Reading is cheap and can be sometimes done off a replica. Writing to a PostgreSQL at high sustained rate requires careful tuning and designs. A stream of UPDATEs can be very painful, INSERTs aren’t cheap, and even a batched COPY blocks can be tricky.
The scars from that kind of outage will never truly heal.
0: https://www.enterprisedb.com/blog/impact-full-page-writes
Then, if you ever need to switch to something more performant, it will be relatively easy.
It's a queue... how bad can you screw this up? My guess is, in most corporate environment, very very badly. Somehow something as complicated as consuming a queue (which isn't very complicated at all) will be done in such a way that it will require many months to change which queue is used in the future.
MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)
(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)
this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3
Questions:
1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?
2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?
3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?
4. I guess most Postgres queues are also "simple" and don't provide "fanout" for multiple things (eg I want to take one of my IoT message, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc)
What would be the recommendation?
There are globally shared resources, but for the most part, locks are held on specific rows or tables. Unrelated transactions generally won't block on each other.
Also running a Very High Availability cluster is non-trivial. It can take a minute to fail over to a replica, and a busy database can take a while to replay the WAL after a reboot before it's functional again. Most people are OK with a couple minutes of downtime for the occasional reboot though.
I think this really depends on your scale. Are you doing <100 messages/second? Definitely stick with postgres. Are you doing >100k messages/second? Think about Kafka/redpanda. If you were comfortable with postgres (or you will be since you are building the rest of your project with it), then you want to stick with postgres longer, but if you are barely using it and would struggle to diagnose an issue, then you won't benefit from consolidating.
Postgres will also be more flexible. Kafka can only do partitions and consumer groups, so if your workload doesn't look like that (e.g. out of order processing), you might be fighting Kafka.
MQTT -> Postgres (+ S3 for archive)
> 1. my "fear" would be that if I use the same Postgres for the queue and for my business database...
This is a feature, not a bug. In this way you can pair the handling of the message with the business data changes which result in the same transaction. This isn't quite "exactly-once" handling, but it's really really close!
> 2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming
Generally it's best practice in this case to never delete messages from a SQL "queue", but toggle them in-place to consumed and periodically archive to a long-term storage table. This provides in-context historical data which can be super helpful when you need to write a script to undo or mitigate bad code which resulted in data corruption.
Alternatively when you need to roll back to a previous state, often this gives you a "poor woman's undo", by restoring a time-stamped backup, copying over messages which arrived since the restoration point, then letting the engine run forwards processing those messages. (This is a simplification of course, not always directly possible, but data recovery is often a matter of mitigations and least-bad choices.)
Basically, saving all your messages provides both efficiency and data recovery optionality.
> 3...
Legit concern, particularly if you're trying to design your service abstraction to match an eventual evolution of data platform.
> 4. don't provide "fanout" for multiple things
What they do provide is running multiple handling of a queue, wherein you might have n handlers (each with its own "handled_at" timestamp column in the DB), and different handles run at different priorities. This doesn't allow for workflows (ie a cleanup step) but does allow different processes to run on the same queue with different privileges or priorities. So the slow process (archive?) could run opportunistically or in batches, where time-sensitive issues (alerts, outlier detection, etc) can always run instantly. Or archiving can be done by a process which lacks access to any user data to algorithmically enforce PCI boundaries. Etc.
Is this some scripting to automate your home, or are you trying to build some multi-tenant thing that you can sell?
If it's just scripting to automate your home, then you could probably get away with a single server and on-disk/in-memory queuing, maybe even sqlite, etc. Or you could use it as an opportunity to learn those technologies, but you don't really need them in your pipeline.
It's amazing how much performance you can get as long as the problem can fit onto a single node's RAM/SSD.
n) Do you really need S3? is it cheaper than NFS storage on a compute node with a large disk?
There are many cases where S3 is absolutely cheaper though.
You can run into issues with scheduled queues (e.g. run this job in 5 minutes) since the tables will be bigger, you need an index, and you will create the garbage in the index at the point you are querying (jobs to run now). This is a spectacularly bad pattern for postgres at high volume.
Doesn't PostgreSQL have transactional schema updates as a key feature? AIUI, you shouldn't be having any data loss as a result of such changes. It's also common to use views in order to simplify the management of such updates.
I'm a java dev and maybe my projects are about big integrations, but I've always needed queue like constructs and polling from a db was almost always a headache, especially with multiple consumers and publishers.
Sure it can be done, and in many projects we do have cron-jobs on different pods -- not a global k8s cron-job, but legacy cron jobs and it works fine.
Kafka does not YET support real queue (but I'm sure there's a high profile KIP to have true queue like behavior, per consumer group, with individual commits), and does not support server side filtering.
But consumer groups and partitions have been such a blessing for me, it's very hard to overstate how useful they are with managing stateful apps.
But then distributed queue is most likely not needed until you hit really humongous scale.
Periodic polling is awkward on both sides: you add arbitrary latency _and_ increase database load proportional to the number of interested clients.
Events, and ideally coalesced events, serve the same purpose as interrupts in a uniprocess (versus distributed) system, even if you don't want a proper queue. This at least lets you know _when_ to poll and lets you set and adjust policy on when / how much your software should give a shit at any given time.
The limiting factor for most workloads will probably be the number of connections, and the read/write mix. When you get into hundreds or thousands of pollers and writing many things to the queue per second Postgres is going to lose its luster for sure.
But in my experience with small/medium companies, a lot of workloads fit very very comfortably into what Postgres can handle easily.
The publisher has a set of tables (topics and partitions) of events, ordered and with each event having an assigned event sequence number.
Publisher stores no state for consumers in any way.
Instead, each consumer keeps a cursor (a variable holding an event sequence number) indicating how far it has read for each event log table it is reading.
Consumer can then advance (or rewind) its own cursor in whatever way it wishes. The publisher is oblivious to any consumer side state.
This is the fundamental piece of how event log publishing works (as opposed to queues which is something else entirely; and the article talks about both usecases).
Then you just query from event_receiver_svcX side, for events published > datetime and event_receiver_svcX = FALSE. Once read set to TRUE.
To mitigate too many active connections have a polling / backoff strategy and place a proxy infront of the actual database to proactively throttle where needed.
But event table:
| event_id | event_msg_src | event_msg | event_msg_published | event_receiver_svc1 | event_receiver_svc2 | event_receiver_svc3 |
|----------|---------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| evt01 | svc1 | json_message_format | datetime | TRUE | TRUE | FALSE |
That sounds distributed to me, even if it wires different tech together to make it happen. Is there something about load balancing REST requests to different DB nodes that is less complicated than Kafka?
To be clear I wasn't talking about DB nodes, I was talking about skipping an explicit queue altogether.
But let's say you were asking about load balancing REST requests to different backend servers:
Yes, in the sense that "load balanced REST microservice with retry logic" is such a common pattern that is better understood by SWE's and SRE's everywhere.
No, in the sense that if you really did just need a distributed queue then your life would be simpler reusing a battle-tested implementation instead of reinventing that wheel.
Kafka has its own foibles and isn't a trivia set-it-and-forget it to run at scale.
The Pareto principle is not some guarantee applicable to everything and anything saying that any X will handle 80% of some other thing's use cases with 20% the effort.
One can see how irrelevant its invocation is if we reverse: does Kafka also handle 80% of what Postgres does with 20% the effort? If not, what makes Postgres especially the "Pareto 80%" one in this comparison? Did Vilfredo Pareto had Postgres specifically in mind when forming the principle?
Pareto principle concerns situations where power-law distributions emerge. Not arbitrary server software comparisons.
Just say Postgres covers a lot of use cases people mindlessly go to shiny new software for that they don't really need, and is more battled tested, mature, and widely supported.
The Pareto principle is a red herring.
>The Pareto principle is not some guarantee applicable to everything and anything
Yes, obviously. The author doesn't say otherwise. There are obviously many ways of distributing things.
>One can see how irrelevant its invocation is if we reverse: does Kafka also handle 80% of what Postgres does with 20% the effort?
No
>If not, what makes Postgres especially the "Pareto 80%" one in this comparison?
Because its simpler.
What implies everything can handle 80% of use cases with 20% of effort? It's like saying:
If normal distributions are real, and human height is normally distributed, then why isnt personal wealth? They are just different distributions.
Let me explain then...
> Yes, obviously. The author doesn't say otherwise.
Someone doesn't need to spell something out explicitly to imply it, or to fall to the kind of mistake I described.
While the author might not say otherwise, they do invoke the Pareto principle out of context, as if it's some readily applicable theorem.
>>If not, what makes Postgres especially the "Pareto 80%" one in this comparison? > Because its simpler.
Something being simpler than another thing doesn't make it a Pareto "80%" thing.
Yeah, I know you don't say this is always the case explicitly. But, like with the OP, your answer uses this as if it's an argument in favor of smething being the "Pareto 80%" thing.
It just makes it simpler. In the initial Pareto formulation is was about wealth accumulation even, which has nothing to do with simplicity or features or even with comparing different classes of things (both sides referred to the same thing, people. Specifically about 20% of the population owning 80% of the land).
For very simple software, most users use all the features. For very specialized software, there's very few users, and they use all the features.
> The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)
This is different units entirely! Development effort? How is this the Pareto Principle at all?
(To the GP's point, would "ls" cover 80% of the use cases of "cut" with 20% of the effort? Or would MS Word cover 80% of the use cases of postgresql with 20% of the effort? Because the scientific Pareto Principle tells us so?)
Hey, it's really not important, just an idea that with Postgres you can cover a lot of use cases with a lot less effort than configuring/maintaining a Kafka cluster on the side, and that's plausible. It's just that some "nerds" who care about being "technically correct" object to using the term "pareto principle" to sound scientific here, that bit is just nonsense.
First, it would be inverse, not reverse.
Second, no it doesn't work that way, that's the point of the Pareto principle in the first place, what is 80% is always 80% and what is 20% is always 20%.
I know, since that's the whole point I was making. That the OP picked an arbitrary side to give the 80%, and that one could just as well pick the other one, and that you need actual arguments (and some kind of actual measurable distribution) to support one or the other being the 80% (that is, merely invoking the Pareto principle is not an argument).
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.
If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.
Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.
Wouldn't OrioleDB solve that issue though?
Postgres isn’t meant to be a guaranteed permanent replacement.
It’s a common starting point for a simpler stack which can retain a greater deal of flexibility out of the box and increased velocity.
Starting with Postgres lets the bottlenecks reveal themselves, and then optimize from there.
Maybe a tweak to Postgres or resources, or consider a jump to Kafka.
It often doesn't.
[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
Also, LISTEN/NOTIFY do not scale, and they introduce locks in areas you aren't expecting - https://news.ycombinator.com/item?id=44490510
But anytime you treat a database, or a queue, like a black box dumpster, problems will ensue.
Of course the other 99% is the remaining 1%.
And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)
When you're doing hundreds or thousands of transactions to begin with, it doesn't really hurt that much out of the gate.
Of course there will be someone who will pull out something that won’t work but such examples can likely be found for anything.
We don’t need to fear simplification, it is easy to complicate later when the actual complexities reveal themselves.
A naive approach with a sequence (or the serial type, which uses a sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits; now the table contains rows "122" and "124", and readers can start to process them. Then transaction "one" commits with its "123" number, but readers are already past "124". And transaction "one" might never commit for various reasons (e.g. the client just lost power), so waiting for "123" forever does not cut it either.
Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
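To make the race concrete, here is a minimal sketch of the naive reader (assuming a hypothetical events(id BIGSERIAL PRIMARY KEY, payload TEXT) table and psycopg2; the DSN and handler are made up):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
last_seen = 122  # the reader's cursor: "highest id processed so far"

with conn.cursor() as cur:
    cur.execute(
        "SELECT id, payload FROM events WHERE id > %s ORDER BY id",
        (last_seen,),
    )
    for event_id, payload in cur.fetchall():
        # If transaction "two" (id 124) committed before transaction
        # "one" (id 123), this loop sees 124 and moves the cursor past
        # it; when 123 finally commits, it is skipped forever.
        print("processing", payload)  # stand-in for a real handler
        last_seen = event_id
```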
https://www.oreilly.com/library/view/designing-data-intensiv...
You can generate distributed monotonic number sequences with a Lamport Clock.
https://en.wikipedia.org/wiki/Lamport_timestamp
The wikipedia entry doesn't describe it as well as that book does.
It's not the end of the puzzle for distributed systems, but it gets you a long way there.
See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock
Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":
https://ia904606.us.archive.org/32/items/distributed-systems...
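To make it concrete, a minimal Lamport clock sketch in Python (class and method names are mine, not from the slides):

```python
import threading

# Minimal sketch of the Lamport clock idea referenced above.
class LamportClock:
    def __init__(self):
        self._time = 0
        self._lock = threading.Lock()

    def tick(self):
        # Call before any local event or send; returns the new timestamp.
        with self._lock:
            self._time += 1
            return self._time

    def receive(self, sender_time):
        # Call on message receipt: jump past the sender's timestamp so
        # anything that causally follows gets a larger number.
        with self._lock:
            self._time = max(self._time, sender_time) + 1
            return self._time
```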
Another way to speed it up is to grab unique numbers in batches instead of just getting them one at a time. No idea why you want your numbers to be in absolute sequence. That's hard in a distributed system. Probably best to relax that constraint and find some other way to track individual pieces of data. Or even better, find a way so you don't have to track individual rows in a distributed system.
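In Postgres that batch grab can be a single round trip; a sketch, with a hypothetical sequence name (a sequence's CACHE setting gives you something similar per connection):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

def grab_id_batch(size=100):
    # Reserve `size` ids from the sequence in one round trip; hand them
    # out locally. If the process dies, the unused ids become gaps,
    # which is exactly the "not in absolute sequence" trade-off above.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT nextval('event_id_seq') FROM generate_series(1, %s)",
            (size,),
        )
        return [row[0] for row in cur.fetchall()]
```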
If you would rather have readers waiting and parallel writers there is a more complex scheme here: https://blog.sequinstream.com/postgres-sequences-can-commit-...
In a sense this is what Kafka IS architecturally: The component that assigns event sequence numbers.
Another approach which I used in the past was to assign sequence numbers after committing. Basically, a separate process periodically scans the set of un-sequenced rows, applies any application-defined ordering constraints, and writes SNs to them. This can be surprisingly fast, like tens of thousands of rows per second. In my case, the ordering constraints were simple, basically that for a given key, increasing versions get increasing SNs. But I think you could have more complex constraints, although it might get tricky with batch boundaries.
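A rough sketch of that sequencer, under an assumed events(id, key, version, sn) schema where sn starts out NULL (relying on nextval() being evaluated in subquery order is a pragmatic Postgres-ism, so treat this as a sketch rather than gospel):

```python
import time
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

def assign_sequence_numbers():
    # Number all currently un-sequenced rows in one statement; the
    # ORDER BY in the subquery is what enforces "for a given key,
    # increasing versions get increasing SNs".
    with conn.cursor() as cur:
        cur.execute("""
            UPDATE events e
               SET sn = numbered.new_sn
              FROM (SELECT id, nextval('global_sn_seq') AS new_sn
                      FROM events
                     WHERE sn IS NULL
                     ORDER BY key, version) numbered
             WHERE e.id = numbered.id
        """)
    conn.commit()

while True:
    assign_sequence_numbers()  # the periodic scan described above
    time.sleep(0.5)
```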
Isn't it a bit of a white whale to think a single solution can solve all one's subscriber problems? AFAIK even with Kafka this isn't completely watertight.
So there is a much more serious issue at stake here than event ordering/consistency.
As it happens, if you use event log tables in SQL "the Kafka way", you actually get a guarantee on event ordering too as a side effect, but that is not the primary goal.
More detailed description of problem:
https://github.com/vippsas/mssql-changefeed/blob/main/MOTIVA...
A downside is the lack of tooling client side. For many using Kafka is worth it simply for the tooling in libraries consumer side.
If you just want to write an event handler function, there is a lot of boilerplate to manage around it (persisting read cursors, etc.; see the sketch below).
We introduced a company standard for one service pulling events from another service that fit well together with events stored in SQL.
https://github.com/vippsas/feedapi-spec
Nowhere close to Kafka's maturity in client side tooling but it is an approach for how a library stack could be built on top making this convenient and have the same library toolset support many storage engines. (On the server/storage side, Postgres is of course as mature as Kafka...)
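To give a flavor of that boilerplate, a bare-bones polling consumer that persists its read cursor might look like this (every table, column, and name is made up):

```python
import time
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

def poll(consumer, handler, batch=100):
    with conn.cursor() as cur:
        # Load this consumer's persisted cursor (assumes a row was
        # seeded in read_cursors when the consumer was registered).
        cur.execute(
            "SELECT last_sn FROM read_cursors WHERE consumer = %s",
            (consumer,),
        )
        last_sn = cur.fetchone()[0]
        cur.execute(
            "SELECT sn, payload FROM events"
            " WHERE sn > %s ORDER BY sn LIMIT %s",
            (last_sn, batch),
        )
        for sn, payload in cur.fetchall():
            handler(payload)  # side effects re-run on crash: at-least-once
            last_sn = sn
        # Persist the cursor in the same transaction as the reads.
        cur.execute(
            "UPDATE read_cursors SET last_sn = %s WHERE consumer = %s",
            (last_sn, consumer),
        )
    conn.commit()

while True:
    poll("email-sender", print)  # hypothetical consumer and handler
    time.sleep(1)
```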
If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.
https://www.youtube.com/watch?v=7CdM1WcuoLc
Getting even less than that throughput on 3x c7i.24xlarge (a total of 288 vCPUs) is bafflingly wasteful.
Just because you can do something with Postgres doesn't mean you should.
> 1. One camp chases buzzwords.
> 2. The other camp chases common sense
In this case, is "Postgres" just being used as a buzzword?
[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]
Kafka is a full-on streaming solution.
Postgres isn’t a buzzword. It can be a capable placeholder until it’s outgrown. One can arrive at Kafka with a more informed run history from Postgres.
A colleague and I (mostly him, but on my advice) worked up a set of patches to accept and emit JSON and YAML in the CLI tool. Our use case at the time was setting things up with a config management system using the already built tool RedPanda provides without dealing with unstructured text.
We got a lot of good use out of RedPanda at that org. We’ve both moved on to a new employer, though, and the “no offering RedPanda as a service” spooked the company away from trying it without paying for the commercial package. Y’all assured a couple of us that our use case didn’t count as that, but upper management and legal opted to go with Kafka just in case.
AKA "Medium Data" ?
It's very common to start adding more and more infra for use cases that, while they could technically be covered better by new stuff, can be served by already existing infrastructure, at least until you have proof that you need to grow it.
This is literally the point the author is making.
In this case, you do just need a single fuel truck. That's what it was built for. Avoiding a purpose-built tool only to achieve the same result actually is wasteful. You don't need 288 cores to achieve 243,000 messages/second. You can do that kind of throughput with a Kafka-compatible service on a laptop.
[Disclosure: I work for Redpanda]
> This is literally the point the author is making.
Exactly! I just don't understand why HN invariably bubbles up the most dismissive comments to the top, the ones that don't even engage with the actual subject matter of the article!
CPU is more tricky but I’m sure it can be shown somehow
It's a fair point that if you already have a pgsql setup, and only need a few messages here and there, then pg is fine. But yeah, the 96 vcpu setup is absurd.
Is anyone actually reading the full article, or just reacting to the first unimpressive numbers you can find and then jumping on the first dismissive comment you can find here?
Benchmarking Kafka isn't the point here. The author isn't claiming that Postgres outperforms Kafka. The argument is that Postgres can handle modest messaging workloads well enough for teams that don't want the operational complexity of running Kafka.
Yes, the throughput is astoundingly low for such a powerful CPU, but that's precisely the point: now you know how well or how badly Postgres performs on a beefy machine. You don't always need Kafka-level scale. The takeaway is that Postgres can be a practical choice if you already have it in place.
So rather than dismissing it over the first unimpressive number you find, maybe respond to the actual matter of TFA. Where's the line where Postgres stops being "good enough"? That would be something worth talking about.
Or they could have not mentioned kafka at all and just demonstrated their pub/sub implementation with PG. They could have not tried to make it about the buzzword resume driven engineering people vs. common sense folks such as himself.
I might well be talking out of my arse but if you're going to implement pub/sub in Postgres, it'd be worth designing around its strengths and going back to basics on event sourcing.
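For example (very much a sketch, and every name here is mine): an append-only events table as the source of truth, with NOTIFY used only as a wake-up hint, so a missed notification costs latency rather than correctness:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
conn.autocommit = True

with conn.cursor() as cur:
    # Append-only event store; consumers derive state by replaying it.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            sn         BIGSERIAL PRIMARY KEY,
            stream     TEXT        NOT NULL,
            event_type TEXT        NOT NULL,
            body       JSONB       NOT NULL,
            created_at TIMESTAMPTZ NOT NULL DEFAULT now()
        )
    """)
    cur.execute(
        "INSERT INTO events (stream, event_type, body) VALUES (%s, %s, %s)",
        ("order-42", "OrderPlaced", '{"total": 99}'),
    )
    # Readers LISTEN on this channel but still scan the table, so the
    # notification is an optimization, not the delivery mechanism.
    cur.execute("NOTIFY new_events")
```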
Never used Kafka myself, but we extensively use Redis queues with some scripts to ensure persistence, and we hit much higher throughputs on equivalent prod machines.
Same for Redis pub/subs, but those are just standard non-persistent pub/subs, so maybe that gives them an edge.
zeek -> kafka -> logstash -> elastic
There are two poles.
1. Folks constantly adopting the new tech, whatever the motivation, and 2. "I learned a thing and shall never learn anything else, ever."
Of course nobody exists actually on either pole, but the closer you are to either, the less pragmatic you are likely to be.
I think it's still just 2 poles. However, I probably shouldn't have ascribed a motivation to the latter pole, as I purposely did not with the former.
Pole 2 is simply: never adopt anything new, ever, whatever the motivation.
In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.
Does Postgres support this functionality for their queues?
So if you want an individual offset, then yes, the consumer could just maintain their own… however, if you want a group’s offset, you have to do something else.
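One sketch of the "something else": a per-group offsets table, which also makes rewinding or fast-forwarding a group a one-line UPDATE (all names hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

with conn.cursor() as cur:
    # One row per consumer group; every member reads and advances it.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS group_offsets (
            group_name TEXT PRIMARY KEY,
            last_sn    BIGINT NOT NULL DEFAULT 0
        )
    """)
    # Kafka-style "seek": move a whole group's offset backwards, e.g.
    # to replay the last million events (clamped at zero).
    cur.execute(
        "UPDATE group_offsets"
        " SET last_sn = GREATEST(last_sn - 1000000, 0)"
        " WHERE group_name = %s",
        ("billing-consumers",),
    )
conn.commit()
```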
Is a queuing system baked into Postgres? Or are there client libraries that make it look like one?
And do these abstractions allow for arbitrarily moving the offset for each consumer independently?
If you're writing your own queuing system using pg for persistence obviously you can architect it however you want.
I don't know what kind of native support PG has for queue management; the assumption here is that a basic "kill the task as you see it" approach is usually good enough, and the simplicity of writing and running a script far outweighs the development, infrastructure, and devops costs of Kafka.
But obviously, whether you need stuff to happen in 15 seconds instead of 5 minutes, or 5 minutes instead of an hour is a business decision, along with understanding the growth pattern of the workload you happen to have.
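For reference, that basic claim-and-delete script is usually written with FOR UPDATE SKIP LOCKED so parallel workers never grab the same row; a rough sketch with a hypothetical tasks table:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

def claim_one_task():
    # DELETE with SKIP LOCKED: each worker atomically claims one row,
    # and concurrent workers simply skip rows that are already claimed.
    with conn.cursor() as cur:
        cur.execute("""
            DELETE FROM tasks
             WHERE id = (SELECT id FROM tasks
                          ORDER BY id
                          FOR UPDATE SKIP LOCKED
                          LIMIT 1)
            RETURNING id, payload
        """)
        task = cur.fetchone()
    conn.commit()
    return task  # None when the queue is empty
```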
Here is one: https://pgmq.github.io/pgmq/
Some others: https://github.com/dhamaniasad/awesome-postgres
Most of my professional life I have considered Postgres folks to be pretty smart… while I, by chance, happened to go with MySQL, and it became the RDBMS I think in by default.
Heavily learning about Postgres recently has been fine, not much different from learning the tweaks for MSSQL, Oracle, or others. You just have to be willing to slow down a little for a bit and enjoy it instead of expecting to rush through everything.