> Reversal is the most straightforward when the event is cast in the form of a difference. An example of this would be "add $10 to Martin's account" as opposed to "set Martin's account to $110". In the former case I can reverse by just subtracting $10, but in the latter case I don't have enough information to recreate the past value of the account.
I've used the event sourcing pattern with great success in an exceptionally complex web application at my day job. The front-end would create an 'event' for every action the user took. This let us implement client-side undo/redo with surprisingly little effort, a market-leading feature our users really appreciated.
One key insight was the need to capture each state change as a difference, as Martin explains. This turned out to be tremendously powerful: we were able to build a variety of other features on top of it, including detailed analytics of how users were using the app and session replay for training and debugging.
> ... all the capabilities of reversing events can be done instead by reverting to a past snapshot and replaying the event stream. As a result reversal is never absolutely needed for functionality. However it may make a big difference to efficiency...
When I implemented undo/redo, I found that replaying from snapshots was fast enough. I wasn't working on a database-backed app, though.
Implementing reversal is extra work and bug surface, so it should be subject to cost/benefit analysis.
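To make the quoted point concrete, here is a minimal sketch of a difference-based event (hypothetical names, not from the article): because the event stores the delta rather than the resulting balance, reversing it needs no extra information.

    from dataclasses import dataclass

    @dataclass
    class Deposit:
        account: str
        amount: int  # the difference ("add $10"), not the resulting balance

        def apply(self, balances):
            balances[self.account] = balances.get(self.account, 0) + self.amount

        def reverse(self, balances):
            # The stored delta is enough to undo the event; a "set balance to $110"
            # event couldn't be reversed without also recording the prior value.
            balances[self.account] -= self.amount

    balances = {"martin": 100}
    event = Deposit("martin", 10)
    event.apply(balances)    # {'martin': 110}
    event.reverse(balances)  # back to {'martin': 100}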
I find fuzz testing works great for things like this. Make a function which, given everything you know about a user's state, randomly generates an action that user could take (and the expected result). Then you can run your user model forwards and backwards through randomly chosen actions. If you support undo, check that if you play any change forward then backward, the resulting state is unchanged.
I usually pair that with a "check" function which just verifies that all the invariants I expect hold true. E.g., check that the user has a non-negative balance.
It sounds complex, but you get massive bang for buck from code like this. 100 lines of fuzzing code can happily find a sea of obscure bugs.
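For anyone who wants to try this, here is a rough sketch of such a harness for a hypothetical single-balance account model (all names invented); a real one would generate far richer actions.

    import random

    def random_action(state):
        # Generate an action the user could legally take from this state.
        if state["balance"] > 0 and random.random() < 0.5:
            return ("withdraw", random.randint(1, state["balance"]))
        return ("deposit", random.randint(1, 100))

    def apply(state, action):
        kind, amount = action
        delta = amount if kind == "deposit" else -amount
        return {"balance": state["balance"] + delta}

    def undo(state, action):
        kind, amount = action
        delta = amount if kind == "deposit" else -amount
        return {"balance": state["balance"] - delta}

    def check(state):
        # Invariants that must hold after every action.
        assert state["balance"] >= 0, "invariant violated: negative balance"

    for _ in range(10_000):
        state = {"balance": random.randint(0, 100)}
        action = random_action(state)
        after = apply(state, action)
        check(after)
        assert undo(after, action) == state, "apply then undo changed the state"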
Those two concepts are not mutually exclusive. Each change can be made idempotent by carrying some identifying information about the state it was trying to update.
increase balance by 10 (it is currently 100)
balance is now 110
increase balance by 10 (it is currently 100) <- duplicate event!
event has already been applied
balance is now 110
It can definitely break down at some point (maybe the user really did want to increase by 10 twice and just clicked very fast?). But idempotency and change-based flow can live together.
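A hedged sketch of that combination: each event carries an id plus the state it expects, and the applier skips anything it has already seen (names invented).

    applied_ids = set()

    def apply_event(state, event):
        # event = {"id": ..., "expects": the balance the client saw, "delta": the change}
        if event["id"] in applied_ids:
            return state  # duplicate delivery: already applied, do nothing
        if event["expects"] != state["balance"]:
            raise ValueError("event was built against stale state")
        applied_ids.add(event["id"])
        return {"balance": state["balance"] + event["delta"]}

    state = {"balance": 100}
    e = {"id": "evt-1", "expects": 100, "delta": 10}
    state = apply_event(state, e)  # balance is now 110
    state = apply_event(state, e)  # duplicate event: still 110

A genuine second increase would arrive with a fresh id, which is what keeps the fast-double-click case distinguishable from a duplicated delivery.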
Event sourcing can be represented as a chronological list of changes, replayed so that any list index represents a state.
How CRDTs work differs between implementations. Some CRDTs use a timestamp to order all changes, which makes them essentially identical to an event-sourced list.
Timestamp-based ordering may not actually be ideal, especially in low-trust environments where the timestamp can be spoofed.
Some apps might not want last-write-wins. In that case the timestamp becomes largely meaningless, because there has to be some other way to resolve conflicts when multiple clients change the same piece of data. Some apps want one clear winner, other apps might want the two inputs to be merged, and others (like git) explicitly ask the developer to resolve the conflict.
In this way CRDTs can diverge significantly from an ordered list of changes.
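For a concrete feel of the timestamp-ordered case, here is a minimal last-write-wins register, about the simplest CRDT there is (a sketch, not any particular library): merge is commutative, associative and idempotent, which is what lets replicas converge regardless of delivery order.

    from dataclasses import dataclass

    @dataclass
    class LWWRegister:
        value: object
        timestamp: float
        replica_id: str  # tie-breaker so concurrent writes resolve identically everywhere

        def merge(self, other):
            # Keep whichever write is "later"; ties are broken deterministically.
            if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
                return LWWRegister(other.value, other.timestamp, other.replica_id)
            return LWWRegister(self.value, self.timestamp, self.replica_id)

    a = LWWRegister("draft 1", 10.0, "laptop")
    b = LWWRegister("draft 2", 12.5, "phone")
    assert a.merge(b).value == b.merge(a).value == "draft 2"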
Basically, the source of the debate: to do "proper" event sourcing, do you need to rerun the computations each time you roll back/forward, or is it enough to simply restore state?
I was part of that debate, and I remember a rather interesting point of discussion: is the main operation "apply" or "dedup"?
Apply seems to be the common notion of event sourcing: there is a function apply that takes a state and an event and yields a new state. Then, starting in an init state and iteratively applying the entire event history, boom, latest state restored.
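In code, apply is just a left fold over the history (a sketch with an invented event shape):

    from functools import reduce

    def apply(state, event):
        kind, amount = event
        if kind == "deposit":
            return state + amount
        if kind == "withdraw":
            return state - amount
        raise ValueError(f"unknown event: {kind}")

    history = [("deposit", 100), ("withdraw", 30), ("deposit", 10)]
    latest = reduce(apply, history, 0)  # init state 0 -> latest state 80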
Dedup has a lot of charm though: run and rerun your code. If a step is executed for the first time (no corresponding event in the event history), execute the step and store its result as an event in the history. If that step is executed a second or third time (there is a corresponding event in the event history), do not execute it and instead return its result from the event in the history. The Haskell Workflow package (https://hackage.haskell.org/package/Workflow) is a good example.
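And a toy version of dedup, assuming a deterministic step order: re-running the code skips any step that already has an event in the history and returns its recorded result instead (everything here is invented; it is not the Workflow package's actual API).

    class Workflow:
        def __init__(self, history=None):
            self.history = history or []  # persisted (step_name, result) events
            self.cursor = 0

        def step(self, name, fn):
            if self.cursor < len(self.history):
                recorded_name, result = self.history[self.cursor]
                assert recorded_name == name, "code changed under a recorded history"
                self.cursor += 1
                return result          # replay: do not re-run the side effect
            result = fn()              # first execution: run it and record the result
            self.history.append((name, result))
            self.cursor += 1
            return result

    def run(wf):
        charge = wf.step("charge_card", lambda: "charge-123")
        return wf.step("send_receipt", lambda: f"emailed receipt for {charge}")

    wf = Workflow()
    run(wf)                    # executes both steps and records two events
    run(Workflow(wf.history))  # re-run: replays the results, repeats no side effects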
Temporal follows the second approach, so is it "proper" Event Sourcing? You be the judge :)
This may be how Temporal works, but I imagine the way you ask "is there a corresponding event in the event history" is that you tag every "derived event" in the system with a hash of both the state and code used to create it? So if you change the code, you (eventually?) invalidate all events derived from it?
This would of course rely entirely on the code not being able to access anything except the state, and having no unrecorded side effects. So if you want to say "if X, fetch Y", something has to watch the derived state from the "if X" part and put the result of "fetch Y" into the event stream, which changes the state the code sees and removes the need to "fetch Y" a second time. There's something incredibly clean and enticing about this, but also incredibly scary to implement correctly at scale!
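If you wanted to experiment with the hashing idea, the cache key might look roughly like this; purely illustrative, and not how Temporal actually identifies events.

    import hashlib, inspect, json

    derived = {}  # key -> recorded result of a derivation

    def derivation_key(fn, state):
        # Key a derived event by the code that produced it plus the state it saw,
        # so changing either one invalidates the recorded result.
        payload = inspect.getsource(fn) + json.dumps(state, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def derive(fn, state):
        key = derivation_key(fn, state)
        if key not in derived:
            derived[key] = fn(state)  # only runs when the code or the state is new
        return derived[key]

    def shipping_cost(state):  # must be a module-level function for getsource()
        return 0 if state["total"] > 50 else 5

    derive(shipping_cost, {"total": 60})  # executes and records
    derive(shipping_cost, {"total": 60})  # replayed from the recorded result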
Hello. I love Temporal, it is an amazing solution. I have been pushing it pretty strongly at my workplace, but you guys need to make that web interface better. I lose way too many people at the demo stage.
In Q1 next year I'll be able to push for a full demo again. Hope we get traction this time. It's great tech, and a pleasure to use.
Can't remember how I first encountered your online presence, but I'm pretty sure I've followed you on Twitter for a while. Didn't realize you had joined Temporal. I've followed Temporal for a while, although I haven't yet had the chance to use it professionally. Still, I'm fairly convinced it's The Future, especially as it becomes obvious how to combine it with UX and data layer (right now, it's a bit unclear to me how much of UX state and permanent storage should live in Temporal). If I were building a green field business with automation at its core, I'd very strongly consider picking Temporal.
That's a technical detail imo, as long as your implementation guarantees the outcomes are identical. The reason some people replay events is that it's a simple and easy way to restore previous state.
Event sourcing is a game changer. Everywhere I have worked has ended up reimplementing it in some form, because it is irresistible to incorporate history into your data model.
My most thorough use of event sourcing was in a real-time art auction system, which combined in-person and online bidding. An operator had to input the actions in the room into the system and bid on behalf of online bidders. Event sourcing allowed us to model what actually happened for each auction lot. It also made it possible to model undos and cancellations in really slick ways.
Command Query Responsibility Segregation (CQRS) is also a powerful concept, and it combines elegantly with event sourcing. I also find it a handy way of thinking about raw versus derived data, rather than trying to perfect a normalized data model to serve all purposes.
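A bare-bones illustration of that split over an event log (all names invented, nothing auction-specific beyond the example): the command side validates and appends events, and the query side is just a projection of the log.

    events = []  # the write side: an append-only event log

    def handle_place_bid(lot_id, bidder, amount):
        # Command: validate against current state, then append an event.
        if amount <= highest_bid(lot_id):
            raise ValueError("bid must beat the current highest bid")
        events.append({"type": "bid_placed", "lot": lot_id,
                       "bidder": bidder, "amount": amount})

    def highest_bid(lot_id):
        # Query: a read model derived entirely from the event log.
        bids = [e["amount"] for e in events
                if e["type"] == "bid_placed" and e["lot"] == lot_id]
        return max(bids, default=0)

    handle_place_bid("lot-7", "alice", 100)
    handle_place_bid("lot-7", "bob", 150)
    assert highest_bid("lot-7") == 150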
The "reimplementing" part resonates with me 100%. I've also reimplemented this solution many times and every time it has been a pain in the ass.
I built https://batch.sh specifically to address this - not having to reinvent the wheel for storage, search and replay.
For some cases, storing your events in a Postgres database is probably good enough - but if you're planning on storing billions of complex records AND fetching particular groups of them every now and then, it'll get rough and you'll need a more sophisticated storage system.
What storage mechanism did you use? And how did you choose which events to replay?
In our case, we used Postgres. Our event volume was quite small, and we needed strict consistency (i.e. all participants in the auction see the same state). So we stored the log of events for an auction lot as a JSON blob. A new command was processed by taking a row-level lock (SELECT FOR UPDATE) on the lot's row, validating the command, and then persisting the events. Then we'd broadcast the new derived state to all interested parties.
All command processing and state queries required us to read and process the whole event log. But this was fairly cheap, because we're talking maybe a couple dozen events per lot. To optimize, we might have considered serializing core aspects of the derived state, to use as a snapshot. But this wasn't necessary.
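For anyone curious what that flow looks like in code, here is a rough sketch using psycopg2 and an invented lots(id, events jsonb) table; the real system surely differed in the details.

    import psycopg2
    from psycopg2.extras import Json

    def process_command(conn, lot_id, command, decide):
        # Lock the lot's row, validate the command against the replayed state,
        # append the resulting events, and return the new derived state
        # (broadcasting it to interested parties would happen after this returns).
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.execute("SELECT events FROM lots WHERE id = %s FOR UPDATE",
                            (lot_id,))
                events = cur.fetchone()[0] or []      # psycopg2 decodes jsonb for us
                new_events = decide(events, command)  # all validation lives in decide()
                cur.execute("UPDATE lots SET events = %s WHERE id = %s",
                            (Json(events + new_events), lot_id))
        return replay(events + new_events)

    def replay(events):
        # Derive the current state from the whole (small) per-lot log.
        state = {"highest_bid": 0, "open": True}
        for e in events:
            if e["type"] == "bid_placed":
                state["highest_bid"] = max(state["highest_bid"], e["amount"])
            elif e["type"] == "lot_closed":
                state["open"] = False
        return state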
Batch looks pretty cool! I'll keep that in mind next time I'm considering reinventing the wheel :)
I'm late to this party but I'll chime in anyway - I love event sourcing. So much so that I actually built a company around it (and got into YC S20) - https://batch.sh
Event sourcing is not easy but the benefits are huge - not only from a software architecture perspective but from a systems perspective - you gain a whole lot of additional platform reliability.
There are roughly 3 pillars to event sourcing that have to be built (most of the time) from scratch - storing events (forever), reading stored events, replaying stored events.
Those are the 3 things I've built at several companies, and having to build them is always a huge barrier to someone taking up the pattern. Of course there are more things under the hood, but those are the foundational pieces that will make the whole experience much better.
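The three pillars are small enough to caricature in a few lines; the hard part is doing this durably, searchably and at scale (a file-backed toy, everything invented):

    import json

    LOG = "events.ndjson"  # pillar 1: store events forever (append-only)

    def store(event):
        with open(LOG, "a") as f:
            f.write(json.dumps(event) + "\n")

    def read_events():
        # pillar 2: read the stored events back, in order
        try:
            with open(LOG) as f:
                return [json.loads(line) for line in f]
        except FileNotFoundError:
            return []

    def replay(apply, initial_state):
        # pillar 3: replay the stream through an apply function to rebuild state
        state = initial_state
        for event in read_events():
            state = apply(state, event)
        return state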
Would love to chat with folks who want to nerd out over this stuff :-)
As a techie-turned-PM, I found this pattern really useful when discussing specs with different stakeholders. Business stakeholders understand their business as a series of events (customer walked in, customer bought X, customer checked out, etc.), so the user stories become a lot more relatable to both developers and business folks. I vaguely remember events being discussed in "Domain Modeling Made Functional", too.
Very much agree. I had a memorable conversation with a PM about this very subject a short time before the pandemic. She was enthused about how it would make her work more closely aligned with that of developers.
I'm surprised to see positive experiences with event sourcing in the comments. I had the impression that older posts about event sourcing were full of horror stories, and I think even Martin Fowler said that in most cases he'd seen it didn't work. Or maybe he was referring specifically to CQRS, but isn't that used along with event sourcing most of the time?
Honestly, you should take negative comments about technology on HN (and elsewhere) with a pinch of salt. By reading HN, I have learned that the language I earn a living with is unfit for industry use. I also frequently encounter claims that the stuff I do is impossible, etc.
From what I have seen however, industry is much more technologically diverse than people imagine.
Event sourcing is an amazing tool. The failing projects are usually the ones that want to use it without acknowledging that it offers different tradeoffs from more traditional data modeling and storage solutions, and without adapting to those ups and downs.
There are also many projects that set out to use event sourcing but end up picking up event sourcing, reactive programming, and a host of frameworks or even new languages to do the latter. That's a lot of technical baggage to learn at once, and people making the switch in a production context with tight deadlines and without deep pockets to hire experienced people usually don't fare so well.
Event sourcing is really good for building occasionally connected applications (e.g. a phone app). It makes conflict resolution so much easier than, say, syncing two relational databases.
Basically it follows the git model: distributed systems are each their own source of truth, which allows for (relatively) simple syncing of state. When two of them conflict, well... you need to write some kind of merge algorithm that may or may not ask for user input. But at least the accidental complexity is significantly reduced.
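The simplest possible sync in that spirit looks something like this (all invented): each device keeps its own log, and syncing is a union of events ordered deterministically, with real conflict resolution layered on top where that isn't enough.

    def merge_logs(local, remote):
        # Union the two logs, dedupe by event id, and order them deterministically.
        by_id = {e["id"]: e for e in local + remote}
        return sorted(by_id.values(), key=lambda e: (e["ts"], e["device"], e["id"]))

    local = [{"id": "a1", "ts": 1, "device": "phone", "op": "add item"}]
    remote = [{"id": "b1", "ts": 2, "device": "laptop", "op": "rename item"}]
    synced = merge_logs(local, remote)  # both devices replay the same merged log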
I often find myself wishing for some sort of small-scale durable log, so that this pattern would be easy to implement in a small app backend without standing up something like Kafka.
I recently refactored a small set of services to use events instead of depending directly on each other. I initially had Kafka in mind, but it would've been absolute overkill. A "simple Kafka" would definitely be nice for these cases.
I ended up using Redis Streams [0], which was good enough for my small-scale needs. We already had Redis in our stack too, so it was a very simple integration.
[0] https://redis.io/topics/streams-intro
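For anyone weighing the same option, the core of it is tiny with redis-py, and consumer groups give you replay-from-an-offset and at-least-once delivery without standing up Kafka (a sketch; the stream and group names are made up).

    import redis

    r = redis.Redis()

    # Producer: append an event to the stream (Redis assigns an increasing id).
    r.xadd("orders", {"type": "order_placed", "order_id": "42"})

    # One-time setup: a consumer group that starts from the beginning of the stream.
    try:
        r.xgroup_create("orders", "billing", id="0", mkstream=True)
    except redis.ResponseError:
        pass  # group already exists

    # Consumer: read new events for this group and acknowledge them once handled.
    for stream, messages in r.xreadgroup("billing", "worker-1",
                                         {"orders": ">"}, count=10, block=5000):
        for msg_id, fields in messages:
            print(msg_id, fields)
            r.xack("orders", "billing", msg_id)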
Hi there, I started a company specifically centered around event sourcing - it's a SaaS platform for capturing all of your events, searching them granularly, and replaying them to whatever destination you like.
I'd be happy to walk you through the platform - there's no lock-in, since we don't require the use of any SDKs; you just run an event-relaying container that pipes all of your events to our stores.
One big piece is that our platform is message-bus agnostic, meaning we can hook into any message bus, be it Kafka, Rabbit, NATS, etc.; the same goes for replays - we can replay into any destination.
Check it out: https://batch.sh
The relayer is open source - https://github.com/batchcorp/plumber - and if anything, it can also be used on its own for working with message buses, which could improve your dev workflow for reading/writing messages, etc.
Styx was built with this kind of use in mind, among other things. If you’re using Go you can even use the storage engine standalone. It’s pretty robust and very fast (millions of fsynced writes/sec).
It works pretty well this way, not everything needs to be streamed.
https://github.com/dataptive/styx