
First we need to agree on what "Event Sourcing" means. In my view you don't need to implement every single Event Sourcing pattern to have an "Event Sourced" system. Say you have a TODO list app (yes, pretty cliche, but that's ok). In that TODO list you have a <form> that posts the state of the TODO list to the server. That state is stored in the database as an event, "TODO_LIST_SAVED". When you want to "replay", just list all the events from the DB in chronological order, filter down to the ones the user has access to, pick the last one, and rebuild the HTML from that single event converted into the view model.

Kaboom, you have an event sourced system that doesn't even use a queue.
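
As a rough sketch of how little that takes (the EventStore interface and the TodoListSaved / append / loadAll names are invented here for illustration, not from any particular library):

  // Hedged sketch: one event type, no queue, no projections.
  interface EventStore {
    append(event: TodoListSaved): Promise<void>;
    // assumed to return the user's events in chronological order
    loadAll(query: { userId: string }): Promise<TodoListSaved[]>;
  }

  type TodoListSaved = {
    type: 'TODO_LIST_SAVED';
    userId: string;
    savedAt: string; // ISO timestamp
    items: { text: string; done: boolean }[];
  };

  // Append-only write: every form POST becomes one event row.
  async function handleSave(db: EventStore, userId: string, items: TodoListSaved['items']) {
    await db.append({ type: 'TODO_LIST_SAVED', userId, savedAt: new Date().toISOString(), items });
  }

  // "Replay": read the user's events in order and keep only the last one.
  async function currentViewModel(db: EventStore, userId: string) {
    const events = await db.loadAll({ userId });
    const latest = events[events.length - 1];
    return latest ? latest.items : [];
  }

The view model then gets rendered back into HTML however you like; the point is just that an append-only event table is the whole "event store".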

The problem with trashing the idea is that people have had a bad experience with it, either by over-engineering it, reaching for every pattern, tool or hand-made implementation instead of taking a lean approach, or by storing events for a kind of business model that doesn't even require a database.

"Event Sourcing is Hard" is a statement as true as "web development" is hard, or "distributed systems" is hard, or "API" is hard, "eventual consistency" is hard... yet, we build those things every day doing the best we can. In fact, anything, any practice, any technique, any architecture can be hard because software is hard. Even harder is to not over-engineer something that can be very simple.

Simplicity is hard.



Kafka-oriented streaming folks talk about stream-table duality: the idea that one form can be expressed as the other. There is usually a little lip service paid to this idea before heavy hints are dropped that, actually, the stream is the true reality.
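
To make that duality concrete, here is a toy sketch (nothing Kafka-specific, all names invented): folding a stream gives you the table, and diffing successive table states gives you a stream back.

  type Event = { key: string; value: number };
  type Table = Map<string, number>;

  // Stream -> table: the table is the fold (latest value per key) of the stream.
  function toTable(stream: Event[]): Table {
    const table: Table = new Map();
    for (const e of stream) table.set(e.key, e.value);
    return table;
  }

  // Table -> stream: changed values between two table states become events
  // (deletions omitted for brevity).
  function* toStream(before: Table, after: Table): Generator<Event> {
    for (const [key, value] of after) {
      if (before.get(key) !== value) yield { key, value };
    }
  }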

My own view is that any data of interest has dimensions along which you can show its evolution. Frequently that dimension is time, or can be mapped onto time.

But neither the stream nor the table is the truest representation. The truest representation is whatever representation makes sense for the problem. Sometimes, I want to clone a git repo. Sometimes I want to see a diff. Sometimes I want to query a table. Sometimes I want a change data capture stream. Sometimes I want to upload a file. Sometimes I want a websocket sending clicks. Sometimes you need a freight truck. Sometimes you need a conveyor belt. Sometimes a photo. Sometimes a movie.

Sometimes I talk about space vs time using the equations for acceleration, or for velocity, or for distance. These are all reachable from each other via the "duality" of calculus, but none of them is the One Truest Formula.

And so it is for data. The representation that makes the most sense for that domain under those constraints is the one that is "truest".


I believe this is where CQRS plays nicely with "event sourcing". You write all your events into one model, but you can read them through multiple ones, provided you can tolerate some read latency... and most systems are ok with that.
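
Roughly, as a sketch with made-up order events and two hand-rolled read models (neither is a real library API):

  type OrderEvent =
    | { type: 'OrderPlaced'; orderId: string; amount: number }
    | { type: 'OrderCancelled'; orderId: string };

  // The write model: events are only ever appended here.
  const eventLog: OrderEvent[] = [];

  // Read model 1: current amount per open order, derived from the log.
  function amountsByOrder(log: OrderEvent[]): Map<string, number> {
    const amounts = new Map<string, number>();
    for (const e of log) {
      if (e.type === 'OrderPlaced') amounts.set(e.orderId, e.amount);
      if (e.type === 'OrderCancelled') amounts.delete(e.orderId);
    }
    return amounts;
  }

  // Read model 2: a cancellation count for reporting.
  function cancellationCount(log: OrderEvent[]): number {
    return log.filter((e) => e.type === 'OrderCancelled').length;
  }

Both read models are derived from the same append-only log; in a real system they would typically be updated asynchronously, which is exactly the read latency trade-off mentioned above.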


CQRS also alleviates a lot of the pain people experience around event sourcing and distributed systems, particularly event evolution and manipulation. There are many cases where it's inappropriate for consuming services to be aware of the internal representation of events. Giving them a 'snapshot-centric' view of the data can be a simplification in both directions.
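
One way to read 'snapshot-centric', sketched with invented user events (the re-verification rule below is my own assumption, just to make the example concrete): the internal events stay private to the owning service, and consumers only ever see the projected current state.

  // Internal events: free to evolve, never exposed outside the service.
  type InternalEvent =
    | { type: 'EmailChanged'; userId: string; email: string }
    | { type: 'EmailVerified'; userId: string };

  // Published snapshot: the stable contract consuming services actually see.
  type UserSnapshot = { userId: string; email: string; emailVerified: boolean };

  function project(userId: string, events: InternalEvent[]): UserSnapshot {
    const snapshot: UserSnapshot = { userId, email: '', emailVerified: false };
    for (const e of events) {
      if (e.type === 'EmailChanged') {
        snapshot.email = e.email;
        snapshot.emailVerified = false; // assumption: a changed address needs re-verification
      }
      if (e.type === 'EmailVerified') snapshot.emailVerified = true;
    }
    return snapshot;
  }

The internal event shapes can then change without breaking consumers, as long as the snapshot keeps projecting correctly.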


I agree with this a lot. I posted a comment elsewhere in this thread about our use of events, and it boils down to selectively picking the entities we need to be able to reason about past states of, storing the new states of those, and then deriving views from that state. For many uses we never even need to explicitly apply state transformations to derive a materialized form of the present state - a suitable view is often sufficient. For some we do need to apply transformations into new tables, but we can do that selectively. We still always have the database as a single source of truth, as we're lucky not to need to scale writes beyond it, which simplifies things a lot.

What it gives us is the ability to rerun and regression-test all reporting at any point in time for the data sources we model as events, and the ability to re-test all the code that transforms that inbound data, because we don't throw it away.

"Our" form of event sourcing is very different from the "cool" form: We don't re-model most internal data changes as events. We only selectively apply it to certain critical changes. A user changing profile data is not critical to us. A partner giving us data we can't recreate without going back to them and telling them a bug messed up our data is. For data that is critical like that, being able to go back and re-create any transformations from the original canonical event is fantastic.

And as long as there is an immutable key for the entity itself, rather than just for the entity at time t(n), we can trivially reference, from non-evented parts of the system, either the entity at time t(n) or the entity at t(now()), depending on need.
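
A small sketch of what that referencing can look like, assuming each stored state row carries the stable entity id plus the time it became valid (all names are illustrative):

  // Each stored state carries the entity's stable id plus the time it became valid.
  type EntityState<T> = { entityId: string; validFrom: Date; data: T };

  // "Entity at time t(n)": the latest state whose validFrom is not after t.
  function stateAt<T>(history: EntityState<T>[], entityId: string, t: Date): T | undefined {
    const candidates = history
      .filter((s) => s.entityId === entityId && s.validFrom.getTime() <= t.getTime())
      .sort((a, b) => a.validFrom.getTime() - b.validFrom.getTime());
    return candidates[candidates.length - 1]?.data;
  }

  // "Entity at t(now())": the same lookup with t = now.
  const stateNow = <T>(history: EntityState<T>[], entityId: string) =>
    stateAt(history, entityId, new Date());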



