
First we need to agree on what "Event Sourcing" means. In my view you don't need to implement every single Event Sourcing pattern to have an "Event Sourced" system. Say you have a TODO list app (yes, pretty cliche, but that's ok). In that TODO list you have a <form> that posts the state of the TODO list to the server. That state is stored in the database as an event, "TODO_LIST_SAVED". When you want to "replay", just list all the events from the DB in chronological order, filter down to the ones the user has access to, pick the last one, and rebuild the HTML from that single event converted into the view model.

Kaboom, you have an event sourced system that doesn't even use a queue.
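
As a rough sketch of how little that takes (the EventStore interface and the TodoListSaved / append / loadAll names are invented here for illustration, not from any particular library):

  // Hedged sketch: one event type, no queue, no projections.
  interface EventStore {
    append(event: TodoListSaved): Promise<void>;
    // assumed to return the user's events in chronological order
    loadAll(query: { userId: string }): Promise<TodoListSaved[]>;
  }

  type TodoListSaved = {
    type: 'TODO_LIST_SAVED';
    userId: string;
    savedAt: string; // ISO timestamp
    items: { text: string; done: boolean }[];
  };

  // Append-only write: every form POST becomes one event row.
  async function handleSave(db: EventStore, userId: string, items: TodoListSaved['items']) {
    await db.append({ type: 'TODO_LIST_SAVED', userId, savedAt: new Date().toISOString(), items });
  }

  // "Replay": read the user's events in order and keep only the last one.
  async function currentViewModel(db: EventStore, userId: string) {
    const events = await db.loadAll({ userId });
    const latest = events[events.length - 1];
    return latest ? latest.items : [];
  }

The view model then gets rendered back into HTML however you like; the point is just that an append-only event table is the whole "event store".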

The problem with trashing the idea is that people have had a bad experience with it, either by over-engineering it, reaching for every pattern, tool or hand-made implementation instead of taking a lean approach, or by storing events for a kind of business model that doesn't even require a database.

"Event Sourcing is Hard" is a statement as true as "web development" is hard, or "distributed systems" is hard, or "API" is hard, "eventual consistency" is hard... yet, we build those things every day doing the best we can. In fact, anything, any practice, any technique, any architecture can be hard because software is hard. Even harder is to not over-engineer something that can be very simple.

Simplicity is hard.



Kafka-oriented streaming folks talk about stream-table duality: the idea that one form can be expressed as the other. There is usually a little lip service paid to this idea before heavy hints are dropped that, actually, the stream is the true reality.
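
To make that duality concrete, here is a toy sketch (nothing Kafka-specific, all names invented): folding a stream gives you the table, and diffing successive table states gives you a stream back.

  type Event = { key: string; value: number };
  type Table = Map<string, number>;

  // Stream -> table: the table is the fold (latest value per key) of the stream.
  function toTable(stream: Event[]): Table {
    const table: Table = new Map();
    for (const e of stream) table.set(e.key, e.value);
    return table;
  }

  // Table -> stream: changed values between two table states become events
  // (deletions omitted for brevity).
  function* toStream(before: Table, after: Table): Generator<Event> {
    for (const [key, value] of after) {
      if (before.get(key) !== value) yield { key, value };
    }
  }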

My own view is that any data of interest has dimensions along which you can show its evolution. Frequently that dimension is time, or can be mapped onto time.

But neither the stream nor the table is the truest representation. The truest representation is whatever representation makes sense for the problem. Sometimes, I want to clone a git repo. Sometimes I want to see a diff. Sometimes I want to query a table. Sometimes I want a change data capture stream. Sometimes I want to upload a file. Sometimes I want a websocket sending clicks. Sometimes you need a freight truck. Sometimes you need a conveyor belt. Sometimes a photo. Sometimes a movie.

Sometimes I talk about space vs time using the equations for acceleration, or for velocity, or for distance. These are all reachable from each other via the "duality" of calculus, but none of them is the One Truest Formula.

And so it is for data. The representation that makes the most sense for that domain under those constraints is the one that is "truest".


I believe this is where CQRS plays nicely with "event sourcing". You write all your events into one model, but you can read them through multiple ones, provided you can tolerate some read latency... and most systems are ok with that.
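
Roughly, as a sketch with made-up order events and two hand-rolled read models (neither is a real library API):

  type OrderEvent =
    | { type: 'OrderPlaced'; orderId: string; amount: number }
    | { type: 'OrderCancelled'; orderId: string };

  // The write model: events are only ever appended here.
  const eventLog: OrderEvent[] = [];

  // Read model 1: current amount per open order, derived from the log.
  function amountsByOrder(log: OrderEvent[]): Map<string, number> {
    const amounts = new Map<string, number>();
    for (const e of log) {
      if (e.type === 'OrderPlaced') amounts.set(e.orderId, e.amount);
      if (e.type === 'OrderCancelled') amounts.delete(e.orderId);
    }
    return amounts;
  }

  // Read model 2: a cancellation count for reporting.
  function cancellationCount(log: OrderEvent[]): number {
    return log.filter((e) => e.type === 'OrderCancelled').length;
  }

Both read models are derived from the same append-only log; in a real system they would typically be updated asynchronously, which is exactly the read latency trade-off mentioned above.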


CQRS also alleviates a lot of the pain people experience around event sourcing and distributed systems, particularly event evolution and manipulation. There are many cases where it's inappropriate for consuming services to be aware of the internal representation of events. Giving them a 'snapshot-centric' view of the data can be a simplification in both directions.
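
One way to read 'snapshot-centric', sketched with invented user events (the re-verification rule below is my own assumption, just to make the example concrete): the internal events stay private to the owning service, and consumers only ever see the projected current state.

  // Internal events: free to evolve, never exposed outside the service.
  type InternalEvent =
    | { type: 'EmailChanged'; userId: string; email: string }
    | { type: 'EmailVerified'; userId: string };

  // Published snapshot: the stable contract consuming services actually see.
  type UserSnapshot = { userId: string; email: string; emailVerified: boolean };

  function project(userId: string, events: InternalEvent[]): UserSnapshot {
    const snapshot: UserSnapshot = { userId, email: '', emailVerified: false };
    for (const e of events) {
      if (e.type === 'EmailChanged') {
        snapshot.email = e.email;
        snapshot.emailVerified = false; // assumption: a changed address needs re-verification
      }
      if (e.type === 'EmailVerified') snapshot.emailVerified = true;
    }
    return snapshot;
  }

The internal event shapes can then change without breaking consumers, as long as the snapshot keeps projecting correctly.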


I agree with this a lot. I posted a comment elsewhere in this thread about our use of events, and it boils down to selectively picking the entities we need to be able to reason about past states of, storing the new states of those, and then deriving views from that state. For many uses we never even need to explicitly apply state transformations to derive a materialized form of the present state - a suitable view is often sufficient. For some we do need to apply transformations into new tables, but we can do that selectively. We still always have the database as a single source of truth, as we're lucky not to need to scale writes beyond it, which simplifies things a lot.

What it gives us is the ability to rerun and regression-test all reporting at any point in time for the data sources we model as events, and the ability to re-test all the code that transforms that inbound data, because we don't throw it away.

"Our" form of event sourcing is very different from the "cool" form: We don't re-model most internal data changes as events. We only selectively apply it to certain critical changes. A user changing profile data is not critical to us. A partner giving us data we can't recreate without going back to them and telling them a bug messed up our data is. For data that is critical like that, being able to go back and re-create any transformations from the original canonical event is fantastic.

And as long as there is an immutable key for the entity itself, rather than just for the entity at time t(n), we can trivially reference, from non-evented parts of the system, either the entity at time t(n) or the entity at t(now()), depending on need.
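
A small sketch of what that referencing can look like, assuming each stored state row carries the stable entity id plus the time it became valid (all names are illustrative):

  // Each stored state carries the entity's stable id plus the time it became valid.
  type EntityState<T> = { entityId: string; validFrom: Date; data: T };

  // "Entity at time t(n)": the latest state whose validFrom is not after t.
  function stateAt<T>(history: EntityState<T>[], entityId: string, t: Date): T | undefined {
    const candidates = history
      .filter((s) => s.entityId === entityId && s.validFrom.getTime() <= t.getTime())
      .sort((a, b) => a.validFrom.getTime() - b.validFrom.getTime());
    return candidates[candidates.length - 1]?.data;
  }

  // "Entity at t(now())": the same lookup with t = now.
  const stateNow = <T>(history: EntityState<T>[], entityId: string) =>
    stateAt(history, entityId, new Date());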



