The great thing about HN is that it consistently shoves into my face how many seemingly common dev tools, frameworks, etc. I've never heard of.
Event sourcing isn't anything I've ever heard of, let alone something for which broad marketing promises need debunking.
How common is this framework/architecture/product? Google isn't helping me determine how widely used it is.
It's become a fairly widely known concept in data engineering circles, expounded upon in Martin Kleppmann's Designing Data-Intensive Applications book. (Buy this book if you want to get up to speed on modern ideas around distributed systems and data architecture.)
This became popular as people were trying to figure out how to use Kafka as a persisted log store that could be "replayed" into various other databases. This meant that you could potentially stream all the deltas (well, more accurately the operations that create the deltas, e.g. insert, update, delete) in your data -- through a mechanism called Change Data Capture (CDC) [1] -- into a single platform (Kafka) and consistently replicate that data into SQL databases, NoSQL databases, object stores, etc. Because these are deltas, this lets you reconstruct your data at any point in history on any kind of back-end database or storage (it's database agnostic).
Event sourcing, to my understanding, is a term used among DDD practitioners and Martin Fowler disciples, but with a different nuance. This article explains what it is:
[1] Debezium is an open-source CDC tool for common open-source databases. Side note: A valid (but potentially expensive) way of implementing CDC is by defining database triggers in your SQL database.
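To make the replay idea concrete, here's a toy sketch (nothing to do with Debezium's actual API, just the concept): apply a stream of CDC-style deltas in order, and stop early to reconstruct the table as it was at any point in history.

    # Toy sketch of replaying CDC-style deltas (insert/update/delete)
    # to rebuild a table's state; event shapes are invented for illustration.
    events = [
        {"op": "insert", "key": 1, "row": {"name": "alice", "balance": 100}},
        {"op": "update", "key": 1, "row": {"balance": 80}},
        {"op": "insert", "key": 2, "row": {"name": "bob", "balance": 50}},
        {"op": "delete", "key": 2},
    ]

    def replay(events, upto=None):
        """Apply deltas in order; stop early to reconstruct any point in history."""
        table = {}
        for event in events[:upto]:
            if event["op"] == "insert":
                table[event["key"]] = dict(event["row"])
            elif event["op"] == "update":
                table[event["key"]].update(event["row"])
            elif event["op"] == "delete":
                del table[event["key"]]
        return table

    print(replay(events))          # state after all deltas
    print(replay(events, upto=2))  # state as of the second delta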
In fact, git stores a full snapshot of your entire repo with every commit. It does not store diffs from the previous commit. When you do "git show <COMMIT_SHA>" it generates a diff from the parent commit on the fly.
There's a huge optimization though: it uses a content-addressed blob store, where everything is referenced by the sha1 of its contents. So if a file's contents are exactly the same between two commits, it ends up using the same blob. They don't have to be sequential commits; it could even be two different file paths in the same commit. Git doesn't care - it's a "dumb content tracker". If one character of a file is different, git stores a whole separate copy of the file. But every once in a while it packs all blobs into a single pack file and compresses the whole thing at once, and the compression can take advantage of blobs which are very similar.
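The content-addressing part is easy to see for yourself: a blob's id is just the SHA-1 of a small header plus the file contents, so identical contents always collapse to a single object, regardless of path or commit. A minimal reproduction (equivalent to what git hash-object computes):

    import hashlib

    def git_blob_id(data: bytes) -> str:
        # Git's blob id: sha1 over "blob <size>\0" followed by the contents.
        header = b"blob %d\0" % len(data)
        return hashlib.sha1(header + data).hexdigest()

    # Identical contents -> identical id, no matter the path or commit.
    print(git_blob_id(b"hello\n"))
    print(git_blob_id(b"hello\n") == git_blob_id(b"hello\n"))  # True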
Git's bi-modal nature is a wonderful representation of a sanely architected Event Sourced system. When needed it can create a delta-by-delta view of the world for processing, but for most things, most of the time, it shows a file-centric view of the world.
IMO a well-factored event sourced system isn't going to feel 'event sourced' for most integrated components, APIs, and services, because it's working predominantly with snapshots and materialized views. For complex process logic, or post-facto analysis, the event core keeps all the changes available for processing.
Done right it should feel like a massive win/win. Done wrong and it's going to feel hard in all the wrong places :)
Also directory structures are content-addressed, so if a commit changes nothing in a given directory, the commit data won't duplicate the listing of that directory.
Then I've just found out I've worked on a 28-year-old event-sourced code base. In Clipper. An old loan management system, which had to control how installments changed over their lifetimes.
Basic event sourcing is quite simple to implement. All the bells and whistles people sell alongside event sourcing are hard - whether you do event sourcing or not.
Your typical application presents a user interface based on data in a set of database tables (or equivalent), the user takes some action, the database tables get updated.
The equivalent event-sourced application presents a user interface based on data in a set of database tables (or equivalent), the user takes some action, the outcome of that action is written to one table, the other database tables get updated.
For git, the "or equivalent" is the working copy. You could easily imagine a source code management system similar to git, but without storing history - every commit and pull is a merge resulting in only the working copy, and every push replaces the remote working copy with your working copy.
But man, wouldn't it suck to be limited to only understanding the most recent state of your source code…
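To make the event-sourced version described above concrete, here's a minimal sketch of that write path (table and field names are invented for illustration, not a prescription): the action's outcome is appended to an events table, and the read-model table is updated in the same transaction.

    import json, sqlite3

    # Minimal sketch of the event-sourced write path: append the outcome
    # of the action to an events table, then update the read-model
    # table(s) in the same transaction.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE events   (id INTEGER PRIMARY KEY, type TEXT, payload TEXT);
        CREATE TABLE accounts (account_id TEXT PRIMARY KEY, balance INTEGER);
    """)

    def handle_deposit(account_id: str, amount: int) -> None:
        with db:  # one transaction: event append + projection update together
            db.execute("INSERT INTO events (type, payload) VALUES (?, ?)",
                       ("deposited", json.dumps({"account": account_id, "amount": amount})))
            db.execute("""INSERT INTO accounts (account_id, balance) VALUES (?, ?)
                          ON CONFLICT(account_id)
                          DO UPDATE SET balance = balance + excluded.balance""",
                       (account_id, amount))

    handle_deposit("acct-1", 100)
    handle_deposit("acct-1", 50)
    print(db.execute("SELECT balance FROM accounts WHERE account_id = ?", ("acct-1",)).fetchone())
    print(db.execute("SELECT type, payload FROM events").fetchall())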
You need to know beforehand how exactly you’re going to pull out the data, how it’s going to be indexed, updated, etc. You can get nice benefits out of it especially if you’re doing transaction states, but your query times are going to suffer unless you’re caching the end-state. This can make “simple” things take a lot of effort. It’s a really significant departure in terms of effort to deal with your data.
Event-based systems that run on message queues are the backbone of money. These often co-exist with batch systems that process huge files (which also can be thought of as a journal of events to be processed in a batch).
Things like Redux or Bitcoin are more or less event sourced systems. It's basically the idea of deriving your application state from a series of business events and storing those as a single source of truth. It's very appealing in theory but as the article explains, it's a bit more complicated in practice (e.g. dealing with consistency).
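A toy illustration of that idea (a Redux-style reducer, written in Python just for brevity): the event list is the only source of truth, and the current state is derived by folding over it.

    from functools import reduce

    # Toy illustration of deriving application state from a series of events.
    events = [
        {"type": "item_added",   "name": "apple"},
        {"type": "item_added",   "name": "bread"},
        {"type": "item_removed", "name": "apple"},
    ]

    def reducer(state, event):
        if event["type"] == "item_added":
            return state | {event["name"]}
        if event["type"] == "item_removed":
            return state - {event["name"]}
        return state

    state = reduce(reducer, events, frozenset())
    print(state)  # frozenset({'bread'})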
Event sourcing is really a loose set of concepts, and each application of it will look very different. That's also why there are almost no useful frameworks here.
There are some good talks on YouTube about it.
The concept is used all over, though: Redux (JS) is basically a lightweight form of event sourcing (with redux-saga being the "process managers" mentioned in the post).
Events are the cornerstone of product analytics. If you want to understand what your users are doing on your platform, and to look for opportunities to improve the user experience, events are a big part of that.
I've been working with user event funnels for years, with tools like Mixpanel and others, but it seems like this is a state-building tool for the development workflow.
The basic idea of event sourcing is to store the actions rather than the end state of the actions. This allows actions to interleave, and systems calculate the end state from all of the actions.
Think of ATMs. They don't update the balance of your bank account directly. They just record a debit against it, and then the sum of all of your credits and debits is your balance. This avoids having to hold some kind of lock on your account during the transaction, and even allows significant delays from various transaction sources.
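A toy version of that bookkeeping (assuming nothing about how real core-banking systems are built): every source just appends an entry, and the balance is only ever computed from the entries.

    # Toy version of the ATM example: record debits/credits as events,
    # never update a stored balance in place.
    ledger = []  # append-only

    def record(account, amount):          # positive = credit, negative = debit
        ledger.append({"account": account, "amount": amount})

    def balance(account):
        return sum(e["amount"] for e in ledger if e["account"] == account)

    record("alice", 500)   # deposit
    record("alice", -120)  # ATM withdrawal
    record("alice", -30)   # card payment arriving later from another source
    print(balance("alice"))  # 350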
In my view most of the problems with event sourcing that people run into boil down to overdoing it and modelling large parts of their systems as sets of events, instead of maintaining an event log just for those things that fall out naturally as discrete events from the rest of your design.
Having certain critical events, especially complex ones, logged as events that can be replayed and reconstructed, and modelling transformations as operations over the state of those logs, can be invaluable. Building the system around modelling all changes as events is amazingly painful.
The first thing ATMs (at least over here) do is check if you have enough balance for your transaction. If you just stored events, you'd need to collapse all the previous delta operations at that point, which would be arbitrarily slow if you never persisted data.
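The usual mitigation (already hinted at elsewhere in the thread as "snapshots" or "caching the end-state") is to persist a snapshot periodically and only fold the events recorded since. A rough sketch, with invented field names:

    # Rough sketch: keep a periodic snapshot and only fold the events
    # recorded after it, so the balance check stays fast.
    snapshot = {"balance": 350, "last_event": 41}   # persisted every N events
    tail = [                                        # events after the snapshot
        {"seq": 42, "amount": -20},
        {"seq": 43, "amount": 100},
    ]

    def current_balance(snapshot, tail):
        return snapshot["balance"] + sum(e["amount"] for e in tail
                                         if e["seq"] > snapshot["last_event"])

    print(current_balance(snapshot, tail))  # 430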
Storing event records vs. materializing changes is sort of an old hat trick in databases. People did that in the early 90s for better TPC-C scores. It has its uses, and in some contexts storing deltas can have huge advantages (e.g. Bigtable with a log-structured file system), but it's no silver bullet.
Databases that try to only reflect the entire truth as-is are like everyone working from a shared whiteboard.
Databases set up to event source, i.e. reflect the truth as it happened at each step, are like everyone working from a shared spreadsheet with change tracking.
It's an entire extra dimension that makes reconciling conflicting actions across disparate systems possible.
It seems to be one of those things that you probably don't need. And when you do need it, it becomes obvious that you need it. Specifically, in my research, you really shouldn't use it until it becomes painfully necessary to horizontally scale writes.
Parts of it can be useful, but you don't need to split out an event bus to get auditing for example. As you say, you can avoid that until/unless you need to scale writes.
In the meantime, you can look for inbound data that naturally corresponds to immutable events and apply some of the ideas to that. E.g. that form a user submits? It's reasonably an immutable event. Many of them won't matter to you, because you'll never care to audit them. But some might.
E.g. we have projections of financials being submitted by third parties. Being able to go back and audit how original form submissions relate to changes in other system state is useful, as is being able to re-run old reports after fixing bugs and confirming that the reports show what they should before/after certain events. So instead of just storing the end state, we're increasingly looking to store the original external signals that triggered those changes, build transformations as views over that event log, and then, where we need to, drive transformations into tables we don't treat as event-sourced in the same way, often with a suitable reference to the source event(s).
It avoids the problems in the article for the most part (some, such as changes in the structure of the events, will always be an issue), but gets enough of the benefits to be worth it, because it's only applied to data it genuinely fits (where we have clear, natural event sources, often but not always external submissions of data) and where we have a need (whether for complexity reasons or because of external auditing requirements) to be able to get at past views of the data.
I think you also need it when scaling reads becomes painful.
Reads that have different patterns, specifically, the kinds of patterns that can't be indexed easily because they need denormalization to generate all the indexed expressions. Or you need to read a time series, a snapshot at a point in time, or the latest version of the data, all from different places under different loads - analytic, machine learning, transactional.
One user needs to read across all the data over all time; another user wants super-fast scrollable access to user-customized sorts of a subset of the latest data. The user-configurability of the sort is what defeats the kinds of indexing you get in a traditional RDBMS. The obvious way to get this is a lambda architecture: have an immutable append-only system of record which contains all the data, and build the other views out of it. It's a small step from there to event sourcing.
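A sketch of that shape (names invented): one append-only log as the system of record, with two purpose-built views derived from it, one for the "all data over all time" user and one for the "latest data, re-sortable any way I like" user.

    from collections import defaultdict

    # One append-only log, two derived views: a full-history time series
    # and a "latest value per key" index for fast, user-customized sorting.
    log = [
        {"ts": 1, "key": "sensor-a", "value": 10},
        {"ts": 2, "key": "sensor-b", "value": 7},
        {"ts": 3, "key": "sensor-a", "value": 12},
    ]

    def time_series_view(log):
        series = defaultdict(list)
        for e in log:
            series[e["key"]].append((e["ts"], e["value"]))
        return dict(series)

    def latest_view(log):
        latest = {}
        for e in log:                  # log is ordered, so last write wins
            latest[e["key"]] = e["value"]
        return latest

    print(time_series_view(log))  # everything over all time
    print(latest_view(log))       # just the current state, ready to sort/filter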