My First Event Sourced Application

Back in 2011, in my first year on the BSc Software Development, I built a link shortening site. The project was pretty much abandoned within half a year and has been lying dormant since.

This is how the main page of the site looked:

[Image: front page]

I have recently started migrating things to a new server, and that brought me to reevaluate the site. It is currently offline, so you will have to rely on the pictures I show here for an idea of it.

One part of the application shows statistics about who has visited a link. It does this by recording a visit whenever a visitor accesses a link. For example, if you visited a shortened link, the application would record your user agent before sending you off to the destination.

I wanted to show a breakdown of which browsers and operating systems visitors were using. When I was creating the data model and the corresponding database table, I did not know exactly what statistics I might want to display. I chose to simply store the user agent of each visit, letting me parse the data at a later time.

Here’s a look into how I stored a link_visit:

[Image: Database containing user agents of visits.]
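As a rough idea of what recording such a visit could look like, here is a minimal Python sketch using SQLite. The table layout, the column names (`link_id`, `user_agent`, `visited_at`) and the function names are assumptions made for illustration, not the actual drngd schema or code.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn):
    # Hypothetical layout: one row per visit, storing the raw user agent.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS link_visit (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            link_id INTEGER NOT NULL,
            user_agent TEXT NOT NULL,
            visited_at TEXT NOT NULL
        )
        """
    )


def record_visit(conn, link_id, user_agent, visited_at=None):
    # Store what happened (the event) without interpreting it yet.
    visited_at = visited_at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO link_visit (link_id, user_agent, visited_at) VALUES (?, ?, ?)",
        (link_id, user_agent, visited_at.isoformat()),
    )
    conn.commit()
```

The point is that nothing is parsed at write time: the row only captures the visit itself, and any interpretation of the user agent is left for later.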

This data let me later add some parsing of the user agents to find operating systems and browsers used for the visits. The resulting page looks like this:

[Image: Statistics for a link]

What I realized just now, looking through the application again, is that I was using a primitive kind of event sourcing, simply because it made sense and not because I knew of any of the advantages of doing so (other than deferring understanding of an event).

Event sourcing is the idea of using a stream of events (things that happen in your system) as the single source of truth for your application.

This means saving things that happen instead of the current state.

In the case of drngd, the link shortener described above, an event would be someone visiting a link.

If the system used current state as the source of truth, we might have had a model for a link with these fields:

In addition, I would need a table of visits_for_day to display the graph of daily visits. It would have these fields:

This data model would contain exactly the data displayed on the statistics page. The problem is that it would be very hard to extend.

Imagine I wanted to make it possible to select a date range and only show statistics for that particular period of time. This would require me to create a new data model, and it would not be backwards compatible.

Imagine a much simpler case in which I want to add a new OS category. I cannot do this in a backwards compatible way, because the visits that have already been counted cannot be re-categorized.

This is the advantage that my accidental event sourcing design has: I can add new metrics, as long as it is information I can parse from a user agent.
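To make the date range case concrete, here is a minimal Python sketch of how daily visit counts could be derived from stored visits, with an optional date range. It assumes each visit also carries a timestamp, here called `visited_at`. That field and the function name `daily_visits` are hypothetical and not taken from the drngd code.

```python
from collections import Counter
from datetime import date, datetime


def daily_visits(visits, start=None, end=None):
    """Count visits per day, optionally limited to an inclusive date range."""
    counts = Counter()
    for visit in visits:
        day = visit["visited_at"].date()
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        counts[day] += 1
    # Sort by day so the result can feed a graph directly.
    return dict(sorted(counts.items()))
```

Because every visit is kept, supporting a date range is only a filter over data that already exists. No new table is needed, and old visits are included automatically.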

With event sourcing, the state displayed is conceptually built by a projection. A projection is a bit of code that runs through a certain set of events, producing a state that would look like the one I presented above.

The projection may later be changed and re-run, resulting in a state with more information all the while remaining backwards compatible.

Think of a projection as projecting events (with depth) into a (flat) state.
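A projection along these lines could be sketched in Python as follows. The user agent matching is deliberately crude substring checking, and the function names and the shape of the resulting state are assumptions for illustration, not the parsing drngd actually used.

```python
from collections import Counter


def detect_os(user_agent):
    ua = user_agent.lower()
    # Order matters: Android user agents also contain "linux",
    # and iPhone/iPad user agents also contain "mac os x".
    if "android" in ua:
        return "Android"
    if "iphone" in ua or "ipad" in ua:
        return "iOS"
    if "windows" in ua:
        return "Windows"
    if "mac os x" in ua:
        return "macOS"
    if "linux" in ua:
        return "Linux"
    return "Other"


def detect_browser(user_agent):
    ua = user_agent.lower()
    if "firefox" in ua:
        return "Firefox"
    if "chrome" in ua:  # Chrome user agents also contain "safari"
        return "Chrome"
    if "safari" in ua:
        return "Safari"
    if "msie" in ua or "trident" in ua:
        return "Internet Explorer"
    return "Other"


def project_link_stats(visits):
    """Fold a sequence of visit events into a flat statistics state."""
    stats = {"total_visits": 0, "by_os": Counter(), "by_browser": Counter()}
    for visit in visits:
        stats["total_visits"] += 1
        stats["by_os"][detect_os(visit["user_agent"])] += 1
        stats["by_browser"][detect_browser(visit["user_agent"])] += 1
    return stats
```

Adding a new OS category then means adding one branch to `detect_os` and running the projection again over the same events. Old visits get re-categorized without any change to what was stored.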

In drngd the projection code runs on request, cycling through all of the link visits for a particular link. This does not perform well enough to be a feasible solution for a system at scale. If it had to loop through just a hundred thousand link visits it would be felt by the client: the server would be slow to respond. (Good thing drngd never really got popular.)

For this reason, projections in event sourcing are usually run whenever an event happens, so the state is always built and ready to be displayed. This gives a great performance boost but requires some kind of caching layer. That caching layer can simply be a different table in your database.
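Here is a minimal Python sketch of that approach using SQLite, where the cached state lives in a second table that is updated in the same transaction as the event. The table names, column names and functions are hypothetical, and the OS detection is reduced to a crude placeholder.

```python
import sqlite3


def create_schema(conn):
    conn.executescript(
        """
        CREATE TABLE link_visit (
            link_id INTEGER NOT NULL,
            user_agent TEXT NOT NULL
        );
        CREATE TABLE link_os_stats (
            link_id INTEGER NOT NULL,
            os TEXT NOT NULL,
            visits INTEGER NOT NULL,
            PRIMARY KEY (link_id, os)
        );
        """
    )


def detect_os(user_agent):
    # Crude placeholder parsing, for illustration only.
    ua = user_agent.lower()
    if "windows" in ua:
        return "Windows"
    if "linux" in ua:
        return "Linux"
    return "Other"


def record_visit_and_update_stats(conn, link_id, user_agent):
    os_name = detect_os(user_agent)
    with conn:  # one transaction: the event and the cached state change together
        conn.execute(
            "INSERT INTO link_visit (link_id, user_agent) VALUES (?, ?)",
            (link_id, user_agent),
        )
        cur = conn.execute(
            "UPDATE link_os_stats SET visits = visits + 1 WHERE link_id = ? AND os = ?",
            (link_id, os_name),
        )
        if cur.rowcount == 0:  # first visit for this link/OS pair
            conn.execute(
                "INSERT INTO link_os_stats (link_id, os, visits) VALUES (?, ?, 1)",
                (link_id, os_name),
            )


def read_stats(conn, link_id):
    # Reading is a cheap lookup, no matter how many visits were recorded.
    rows = conn.execute(
        "SELECT os, visits FROM link_os_stats WHERE link_id = ? ORDER BY os",
        (link_id,),
    )
    return dict(rows.fetchall())
```

The events in `link_visit` stay the source of truth. If the parsing changes, `link_os_stats` can be emptied and rebuilt by replaying the stored visits.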

This is an example of building on change rather than on request. I covered this in a different context in the blog post Static Pages in Dynamic Web Apps.

This has been a very simple example of (accidental) event sourcing, but the concept is widely applicable. One advantage is that it can often co-exist with state-as-single-source-of-truth data models.

Can you think of something in the application you are currently building that may be improved by using event sourcing?
