June 17, 2016

Hide processing time from the user

What happens in a regular old web application when a user makes a request? This web application could be built in Drupal or Ruby on Rails or Django. These frameworks all, on a very general level, share some characteristics — and not just because they are MVC frameworks.

Dynamic web applications built with these frameworks share the characteristic that they build responses on request. This is so trivial, and such a common pattern, that it is rarely questioned.

They share the pattern that they handle requests in roughly the following steps:

Receive and understand request
Query a database for some information
Transform data into a response
Send response to the user

Receiving requests and sending responses are required for a web server to be considered a web server, so these hardly need further exploration.

Querying a database may take different forms. Even as it has become possible to use NoSQL document stores as databases, most web applications still use relational databases. No matter the type of data store, web applications usually make several distinct queries to it for every request they handle.

The transformation step is a combination of business logic (aggregation and calculation based on the data) and rendering into a view (producing the resulting HTML or JSON or XML or …).
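As a rough sketch of the four steps, here is what build-on-request looks like when stripped down to plain JavaScript. The `db` object is a hypothetical in-memory stand-in for a real database, and `handleRequest` is an illustrative name, not the API of any particular framework. HTML escaping is left out for brevity.

```javascript
// Hypothetical in-memory stand-in for a relational database.
const db = {
  posts: [
    { id: 1, title: "First post", body: "Hello" },
    { id: 2, title: "Second post", body: "World" },
  ],
  queryCount: 0,
  findPost(id) {
    this.queryCount += 1;
    return this.posts.find((post) => post.id === id) || null;
  },
  listPosts() {
    this.queryCount += 1;
    return this.posts.map((post) => ({ id: post.id, title: post.title }));
  },
};

// Build-on-request: every call queries and renders from scratch.
function handleRequest(path) {
  // 1. Receive and understand the request.
  const match = /^\/posts\/(\d+)$/.exec(path);
  if (!match) return { status: 404, body: "Not found" };

  // 2. Query the database (usually several queries per request).
  const post = db.findPost(Number(match[1]));
  const sidebar = db.listPosts();
  if (!post) return { status: 404, body: "Not found" };

  // 3. Transform the data into a response.
  const links = sidebar.map((p) => `<li>${p.title}</li>`).join("");
  const body = `<h1>${post.title}</h1><p>${post.body}</p><ul>${links}</ul>`;

  // 4. Send the response to the user.
  return { status: 200, body };
}
```

Note that `db.queryCount` grows by two for every successful request, no matter how many times the same page has been asked for before.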

When web applications grow slow, it is because these two steps (querying and transforming) simply aren’t keeping up with the demand. Maybe the queries are sloppily written, or maybe the business logic is slow. Maybe the traffic on the web app has just exceeded the web server’s capabilities, or maybe there are so many requests that the database server is buckling under the load. But it always comes down to one of these two types of actions: querying and transformation are the bottlenecks of web applications.

Querying and transformation cannot be avoided altogether, but performing both of these actions on each request makes the client wait for them to complete.

What if the client didn’t have to wait for them at all?

A different way of handling requests could look like this:

Receive and understand request
Find the correct response
Send response to the user

Wait, how does this work? It works like this: every time a change happens to the state of the web application, all the responses that might result from this change are rendered and saved, either on disk or in a quickly accessible data store. Whenever a user makes a request, the correct response is looked up and returned.

The web application now consists of two separate parts: the part that builds responses, and the part that handles requests.

Looking up a response in a data store is a relatively cheap operation, especially compared to several database queries and a transformation.
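A minimal sketch of the two parts, assuming a plain `Map` as the response store (a stand-in for files on disk or a key-value store). The names `responseStore`, `saveResponse` and `handleRequest` are made up for illustration.

```javascript
// Hypothetical store of pre-built responses, keyed by request path.
const responseStore = new Map();

// The building part: runs when state changes, not when a user asks.
function saveResponse(path, body) {
  responseStore.set(path, { status: 200, body });
}

// The request-handling part: a single lookup, no queries, no rendering.
function handleRequest(path) {
  return responseStore.get(path) || { status: 404, body: "Not found" };
}

saveResponse("/posts/1", "<h1>First post</h1><p>Hello</p>");
console.log(handleRequest("/posts/1").status); // prints 200
```

The request handler never touches the source of truth. All it needs is read access to the store.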

The user no longer perceives the long processing time the server actually goes through to prepare a response. It is hidden. Done in the background. The server knows exactly what the user wants, before the user asks for it.

More concretely

As an example of this approach, let’s look at a simple blog. In this blog, you can view a post, create or edit a post (the page for these two actions will be identical in layout), or list all the existing posts on the site.

The page for creating a post will always be the same: it is a single, static view. That’s easy to build: get a request, serve a static response. The page contains a form for a title and some post content. What happens when the user clicks “submit”?

Let’s assume that the underlying data store (the source of truth) is a relational database. The web application will trigger an event every time something changes, and all responses that depend on the thing that changed will automatically be (re)built.

When the server gets a request to create a new post, it will save the data to the database, and then trigger an event announcing that the state of the newly created post has been updated. The event could look something like this:

{
  eventType: "StateUpdated",
  stateType: "BlogPost",
  stateId: 12391
}

The response building service is listening to these StateUpdated events, and it knows to trigger the build of three things when it gets a StateUpdated event for a BlogPost:

Build the response for anyone requesting the blog post
Build the response for anyone requesting the listing of blog posts
Build the response for anyone requesting the edit page for the blog post
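The three builds could be wired up roughly like this. Everything here is a hypothetical sketch: the in-memory `posts` and `responseStore` maps stand in for the database and the response store, the URL paths and render functions are invented, and HTML escaping is omitted.

```javascript
// Hypothetical source of truth and response store (in-memory stand-ins).
const posts = new Map();
const responseStore = new Map();

// Hypothetical render functions; a real site would use templates.
const renderPost = (post) => `<h1>${post.title}</h1><p>${post.body}</p>`;
const renderEditPage = (post) =>
  `<form><input value="${post.title}"><textarea>${post.body}</textarea></form>`;
const renderListing = (all) =>
  `<ul>${all.map((p) => `<li>${p.title}</li>`).join("")}</ul>`;

// Which responses depend on which kind of state.
const builders = {
  BlogPost(stateId) {
    const post = posts.get(stateId);
    responseStore.set(`/posts/${stateId}`, renderPost(post));
    responseStore.set("/posts", renderListing([...posts.values()]));
    responseStore.set(`/posts/${stateId}/edit`, renderEditPage(post));
  },
};

// The response-building service reacts to StateUpdated events.
function onEvent(event) {
  if (event.eventType !== "StateUpdated") return;
  const build = builders[event.stateType];
  if (build) build(event.stateId);
}

// Saving a post writes to the source of truth, then emits the event.
function savePost(id, title, body) {
  posts.set(id, { id, title, body });
  onEvent({ eventType: "StateUpdated", stateType: "BlogPost", stateId: id });
}

savePost(12391, "Hello", "First post");
```

Here the event is delivered with a direct function call. In a real system it would more likely go through a queue, so that saving a post does not have to wait for the renders.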

So a single change actually triggers three renders. Is this a good trade-off? Well, let’s look at each produced response, and see how often they would be built for a build-on-request blog.

In a WordPress blog, the most viewed page is likely to be the front page, showing a listing of the most recently created blog posts on the site. If the site is any kind of popular, the front page will probably receive a couple of thousand visits between each change to any blog post. Under build-on-request that is a couple of thousand renders, where the model I just proposed needs only one. And these are just the gains for the list view.

A single post is probably viewed some thousand, maybe ten thousand times between edits. But that only goes for the period during which it is still being edited. On blogs it is very rare that an old post is edited. As time goes on, the number of visits per edit for a single blog post will approach (however slowly) infinity: the visits will keep coming (hopefully!) but the edits will have stopped.

The last view we render when a blog post is created is the editing page. While a blog post is still fresh, the editing page will be viewed just around once per edit. It is relatively rare to go into editing mode without making changes (although now that I think about it, I realize that I do that a lot). This gives us at least a 1:1 relationship between renders and views in the model I proposed… except(!) that the last render will be for naught: at some point, no more edits will happen to a post, and the editing page will have been rendered, never to be used.

So we waste one render. But overall we do a thousand times (or more) fewer renders. That’s a pretty big improvement! And the relative improvement grows as the system we are building scales: the more visitors, the higher the relative payoff.
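To put rough numbers on this, here is a back-of-the-envelope sketch. The function `renderCounts` and the visit figures passed to it are illustrative assumptions loosely based on the estimates above, not measurements.

```javascript
// Compare total renders under the two models.
// viewsPerEdit maps each dependent page to its assumed views between edits.
function renderCounts(edits, viewsPerEdit) {
  const pages = Object.keys(viewsPerEdit);
  const totalViewsPerEdit = pages.reduce(
    (sum, page) => sum + viewsPerEdit[page],
    0
  );
  const onRequest = edits * totalViewsPerEdit; // one render per view
  const onChange = edits * pages.length; // one render per page per edit
  return { onRequest, onChange, ratio: onRequest / onChange };
}

// Assumed figures: 10 edits, with 2000 listing views, 1000 post views
// and 1 edit-page view between each edit.
const counts = renderCounts(10, { listing: 2000, post: 1000, editPage: 1 });
console.log(counts.onRequest, counts.onChange); // prints 30010 30
```

With these assumed figures the ratio comes out at roughly a thousand to one, and it only grows once the edits stop and the visits continue.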

Okay, but

How does this make sense, though? How can a web application be in any kind of consistent state when using this model?

The simple answer is that the web application will be eventually consistent. But web applications are already inherently eventually consistent: by the time the client receives some data, it may have changed on the server.

Users may get “old” data while the new versions of the responses are being built, but eventually they will catch up. The time for this to happen now depends on the number of changes and dependent responses rather than the number of requests.

More requests will no longer significantly slow down response time. Server load will depend much less on the number of requests. The server becomes less fragile.

This means that this approach would compare especially favorably with the usual approach when used on web applications with many requests and few changes: news sites, blogs, etc.

Does this approach apply to every single kind of endpoint? Doesn’t it break down in some cases?

It does not work for every kind of endpoint. There are some cases where build-on-request is a better way to go — but they are few and far between.

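For example, searching would be impractical to implement in this manner: the set of possible search queries is effectively unbounded, so the responses cannot all be built ahead of time.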

Some relationships may need to be rethought for this approach to scale sensibly. A very active comment section could, for example, be detached from the main article that the comments are left on, so the article view is not recomputed on every comment. It is already common to load comment sections on web pages asynchronously with JavaScript, and this would fit that pattern nicely.

Yet, if the comment section is read more often than it is contributed to, rebuilding the page on every comment would still be more efficient than building it on every request.

So the performance will be improved—especially from the perspective of users—but won’t this require a lot of storage space?

Yes. It will. The great thing is that storage space is getting really cheap—much cheaper than letting users have bad experiences with your product.

Of course things can be done to reduce how much space is actually used, and it might even make sense to have the response cache be distributed and sharded. A single query (to some data store) will still be quicker than multiple queries and a transformation.

What I have described is a general idea. The specific implementation and variations on this are yet to be explored. This includes the specifics of data storage and distribution. These are not hard problems to solve, but there may be several ways to go about it, each with its own set of advantages and disadvantages.

What about user state, if the request is from an authenticated user, and we want to show their user information?

This is probably one of the strongest arguments for building responses on request: pre-rendering a uniquely tailored response for each of the thousands of users we have on our site is simply unmanageable!

But you don’t have to. The web is here to help, and it turns out that there are many neat ways to do tailoring like this. A lot of these are in use already, loading some (less-important) data asynchronously, after the core content of the page has been displayed to the user. Comment sections are often loaded like this, if they use Discourse or Facebook comments, or similar third-party services.

I would recommend a very simple trick: when a user logs in, save the user name and other visuals needed in cookies or in localStorage in the browser. Leave a tiny bit of inline JavaScript in the component on the page that must be personalized, to pull in the name from where it is stored and display it. This will be instant, imperceptible to the user, as long as the code (a couple of lines) is left inline.
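A minimal sketch of that trick, assuming the name is kept under a made-up `displayName` key. The functions take the storage and the element as parameters so they can be tried outside a browser; `rememberUser`, `personalize` and the `greeting` element id are all hypothetical.

```javascript
// Runs once at login: remember what the page needs to show.
function rememberUser(storage, user) {
  storage.setItem("displayName", user.name);
}

// Runs inline on the pre-built page: fill in the placeholder element.
// textContent is used so the stored name is never parsed as HTML.
function personalize(storage, element) {
  const name = storage.getItem("displayName");
  element.textContent = name ? `Hi, ${name}` : "Log in";
}

// In a browser, the inline script would call it roughly like this:
//   personalize(window.localStorage, document.getElementById("greeting"));
```

The pre-built page stays identical for every user. Only the few inline lines differ in what they display, and they read from the browser rather than from the server.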

I might elaborate on this trick at some later point.

Why aren’t people using this approach already?

Actually, there are plenty of people using very similar architectures; they just aren’t mainstream yet. This isn’t an easily accessible architecture, and right now convenience wins out over scalability in the web framework scene, because it is good enough.

It would be over-engineering to use a less convenient architecture when it is not needed, but I believe that this kind of architecture can be made just as easy as good old MVC architectures are. Maybe it is even compatible with MVC, if brought in to replace the traditional relational database as the data layer.

I believe this approach could be a free (in terms of developer pain) improvement over the existing models, if designed right.

I have been grasping at this idea for quite some time. I wrote about static pages in dynamic web apps almost a year ago. I will continue to elaborate on this idea, and hopefully, one day, something useful will see the light of day.

If you’re interested, have suggestions, or have seen something that I should know about in relation to this, please let me know.

I write stuff on a pretty regular basis. If enough people sign up to my newsletter, I will start letting you know when I write new stuff.
