When a visitor hits your site, the request must pass through multiple application layers before the content appears in their browser. We sometimes loosely divide these layers into the “front-end” and the “back-end”. The front-end consists of HTML, CSS, JS, images, and other assets the browser renders. The back-end is the infrastructure, and in some cases the databases and application logic, responsible for serving up data to the user.
There has been much evangelism around front-end performance optimization. Tools like Google PageSpeed, WebPageTest, and the built-in developer tools in your browser make it easy to see how your site’s front-end performance stacks up. The tools even suggest helpful improvements automatically. Tips like reducing the number of requests, shrinking images and other static content, and offloading static assets to a Content Delivery Network (CDN) are pretty much no-brainers.
Front-end optimizations are the low-hanging fruit because they are relatively straightforward to implement. They are so straightforward, even, that a bunch of companies offer fully automated solutions that optimize your site’s front-end performance without you writing a line of code.
The ever-popular Pareto principle, or 80 / 20 rule, has been applied to performance optimization. Many have generalized that 80% of the time spent waiting on web pages is on the front-end, and only 20% on the back-end. The implication is that optimizations made to the front end will return the most noticeable improvement with the least amount of effort.
Chances are, you can validate this ratio with your own site fairly easily. Using Chrome Developer Tools or similar, track how long it takes to load the initial page (HTML only), and then track how much longer it takes to load all the other images, CSS, scripts, and other baloney that go along with it.
The ratio may look something like this:
This seems to corroborate the “80 / 20 Rule of Web Performance”, since most of the time is spent fetching, parsing, and rendering front-end resources.
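If you’d rather script part of this measurement than eyeball it in the developer tools, time-to-first-byte (TTFB) is a rough proxy for the back-end share of a page load: it captures the time the server spends producing the HTML, while everything after it (downloading, parsing, and rendering assets) is front-end work. Here’s a minimal sketch using only the Python standard library; the split it reports is an approximation, and a real browser does plenty this script can’t see:

```python
import time
import urllib.request

def measure_page_timing(url):
    """Return (ttfb, total) in seconds for a single request.

    ttfb (time to first byte) roughly approximates back-end time;
    total - ttfb is the time spent downloading the rest of the HTML.
    A real browser would then spend additional front-end time fetching
    and rendering images, CSS, and scripts, which this script can't see.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read(1)                  # first byte has arrived from the server
        ttfb = time.monotonic() - start
        resp.read()                   # drain the remaining body
        total = time.monotonic() - start
    return ttfb, total
```

Run it a few times off-peak and again during a busy period: TTFB is the number that moves when the back-end is under load.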
If you think that back-end performance will always be the same, you’re making a dangerous assumption.
Most likely, you are measuring front-end and back-end performance while your site is under light load. At quiet times, with few users hitting the site, it’s a safe bet that your back-end is about as fast as it ever gets.
So what happens during a traffic spike? High traffic events can occur at any time and for just about any reason. Ideally, high traffic events mean something good is happening to your business, like a successful marketing campaign or because you got mentioned on a popular news site. But they can also happen because of randomness, stupidity, or even a malicious attack.
Regardless of their cause, high traffic events carry the risk of slowing down your back-end.
Let’s take a look at what can happen to your overall response times, as user load (traffic) increases:
As more concurrent users start visiting the site, the back-end starts to slow down. In this example, under moderate load, the ratio is no longer 80 / 20… more like 60 / 40.
The back-end time is increasing, but the front-end time stays roughly constant, since each user has their own browser, which parses and renders content independently of everyone else’s.
In this way, the ratio of time spent in the front-end and back-end may change, depending on the amount of load on a site.
Question: If your back-end takes 400ms to return a page when 100 concurrent users are on the site, how long will it take when 200 concurrent users are on the site?

A. 400ms
B. 800ms
C. It’s impossible to say without testing

The correct answer is C. It might be tempting to say the back-end will take the same 400ms regardless of load, or to assume it will take 800ms (twice as long at twice the load). But the truth is, nobody knows ahead of time, because the relationship between concurrency and response time is complicated and non-linear.
What we can say with confidence, though, is that at some point, as load increases, a back-end resource will become saturated and become a bottleneck. It could be a database bogging down, or a web or application server hitting its connection limit. It could even be something at the hardware layer, like the network, CPU, or disk.
If more visitors continue to arrive after a bottleneck has been reached, requests begin to queue up, and back-end response times will then become much worse. The site could become completely unresponsive.
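The queueing effect can be illustrated with the textbook M/M/1 formula, R = S / (1 − ρ), where S is the bare service time and ρ is utilization (arrival rate × service time). This is an idealized single-queue model, not a prediction for any real site, but it shows why response times grow non-linearly with load:

```python
def mm1_response_time(arrival_rate, service_time):
    """Average response time of an M/M/1 queue (single server, random
    arrivals): R = S / (1 - rho), where rho = arrival_rate * service_time
    is utilization. As rho approaches 1 the server saturates, and queue
    length and response time grow without bound.
    """
    rho = arrival_rate * service_time
    if rho >= 1:
        return float("inf")  # saturated: requests arrive faster than they're served
    return service_time / (1 - rho)

# With a 200ms service time, doubling load from 2 to 4 requests/sec
# roughly triples the average response time rather than doubling it.
light = mm1_response_time(2, 0.2)   # rho = 0.4 -> ~0.33s
heavy = mm1_response_time(4, 0.2)   # rho = 0.8 -> 1.0s
```

Real back-ends have many interacting queues (threads, connections, disks), so the only reliable numbers come from measuring. But the shape of the curve is the same: nearly flat at low utilization, then sharply worse near saturation.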
Once a back-end bottleneck has been hit, generalizations like the “80 / 20 Rule of Web Performance” no longer apply.
That’s why knowing a site’s bottlenecks and breaking points is critical, not just to user experience, but to avoiding crashes and downtime.
Despite the complexity, it is possible to prepare for a high traffic event by learning your site’s breaking points and bottlenecks ahead of time.
Unlike the front-end, which has relatively well-defined prescriptions for optimization, optimizing a back-end to scale effectively requires trial and error. Even for experienced developers, it’s difficult to predict what the bottleneck will be, or how high the site can scale before reaching its breaking point. And you can’t afford to wait around for your site to crash to discover its bottlenecks; you need to proactively test the boundaries before your users do.
Load testing is the practice of simulating a bunch of users hitting your site. Assuming your load tests are realistic, you’ll be able to use these tests to determine the site’s breaking point. With adequate monitoring, you can observe the bottlenecks during a load test, and in many cases fix them altogether.
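At its core, a load test is just many clients issuing requests in parallel while you record response times. Here’s a toy sketch using Python’s standard library; the URL and user counts are placeholders, and a real tool adds realistic think times, ramp-up, session behavior, and distributed load generation:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def run_load_test(url, concurrent_users=10, requests_per_user=5):
    """Simulate concurrent users hammering one URL and summarize
    the observed response times (in seconds)."""
    def one_user(_):
        times = []
        for _ in range(requests_per_user):
            start = time.monotonic()
            with urlopen(url, timeout=30) as resp:
                resp.read()
            times.append(time.monotonic() - start)
        return times

    # Each worker thread acts as one user making sequential requests.
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        per_user = pool.map(one_user, range(concurrent_users))

    all_times = sorted(t for user in per_user for t in user)
    return {
        "requests": len(all_times),
        "median_s": all_times[len(all_times) // 2],
        "p95_s": all_times[int(len(all_times) * 0.95)],
    }
```

Run it at increasing concurrency levels while watching server-side metrics (CPU, connections, database load); the level at which the p95 starts climbing steeply points to your first bottleneck.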
There are many load testing tools out there, but of course we’re partial to our own Loadster. It’s a cloud-hybrid load testing solution that helps you easily simulate a large volume of realistic users, to discover bottlenecks and gain confidence your site will scale to meet peak demand.