Quick and Dirty Performance Tuning
Ad Hoc Performance Tuning
Poke it with a stick. That’s what most of us do when we come across something new that we don’t understand, right? It’s okay to start performance tuning in pretty much that same way. Just try some stuff and see what happens.
Wait! What about Service Level Objectives? What about Non-Functional Requirements? What about defining and simulating realistic user behavior?
All that stuff is important, but not for initial performance tuning. The thing that matters most is to take reliable, repeatable measurements before and after each change.
Performance Tuning Metrics
If your goal is simply to tune your application to perform better than it does now, these are the two things you’ll need to measure.
Baseline Performance. This is the response time of each key page or transaction under minimal load. It’s the “as good as it gets” number: because it was measured while the system was nearly idle, it’s as fast as you can expect the application to be under ideal circumstances.
Maximum Throughput. As load is added (more and more requests from more and more concurrent users), you’ll want to know at what point things start to fall apart. How much throughput can the app handle before errors start or response times become unacceptably slow?
Pretty simple, right? If I know these metrics before I make a tuning change, I can run the same test again afterward to see whether things got better or worse. If baseline response times dropped and/or maximum throughput increased, the tuning change was a good one. If not, take a step back and try something different.
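The before/after comparison can be sketched in a few lines of Python. This is a minimal sketch, assuming each test run has already been reduced to two numbers: baseline response time in seconds (lower is better) and maximum throughput expressed as concurrent users (higher is better). `RunMetrics` and `is_improvement` are hypothetical names, not part of any load-testing tool, and the sketch adds one assumption the text doesn't state: a change only counts as good if the other metric didn’t get worse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Summary of one load-test run (hypothetical structure)."""
    baseline_s: float          # avg response time under minimal load, seconds (lower is better)
    max_concurrent_users: int  # load level where responses became unacceptable (higher is better)


def is_improvement(before: RunMetrics, after: RunMetrics) -> bool:
    """True if `after` beats `before` on at least one metric without
    regressing on the other (the no-regression rule is an assumption)."""
    faster = after.baseline_s < before.baseline_s
    slower = after.baseline_s > before.baseline_s
    more_capacity = after.max_concurrent_users > before.max_concurrent_users
    less_capacity = after.max_concurrent_users < before.max_concurrent_users
    return (faster or more_capacity) and not (slower or less_capacity)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    before = RunMetrics(baseline_s=0.80, max_concurrent_users=150)
    after = RunMetrics(baseline_s=0.80, max_concurrent_users=225)
    print("keep" if is_improvement(before, after) else "roll back")
```

Repeated runs of the same test rarely produce identical numbers, so in practice you may want to treat very small differences as noise rather than as a real improvement or regression.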
The key is to have a repeatable automated test, so you can take exactly the same measurements before and after each change and quantify its impact.
A Simple Application Tuning Example
Let’s say I just rolled out a new e-commerce site. It’s a typical LAMP stack: it runs on Linux and Apache, is written in PHP, and uses a MySQL database. Since I’m selling vintage Watchimals to hipsters, I expect the store to be fairly popular, so it needs to perform and scale well.
First, I create some automated test scripts that mimic typical user behavior. On a store like this, lots of window shoppers browse the goods, while a few add items to their carts and check out. It might take 2-3 scripts to put the site through its paces in a way that is reasonably representative of real user behavior.
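One way to make a handful of scripts add up to a representative mix is to give each script a share of the virtual users. The sketch below assumes hypothetical script names and an invented 80/15/5 split between browsing, adding to cart, and checking out; real proportions should come from your own traffic data. It uses the largest-remainder method so the per-script counts always sum to the total.

```python
import math


def allocate_users(total_users: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `total_users` across scripts in proportion to `weights`,
    using the largest-remainder method so counts sum exactly to the total."""
    weight_sum = sum(weights.values())
    exact = {name: total_users * w / weight_sum for name, w in weights.items()}
    counts = {name: math.floor(share) for name, share in exact.items()}
    leftover = total_users - sum(counts.values())
    # Hand the remaining users to the scripts with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical mix: mostly window shoppers, a few buyers.
    mix = {"browse_catalog": 80, "add_to_cart": 15, "checkout": 5}
    print(allocate_users(500, mix))
```

Most load-testing tools let you set script weights directly; the point here is only to show how a user mix translates into concrete numbers of bots per script.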
Next, I configure a load test that starts with a single user and gradually ramps up to enough users to cause a slowdown, which for a small site like this might be 500 bots. Since I’ll be re-running this test over and over, I want it to ramp up fairly quickly, in 5-10 minutes or less.
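A linear ramp like this can be described as a target user count over time. This sketch assumes an 8-minute ramp (inside the 5-10 minute window) from 1 to 500 users; `users_at` is a hypothetical helper, and most load-testing tools let you configure a ramp-up period directly instead of computing it yourself.

```python
def users_at(elapsed_s: float, ramp_s: float = 8 * 60,
             start_users: int = 1, peak_users: int = 500) -> int:
    """Target number of concurrent virtual users `elapsed_s` seconds into a
    linear ramp from `start_users` to `peak_users` lasting `ramp_s` seconds."""
    if elapsed_s <= 0:
        return start_users
    if elapsed_s >= ramp_s:
        return peak_users  # hold at peak once the ramp is complete
    fraction = elapsed_s / ramp_s
    return start_users + round(fraction * (peak_users - start_users))


if __name__ == "__main__":
    # Print the planned load at each minute of the ramp.
    for minute in range(0, 9):
        print(f"{minute} min: {users_at(minute * 60)} users")
```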
The first time I run my load test, I record the same two metrics: a Baseline Performance metric (such as the average response time per page under minimal load) and a Maximum Throughput metric (the number of concurrent users at which response times degrade beyond an acceptable level, such as 2 seconds). I also keep the detailed charts and test reports from each run in case I need to do deeper analysis later.
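Reducing a run to those two numbers might look like the sketch below. It assumes the load test reports an average response time and an error count at each concurrency step; the 2-second threshold comes from the text, while `Step`, `summarize_run`, and the sample data are hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    concurrent_users: int
    avg_response_s: float
    errors: int


def summarize_run(steps: list[Step], max_acceptable_s: float = 2.0) -> tuple[float, int]:
    """Return (baseline_s, max_concurrent_users) for one load-test run.

    baseline_s: average response time at the lowest load step.
    max_concurrent_users: highest load level reached before the first step
    with errors or a response time above `max_acceptable_s` (0 if the
    very first step already failed).
    """
    ordered = sorted(steps, key=lambda s: s.concurrent_users)
    baseline_s = ordered[0].avg_response_s
    max_users = 0
    for step in ordered:
        if step.errors > 0 or step.avg_response_s > max_acceptable_s:
            break
        max_users = step.concurrent_users
    return baseline_s, max_users


if __name__ == "__main__":
    sample = [  # invented numbers, not real measurements
        Step(1, 0.42, 0),
        Step(50, 0.55, 0),
        Step(100, 0.90, 0),
        Step(150, 1.85, 0),
        Step(200, 3.10, 4),
    ]
    print(summarize_run(sample))  # → (0.42, 150)
```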
In the first run, performance breaks down at around 150 concurrent users. Not too shabby for a vintage Watchimals site, but I think it could be a lot better on the high-end cloud instances we’re running. My hypothesis is that a particular slow database query is bogging things down. I tweak the query, add an index, and re-run the exact same load test. Baseline performance is about the same, but maximum throughput is now reached at 225 concurrent users instead of 150.
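As a quick check on the size of that gain, using the numbers from these two runs:

$$
\frac{225 - 150}{150} = \frac{75}{150} = 0.5
$$

That is roughly 50% more concurrent users before response times become unacceptable, with baseline response time essentially unchanged.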
So my tuning change helped, but since the site still tops out at 225 users, there’s now a new bottleneck to find.
Repeat the test after each successive tweak to the site. As long as I have the discipline to make only one change at a time, I can quantify the performance and scalability impact of that change, and either keep it if it helped or roll it back if it didn’t.
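The one-change-at-a-time loop can be sketched as a small harness. This is a hypothetical illustration: each candidate change is modeled as a pair of `apply`/`revert` callables (in real life these might be deploy or migration scripts), and `run_load_test` is assumed to return a single score, here maximum concurrent users, where higher is better. Collapsing the result to one number is a simplification of the two metrics described above.

```python
from typing import Callable, NamedTuple


class Change(NamedTuple):
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def tune_one_at_a_time(changes: list[Change],
                       run_load_test: Callable[[], int]) -> list[str]:
    """Apply candidate changes one at a time, re-running the same load test
    after each. Keep a change only if the score improved; otherwise revert it
    before trying the next one. Returns the names of the kept changes."""
    kept = []
    best = run_load_test()  # measure before any change
    for change in changes:
        change.apply()
        score = run_load_test()
        if score > best:
            best = score
            kept.append(change.name)
        else:
            change.revert()
    return kept


if __name__ == "__main__":
    # Fake "system" whose capacity depends on which tweaks are active.
    active: set[str] = set()
    effect = {"add_index": 75, "bigger_cache": 0, "verbose_logging": -40}

    def make(name: str) -> Change:
        return Change(name, lambda: active.add(name), lambda: active.discard(name))

    def fake_test() -> int:
        return 150 + sum(effect[n] for n in active)

    print(tune_one_at_a_time([make(n) for n in effect], fake_test))
```

Because each change is measured against the best configuration so far, a tweak that makes no difference is rolled back along with the ones that hurt, which keeps the system free of changes you can’t justify with a measurement.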
Without a repeatable automated load test to take measurements after each change, I’d be tuning in the dark.