Poke it with a stick. That’s what most of us do when we come across something new that we don’t understand, right? I usually start performance tuning in pretty much that same way. Just try some stuff and see what happens.
Wait! What about non-functional performance requirements? What about tracking and replicating real user behavior?
Well, all that stuff is important, but not for performance tuning. The thing that matters most is to take reliable, repeatable measurements with each change. These are the two metrics to look for:
Baseline performance. This is the response time of each key page under minimal load. It’s important because it’s the very best you can expect your users to get, even under ideal circumstances.
Performance degradation. You want to know the point at which things start to fall apart. How many users/transactions at a time can the app handle before things start to go south? This is closely related to scalability and usually indicates a bottleneck.
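To make those two metrics concrete, here’s a minimal sketch of how you might pull them out of a run. The sample data and the 3-second limit are made up for illustration; your load tool’s reports will give you the real numbers.

```python
# Hypothetical (concurrent users, avg response time in seconds) samples
# from a single load-test run.
samples = [(1, 0.4), (50, 0.5), (100, 0.9), (150, 1.8), (200, 3.4), (250, 6.1)]

def baseline(samples):
    """Baseline performance: response time under minimal load (fewest users)."""
    return min(samples)[1]

def breaking_point(samples, limit=3.0):
    """Degradation point: highest user count before response time exceeds the limit."""
    last_ok = None
    for users, resp in sorted(samples):
        if resp > limit:
            break
        last_ok = users
    return last_ok

print(baseline(samples))        # 0.4 seconds under minimal load
print(breaking_point(samples))  # breaks down past 150 concurrent users
```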
Pretty simple, right? The reason I like to keep it that simple is so I can take a methodical approach to tuning later. Once I have these two points, I can easily repeat the exact same test with each tuning change, to see if things get better or worse.
The key is repeatability… being able to take the exact same measurements over and over so we know the impact of our changes.
Let’s say I just installed (or built) a new online storefront. It’s a typical LAMP stack: deployed on Apache, written in PHP and uses a MySQL database. Since I’m selling vintage Watchimals I know the store will be extremely popular and it needs to perform and scale well.
First, I create some test scripts that mimic typical user behavior. Lots of window shoppers are browsing the goods, and a few of them are adding to their carts and checking out. It might take 2-3 scripts to put the site through its paces, so that the test is reasonably representative of actual user behavior.
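A rough sketch of what those scripts encode, as plain Python. The page paths and the 9-to-1 shopper-to-buyer ratio are assumptions for illustration; in practice you’d record these flows in your load-testing tool of choice.

```python
import random

# Hypothetical storefront page flows; the paths are assumed, not real endpoints.
def window_shopper():
    # Browse the home page and a few product pages, then leave.
    return ["/"] + [f"/product/{random.randint(1, 50)}"
                    for _ in range(random.randint(2, 5))]

def buyer():
    # Browse like a shopper, then add to cart and check out.
    return window_shopper() + ["/cart/add", "/cart", "/checkout"]

def pick_script():
    # Assumed traffic mix: roughly nine window shoppers for every buyer.
    return buyer() if random.random() < 0.1 else window_shopper()
```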
Next, I configure a load test that starts out with a single user and gradually ramps up to something high enough to cause a slowdown, which for a small site like this might be 500 v-users. Since I’ll be re-running this test over and over, I normally want it to ramp up reasonably quickly: maybe 5-10 minutes or so.
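The ramp itself is just a schedule from 1 user up to the peak. Here’s a sketch of a linear ramp over 10 minutes; the numbers mirror the ones above, and any real tool will have its own knob for this.

```python
def users_at(elapsed_s, ramp_s=600, start=1, peak=500):
    """Linear ramp: how many v-users should be active elapsed_s seconds in."""
    if elapsed_s >= ramp_s:
        return peak
    return start + int((peak - start) * elapsed_s / ramp_s)

print(users_at(0))    # start of test: 1 user
print(users_at(300))  # halfway through the ramp
print(users_at(600))  # full load: 500 users
```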
The first time I run my test, I keep track of those same two metrics: the average response time per page under minimal load, and also the breaking point at which response times degrade beyond some acceptable limit (~3 seconds at worst). Of course, I’ll also hold onto the charts and test reports from each successive test run.
Initially, I see that my performance is breaking down around 150 concurrent users. Not too shabby for a vintage Watchimals site, but I think it could be a lot better on the hardware we’re running. So I form a hypothesis: a slow database query is bogging things down. I tweak the query or add an index, and re-run my same test. Now it’s breaking down at 225 concurrent users instead!
I repeat the test with each successive tweak to the site. As long as I have the discipline to make just a single change at a time, I can quantify the performance and scalability impact of that change.
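That one-change-per-run discipline is what makes the results log useful. A sketch of what that log might look like, with made-up numbers (only the first two runs come from the example above; the third is invented for illustration):

```python
# One entry per test run, one change per run: (change made, breaking point).
runs = [
    ("baseline build",       150),
    ("indexed slow query",   225),
    ("enabled opcode cache", 260),  # hypothetical third tweak
]

def deltas(runs):
    """Breaking-point change attributable to each successive tweak."""
    return [(runs[i][0], runs[i][1] - runs[i - 1][1])
            for i in range(1, len(runs))]

for change, gain in deltas(runs):
    print(f"{change}: {gain:+d} concurrent users")
```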
Without a way to generate consistently repeatable load and compare measurements, I’d be tuning in the dark.