Analyzing Test Results

After you finish running a test, Loadster automatically generates a test report.

Many of the sections in the report correspond to the ones you saw in the dashboard when you were running your test.

Load Test KPIs

The key performance indicators at the top of every Loadster report are:

  • Duration - The entire duration of the test from start to finish.
  • Bots - The peak number of bots running at any time across all groups.
  • Download - The total bytes downloaded (HTTP response headers + bodies).
  • Upload - The total bytes uploaded (HTTP request headers + bodies).
  • Total Pages - The total number of top-level “pages” successfully requested.
  • Total Hits - The total number of top-level “pages” plus any included page resources successfully requested.
  • Total Errors - The total number of HTTP and validation errors.
  • Total Iterations - The total number of script iterations completed by all bots across all groups.

More information about each of these can be found in the report sections that follow.
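
The relationships between these KPIs are simple to express. Here's a minimal sketch (our own illustration with made-up records, not Loadster's actual data model) showing how pages, hits, errors, and bytes relate:

```python
# A minimal sketch of how the KPIs relate, using a made-up list of request
# records. This is an illustration, not Loadster's actual data model.
from dataclasses import dataclass

@dataclass
class RequestRecord:
    url: str
    is_page_resource: bool   # True for images, CSS, JS, etc. loaded as part of a page
    succeeded: bool          # False if it produced an HTTP, validation, or network error
    bytes_down: int          # response headers + body
    bytes_up: int            # request headers + body

records = [
    RequestRecord("/home", False, True, 48_200, 610),
    RequestRecord("/css/site.css", True, True, 12_900, 420),
    RequestRecord("/api/orders", False, False, 980, 830),
]

total_pages  = sum(1 for r in records if r.succeeded and not r.is_page_resource)
total_hits   = sum(1 for r in records if r.succeeded)            # pages + page resources
total_errors = sum(1 for r in records if not r.succeeded)
download     = sum(r.bytes_down for r in records)
upload       = sum(r.bytes_up for r in records)

print(total_pages, total_hits, total_errors, download, upload)   # 1 2 1 62080 1860
```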

Notes

The notes section is reserved for you to edit yourself. We suggest writing a paragraph or two about the assumptions that went into the test (user behavior, traffic patterns, hypotheses, etc.) as well as a high-level summary of whether or not the site performed acceptably.

Played Scripts

Since you might have edited your test scenario or scripts after running this test, this section keeps a record of exactly which scripts were run, and which versions. Clicking on a script brings up a read-only view of what the script looked like when you ran this test.

Planned Scenario

This section shows a summary of the test configuration, with details about each of the bot groups in the test. This is important if you need to compare tests to see if there were differences in the configuration that may have contributed to different outcomes.

Response Times

The Average Load Time by URL graph shows which of your pages/URLs are slowest. In all but the simplest sites, certain pages tend to account for the bulk of the slowness. Slow pages are often your best optimization candidates. The term “page” is used loosely here and can also refer to an endpoint or anything else represented by a URL.

The Response Time Percentiles graph shows a rolling aggregate of response time percentiles across all the URLs in your test. These are broken down by the 99th, 95th, 90th, 80th, and 50th (median) percentile. If these lines are close together, it suggests that response times are quite consistent; if they are far apart, it means that outliers are much slower than the median.
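
If you export raw response times, the same percentile breakdown is easy to reproduce. A short sketch in Python, assuming a list of response times in milliseconds (the sample data here is made up):

```python
# Compute the same percentile breakdown from a list of response times (ms).
import numpy as np

response_times_ms = [120, 135, 150, 180, 210, 240, 300, 450, 900, 2500]

p50, p80, p90, p95, p99 = np.percentile(response_times_ms, [50, 80, 90, 95, 99])

# A wide gap between p50 and p99 means outliers are much slower than the median.
print(f"p50={p50:.0f}  p80={p80:.0f}  p90={p90:.0f}  p95={p95:.0f}  p99={p99:.0f}")
```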

The Total Time Spent table shows an aggregated total of response times broken down by URL.

This is a good way to find which URLs are contributing the most to slowness. URLs that are hit frequently and are slow will rise to the top of this table, and are often the best candidates for optimization.
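
The same ranking is easy to build yourself from raw results if you ever need to. A rough sketch, assuming a list of (URL, response time) samples:

```python
# Rank URLs by total time spent (hit count x response time), a rough proxy
# for "where did the bots spend most of their waiting time?"
from collections import defaultdict

samples = [("/search", 850), ("/search", 920), ("/home", 150), ("/checkout", 2100)]

total_ms_by_url = defaultdict(float)
for url, ms in samples:
    total_ms_by_url[url] += ms

for url, total_ms in sorted(total_ms_by_url.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{url}: {total_ms:.0f} ms total")
```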

Network

The Network Throughput graph shows the rate of bytes and bits transferred per second throughout your test. Although Loadster mostly measures HTTP/HTTPS throughput (at the application layer), this should be a close approximation of actual throughput at the transport layer as well.

The Cumulative Network Throughput graph shows the total number of bytes uploaded (requests) and downloaded (responses) in your test. Since the number reported is cumulative, it will climb throughout the test, especially during the peak load phase.
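
As a quick reference for reading these two graphs: bits per second is bytes per second times eight, and the cumulative curve is just a running total. A small illustration with made-up per-second samples:

```python
# Made-up per-second download totals (bytes), showing the bytes/bits and
# cumulative relationships behind the Network graphs.
from itertools import accumulate

bytes_per_second = [125_000, 250_000, 500_000, 500_000]

bits_per_second  = [b * 8 for b in bytes_per_second]    # 1 byte = 8 bits
cumulative_bytes = list(accumulate(bytes_per_second))   # running total over the test

print(bits_per_second)    # [1000000, 2000000, 4000000, 4000000]
print(cumulative_bytes)   # [125000, 375000, 875000, 1375000]
```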

Transactions

The Transaction Throughput graph shows the rate of pages and hits per second over time.

The Transactions graph shows a cumulative count of the pages, hits, iterations, and errors in the test.

The Running Bots by Group graph shows, for each bot group, how many bots were running at each point in the test. The ramp-up and ramp-down phases should resemble what you configured in your scenario. Bots may take a bit longer than planned to exit during the ramp-down phase, because they must complete the current iteration of their script before exiting.

Errors

The Errors by Type graph shows a count of errors broken down by the error message. It’s useful for seeing when the errors happened in the course of your load test. If a large spike of errors happens all at one moment, that may hint at a different underlying cause than errors spread evenly throughout the test.

The errors that show up here may include HTTP errors (any response with an HTTP 4xx or 5xx status), validation errors (which are thrown when a step fails one of your validation rules), or network errors such as socket timeouts or connection failures.
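
As a rough mental model (our own sketch, not Loadster's internal classification logic), the error types map onto results roughly like this:

```python
# A rough sketch of how a result might map onto the error types above.
# This is an illustration, not Loadster's internal classification logic.
def classify(status_code=None, validation_failed=False, network_exception=None):
    if network_exception is not None:
        return "network error"       # e.g. socket timeout, connection failure
    if status_code is not None and status_code >= 400:
        return "HTTP error"          # any 4xx or 5xx response
    if validation_failed:
        return "validation error"    # a step failed one of your validation rules
    return "ok"

print(classify(status_code=503))                           # HTTP error
print(classify(status_code=200, validation_failed=True))   # validation error
print(classify(network_exception=TimeoutError()))          # network error
```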

The Errors by URL graph shows a count of errors broken down by what URL they occurred on.

Traces

The Traces section provides more details on certain transactions.

Traces of type INFO are typically taken by the first bot in each group, and are useful as a sampling of requests regardless of whether the request was successful or not. They provide some of the same information you might get when playing a script in the editor, but only for certain bots, since it wouldn’t be feasible to capture all this information for every bot in a load test.

Traces of type ERROR are taken automatically when a bot experiences an error. For Browser Bots, these might include a screenshot of what was in the browser when the error happened.

The number of traces taken per test is limited, so if you are running a large test with many errors or many iterations, there’s no guarantee that every error or iteration will be traced. Detailed traces are available during the test and for a few days afterwards.

System Statistics

The Load Engine CPU Utilization graph shows how busy the CPU(s) are on each load engine or cluster. If the CPU remains 100% utilized for a significant amount of time, it can result in inaccurate response time measurements! If this happens, it may be a good idea to split the bot group into multiple smaller groups on different engines or clusters.

The Load Engine Memory Utilization graph shows how well the engine is managing its memory. This is rarely a problem, but things to look out for include very high memory usage (close to 100%) and extremely frequent garbage collection (lots of big spikes and drop-offs in the chart).

The Load Engine Thread Count graph is another measurement of how busy the load engine or cluster is. The thread count is directly correlated to how many bots the engine is running. Engines will always have at least one thread per bot, and more if the script calls for additional page resources to be downloaded in parallel with the primary request.

Combined Graphs

You can overlay different graphs to make it easier to visualize the relationship between them.

For example, you might want to look at the relationship between median response times and errors, so you could overlay the Errors (total) graph with the Percentile Response Times (p50) graph.

When you combine graphs, they become part of your test report, so other users on your team can look at them too.

Sharing the Test Report

To share the report with your team, you can simply invite them to join your Loadster team from your Team Settings page. Anyone on your Loadster team can access test reports.

To share the report with people who aren’t Loadster users, hit the share button at the top of the report to generate a public URL. The URL contains a random string that makes it practically impossible to guess, but anyone who has it can view the report.
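
The protection here comes from the randomness of that string rather than a password. As a rough illustration of the entropy involved (not Loadster's actual scheme), a token like the one below has on the order of 10^57 possible values, far too many to guess:

```python
# Illustration only: generating an unguessable URL token, similar in spirit
# to the random string in a public report URL (not Loadster's actual scheme).
import secrets

token = secrets.token_urlsafe(24)   # 24 random bytes -> 2**192 (~6e57) possibilities
print(f"https://example.com/reports/{token}")
```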

Evaluating Load Test Outcomes

The main point of most load tests is to determine whether your site meets the performance and scalability requirements (at least, according to the assumptions made in the test).

It can often be difficult to reduce the complicated multi-dimensional results of a load test to a single “thumbs up” or “thumbs down”. That said, here are a few questions we can ask ourselves as we analyze the results of a load test.

Were the assumptions realistic?

Going into a load test, we make a lot of assumptions. We make assumptions about how our users interact with the site. We make assumptions about traffic patterns. We make assumptions about the number of concurrent users who will try to use the site at any given moment.

Determining whether these assumptions are realistic is often a task for the product owner. At the very least, we as engineers owe it to the interested parties to explain and document the assumptions that went into the test.

The quality of a load test result is only as good as the assumptions that went into it.

Did the test generate the target amount of load?

Scalability requirements can be stated in many different ways. We might say “the system must handle 500 concurrent users” or we might say “the site must handle 1000 hits per second” or even something more abstract like “the system must handle 6000 orders per hour”.

For a successful load test, we must often work backwards to translate these requirements into variables that we can control. How many concurrent users does it take to generate 1000 hits per second? How many do we need to place 6000 orders per hour?

This may require trial and error.
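
One way to get a starting point before the trial and error is a back-of-the-envelope estimate based on Little's Law: concurrent bots ≈ target throughput × average time per iteration. A sketch with assumed numbers you'd replace with your own:

```python
# A back-of-the-envelope starting point using Little's Law:
#   concurrent bots ~= target throughput x average time per iteration
# The numbers below are assumptions; substitute your own.

target_orders_per_hour = 6000
orders_per_second = target_orders_per_hour / 3600       # ~1.67 orders/sec

seconds_per_iteration = 30    # assumed: response times + think time for one order

estimated_bots = orders_per_second * seconds_per_iteration
print(f"~{estimated_bots:.0f} concurrent bots needed")   # ~50
```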

Right after a test completes is a great time to review whether it generated enough load to hit these targets. If the test did not generate the intended load, we may need to change the number of bots and re-run it. Sometimes it takes several tests to find the right parameters.

Did the bots report acceptable response times?

Once we’ve established that the test did indeed generate the intended amount of load, we should look at the response times recorded by our bots.

The average response time is important, but it doesn’t tell the whole story. What was the maximum response time? Did the response time remain acceptable even during spikes?

Your definition of “acceptable” may vary. Generally, we recommend aiming for sub-second response times on the large majority of requests. However, the right number is customer-dependent and product-dependent. It is up to you and your customers to determine what “acceptable” really means.

Were there errors?

The presence of errors in a test is almost always a bad sign. Sometimes the cause of the errors is mundane, like an HTTP 404 from an incorrect step in a script. Other times it is trickier.

If socket timeouts or connection timeouts occur, it is very likely a sign that the server is overloaded. This is also the case with certain HTTP status codes like HTTP 503.

If the errors are related to heavy load, we could try reducing the load by half and re-running the test to see if they still happen. When the cause is unclear, it might make sense to play the script with a single user in the script editor, or check the server logs for more information.