A Developer’s Guide to Performance Testing

Ensuring modern web applications remain performant


This article discusses the different ways we can test the performance of a contemporary web application.

Performance testing helps ensure the scalability, stability and reliability of your web services. We will discuss why we test and the types of testing, then conclude with a simple API test using Gatling, a popular performance testing framework.


What problem is performance testing trying to solve? At some point in our web surfing careers we have all witnessed events such as festival websites crashing as ticket sales begin, game performance decreasing, or web stores becoming slow during peak hours.

Each of these symptoms betrays poor web application performance. Something underpinning the service is failing as it is more heavily used. We can avoid this by simulating these events, and adapting our system to handle them more gracefully.

Performance testing comes in a variety of flavours:

  • Stress Testing/ Capacity Testing: This tests how many users a system can cope with before it reaches a performance level deemed unacceptable. There will always exist a limit; we’re just gauging how reasonable that limit is.
  • Load Testing/ Business as Usual (BAU) Testing: This is where we see how our system fares under our expected traffic. Like capacity testing, we may begin with a smaller number of requests, then ramp up to check for a breaking point in our system.
  • Soak Testing/ Endurance Testing: This is where we put the system under pressure for a sustained period. This helps identify issues that arise over time.
  • Spike Testing: This looks to see what happens to our system when there is a sudden influx of use. This could be measured as a one-time spike, or as several spikes over a given period.
  • Volume Testing/ Flood Testing: This checks how the system behaves as the amount of data sent to and from it grows.
  • Scalability Testing: This is used to check how a system handles increasing workloads. For example, as we increase usage linearly, does CPU usage also increase linearly? If demand increases, can our cloud infrastructure effectively spin up more instances?

In each of these types of testing we may be looking for slightly different things. Common issues include:

  • Bottlenecks: This is when individual factors obstruct the flow of information in an entire system. Often this can occur because of Disk/ CPU/ Memory Usage. For example, if CPU is running at 100% on an instance then this will limit its ability to process data.
  • Memory Leaks: This is especially noticeable in soak tests. If we are not correctly releasing memory, usage will steadily increase over time.
  • Long Load/ Response Times: As requests increase and resources within our system are consumed, this may be reflected in how long it takes to load pages/ responses.
  • Poor Scalability: Similarly, as the requirements on our system increase, we hope it will scale appropriately. In a cloud-orientated architecture this may be by adding more instances, or it may simply be that our current infrastructure is capable of handling a larger load.

Before we design our tests, it is helpful to know what ‘good’ looks like. We could design some service level agreements (SLAs) around:

  • Load/ Response times: If we’re worried that our system may become slow, we should define fast.
  • CPU/ Memory/ Disk Usage: This can apply to all instances, including those hosting databases.
  • Scalability: As demand on our system increases, what are our expectations for how it scales? This could be in terms of computing resources, or it could be in terms of number of instances spun up/ Lambdas consumed.
  • Throughput: This is how much data is transferred from source to destination in a given time. Poor throughput may suggest packets are being dropped in transit. You can compare it to your available bandwidth to see how close you are to maximum capacity.
  • Error Rate: As servers become more congested, we may see the number of 4XX and 5XX error responses increasing. We could define an acceptable percentage of these codes.

Worked Example

Let’s say we have a simple REST API used to retrieve a list of users. It has a single endpoint /users which returns a list as below.

{
  "data": [
    { "name": "User 1" },
    { "name": "User 2" }
  ]
}

We expect to receive 1 request per second (RPS) as average traffic, but on a busy day this will go up to 4RPS. This rise takes 10 seconds, holds for 20 seconds, then returns to normal over another 10 seconds. Our system expects a maximum of ten users to be returned at any one time.

Our expected RPS
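The traffic shape described above can be written down as a small piecewise function, which is handy when deciding what the load test should replay. This is only a sketch: the object and value names are made up for illustration, and the 20-second stretch of baseline traffic before the rise is an assumption borrowed from the load test described later in the article. Only the 10/20/10 second rise, hold and fall come from the requirements.

```scala
object ExpectedTraffic {
  val BaselineRps = 1.0
  val PeakRps = 4.0

  // Durations in seconds.
  val WarmUp = 20.0   // assumption: quiet period before the busy spell starts
  val RampUp = 10.0   // from the requirements
  val Hold = 20.0     // from the requirements
  val RampDown = 10.0 // from the requirements

  /** Target requests per second at `t` seconds into the run. */
  def rpsAt(t: Double): Double = {
    val holdStart = WarmUp + RampUp
    val downStart = holdStart + Hold
    val downEnd = downStart + RampDown

    if (t < WarmUp) BaselineRps
    else if (t < holdStart) BaselineRps + (PeakRps - BaselineRps) * (t - WarmUp) / RampUp
    else if (t < downStart) PeakRps
    else if (t < downEnd) PeakRps - (PeakRps - BaselineRps) * (t - downStart) / RampDown
    else BaselineRps
  }

  def main(args: Array[String]): Unit =
    // Print the target rate every 5 seconds to eyeball the shape.
    (0 to 70 by 5).foreach { t =>
      println(f"t=$t%2d s  target=${rpsAt(t.toDouble)}%.1f RPS")
    }
}
```

Halfway through the rise (25 seconds in, under the assumed warm-up) the target sits midway between the baseline and the peak, which matches the linear ramp in the chart.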

First of all, let’s define some SLAs for acceptable performance:

  1. 100% of requests receive a 200 response.
  2. Response times are less than 50ms.
  3. Memory usage scales linearly with demand.
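The first two SLAs are simple enough to express as code. The sketch below evaluates them over a list of observed requests; `Sample`, `SlaReport` and `evaluate` are hypothetical names invented for this example rather than part of any framework, and the sample values in `main` are made up. The memory SLA is left out because it needs data from inside the application rather than from the responses.

```scala
object SlaCheck {
  // One observed request. Field names are illustrative, not a Gatling type.
  final case class Sample(status: Int, responseTimeMs: Long)

  final case class SlaReport(total: Int, non200: Int, maxResponseTimeMs: Long) {
    def allSuccessful: Boolean = non200 == 0         // SLA 1: 100% of requests receive a 200
    def fastEnough: Boolean = maxResponseTimeMs < 50 // SLA 2: response times under 50ms
    def passed: Boolean = allSuccessful && fastEnough
  }

  def evaluate(samples: Seq[Sample]): SlaReport = {
    require(samples.nonEmpty, "at least one sample is required")
    SlaReport(
      total = samples.size,
      non200 = samples.count(_.status != 200),
      maxResponseTimeMs = samples.map(_.responseTimeMs).max
    )
  }

  def main(args: Array[String]): Unit = {
    // Made-up samples, for illustration only.
    val samples = Seq(Sample(200, 12), Sample(200, 31), Sample(500, 48))
    val report = evaluate(samples)
    println(s"passed=${report.passed} non200=${report.non200} max=${report.maxResponseTimeMs}ms")
  }
}
```

Writing the thresholds down like this forces us to be precise: "less than 50ms" here means the slowest response, not the average.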

None of this is illustrative of how you would test a production system. Actual applications under test will be much more complex; everything here is simplified to convey the salient points.

Given that these are our targets, let’s now define a number of performance tests. In reality the limits provided would be much more flexible, and we may iterate through them as we learn more about the application.

Stress Testing/ Capacity Testing

For this test we can continually ramp up the number of RPS. We will scale to 7RPS over a minute, then view the results. It may be that this isn’t sufficient to break the system, but it is enough to give us confidence in our performance.

RPS used for stress testing

Load Testing/ Business as Usual (BAU) Testing

Here we can mimic the traffic patterns provided to us.

Soak Testing/ Endurance Testing

A reasonable soak test may be to send 5RPS for a minute. In this simplified example, that represents a heavy load over an extended duration.

RPS used for soak testing

Spike Testing

The spike we have been provided goes from 1RPS to 4RPS over 10 seconds. We may want to exaggerate that to separate out our spike testing and our BAU testing. A reasonable start may be to ramp to 6RPS over 10 seconds, hold for 20 seconds, then return to 1RPS for a period.

RPS used for spike testing

Generally it is important to leave a run-off period post-spike. If you have a queueing system anywhere in your architecture, the spike may have saturated the queue, and you need to ascertain whether it can clear the backlog and maintain a constant flow under regular traffic.
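To see why the run-off matters, here is a toy model of a queue in front of a worker with a fixed capacity. Everything in it is an assumption made for illustration: the capacity of 4 requests per second, the 30-second run-off, and the simplification that the spike jumps straight to 6RPS without a ramp. The names are hypothetical too.

```scala
object SpikeRunOff {
  /**
   * Backlog left in the queue at the end of each second, given the number of
   * requests arriving in that second and a fixed number the system can serve.
   */
  def backlog(arrivalsPerSecond: Seq[Int], capacityPerSecond: Int): Seq[Int] =
    arrivalsPerSecond
      .scanLeft(0)((queued, arriving) => math.max(0, queued + arriving - capacityPerSecond))
      .tail // drop the initial empty-queue state

  def main(args: Array[String]): Unit = {
    val spikeSeconds = 20
    // 20 seconds at 6RPS, then 30 seconds of regular 1RPS traffic.
    val arrivals = Seq.fill(spikeSeconds)(6) ++ Seq.fill(30)(1)
    val queue = backlog(arrivals, capacityPerSecond = 4) // capacity is hypothetical

    val drainedAt = queue.indexWhere(_ == 0, spikeSeconds)
    println(s"peak backlog: ${queue.max} requests")
    if (drainedAt >= 0)
      println(s"queue empty ${drainedAt - spikeSeconds + 1} seconds after the spike ends")
    else
      println("queue never drains within the run-off period")
  }
}
```

If the test stopped the moment the spike ended, the backlog would still be sitting in the queue and we would never learn how long recovery takes.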

Volume Testing/ Flood Testing

In our system data is being taken from the database and returned via an API endpoint. We can propose adding 100 users to the database, which will allow us to test performance as volume increases.

Scalability Testing

This is wrapped up in the other test scenarios. As we carry out our testing we want to monitor that our system scales well. This may be by monitoring CPU usage, or in a cloud environment ensuring our scaling policies are functioning correctly. Here we have defined it only in terms of memory. As long as our memory scales linearly with demand we feel our system can scale appropriately.
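One way to make "scales linearly" checkable is to record memory at a few request rates and fit a straight line through the points. The sketch below uses an ordinary least-squares fit and reports R², which is close to 1 when the points lie on a line. The names and the memory figures in `main` are invented for illustration, not measurements from the service, and the threshold for "linear enough" is left for us to choose.

```scala
object LinearScaling {
  /** Least-squares fit of y = slope * x + intercept; returns (slope, intercept, rSquared). */
  def fit(points: Seq[(Double, Double)]): (Double, Double, Double) = {
    require(points.size >= 2, "need at least two points")
    val n = points.size
    val meanX = points.map(_._1).sum / n
    val meanY = points.map(_._2).sum / n

    val sxy = points.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = points.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val syy = points.map { case (_, y) => (y - meanY) * (y - meanY) }.sum
    require(sxx > 0, "x values must not all be identical")

    val slope = sxy / sxx
    val intercept = meanY - slope * meanX
    // A flat line (syy == 0) is perfectly described by the fit.
    val rSquared = if (syy == 0) 1.0 else (sxy * sxy) / (sxx * syy)
    (slope, intercept, rSquared)
  }

  def main(args: Array[String]): Unit = {
    // (requests per second, heap used in MB): made-up numbers for illustration.
    val observed = Seq((1.0, 120.0), (2.0, 140.0), (3.0, 160.0), (4.0, 180.0))
    val (slope, intercept, r2) = fit(observed)
    println(f"memory ~ $slope%.1f MB per RPS + $intercept%.1f MB, R^2 = $r2%.3f")
  }
}
```

A low R² would be a hint that memory grows faster than demand, which is worth investigating before it shows up in production.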

Writing Our Service

We will implement the service using Spring Boot. The Spring Initializr setup is below.

Spring Initializr setup for our users service

It isn’t worth covering exactly how we build the service; the full code can be found in the repository here. Once we have it running, it returns a static list of users as shown below:

{
  "data": [
    { "name": "Sarah" },
    { "name": "June" },
    { "name": "Elizabeth" }
  ]
}

Now we would like to spin up our performance testing suite. To this end we will be using Gatling. There are open source and paid versions; we will be using the former. Our implementation will be in Scala, though you won’t really need to know any of the language syntax to understand what’s going on.

Rather than go through every type of testing, we’ll pick one to focus on: load testing. The remainder are basically variations on a theme.

Below we can find the script used to load test our local endpoint. We ramp up our number of users to one, who then begins sending requests. They hit 1RPS for 20 seconds, then ramp up to 4RPS in 10 seconds, holding for another 20 seconds. Once this is complete they ramp down to 1RPS over 10 seconds, before holding for another 20.
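A minimal sketch of what such a simulation could look like with the Gatling 3 Scala DSL is given here. It is not the author's script: the class name, the base URL (assuming the Spring Boot service is listening on its default local port 8080) and the use of `throttle` to shape the request rate are all assumptions. The two assertions at the end encode the first two SLAs so the run fails if they are missed.

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Hypothetical simulation name; place it wherever your Gatling setup looks for simulations.
class UsersLoadSimulation extends Simulation {

  // Assumption: the service under test is running locally on port 8080.
  private val httpProtocol = http.baseUrl("http://localhost:8080")

  // A single virtual user requesting /users in a loop.
  // The throttle below, not the user, decides how many requests go out per second.
  private val getUsers = scenario("Get users")
    .forever(
      exec(
        http("GET /users")
          .get("/users")
          .check(status.is(200))
      )
    )

  setUp(getUsers.inject(atOnceUsers(1)))
    .throttle(
      jumpToRps(1), holdFor(20.seconds),           // regular traffic
      reachRps(4).in(10.seconds), holdFor(20.seconds), // busy period
      reachRps(1).in(10.seconds), holdFor(20.seconds)  // back to normal
    )
    .protocols(httpProtocol)
    .assertions(
      global.successfulRequests.percent.is(100), // SLA 1
      global.responseTime.max.lt(50)             // SLA 2, in milliseconds
    )
}
```

The throttle acts as an upper limit on the request rate, so one looping user is enough here only because each response comes back well within the time between requests. The other test types would mostly change the throttle steps, for example `jumpToRps(5), holdFor(1.minute)` for the soak test, or `reachRps(7).in(1.minute)` for the stress test.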

Running this we receive the below Gatling report.

Example Gatling report

We can use this to determine whether our system meets its SLAs. From the report above, all requests were served successfully (100% of requests receive a 200 response). We can also view our response time distribution, where we see all responses are less than 50ms.

Distribution of response times for our API

The final SLA is slightly more complex. Gatling does not have access to the internals of our application, only the responses it receives. To monitor the resources of our application we will introduce a new tool, VisualVM. This will allow us to observe the memory usage of our application as the tests run.

Example JVisualVM Output

Examining the resources consumed during our load test we can see that our system copes gracefully with the expected traffic. In reality the lack of stress witnessed could mean we might be able to scale down our instance sizes, saving a bit of money!


In conclusion, we have covered the motivation for performance testing and the different flavours available, and finished with an example application.



James Collerton

Senior Software Engineer at Spotify, Ex-Principal Software Engineer at the BBC