Load testing
We don’t normally worry much about performance. The Digital Marketplace can handle normal load. If we have problems, it’s bad, but not the end of the world, because users can retry later.
However, on the day that a framework closes, we see much more traffic than normal. On these days, the consequence of the Digital Marketplace failing is also much higher. If a supplier is unable to submit their application on the final day, it is too late. This can cause quite a bit of bother, as happened for G-Cloud 12.
As a result, we do load testing (sometimes called performance testing) in advance. This lets us find out whether the Digital Marketplace is likely to stand up to the peak in traffic. Finding out about performance problems early helps us fix them before they impact users.
This page explains how to do performance testing. It is based on how we did the testing for DOS 5. You should update this page after you do the next round of load testing.
What to test
Your goal is to answer the question: “Will the Digital Marketplace stand up to the load of closing day?” First, work out what load you expect. The load you test with should match reality as closely as possible. Otherwise, you risk missing performance problems, as we saw with G-Cloud 12.
Frameworks of the same type tend to be quite similar. So you can use the traffic seen on the closing day of the previous framework of the same type to produce your estimate.
On a closing day, the vast majority of user traffic hits user-frontend and supplier-frontend. You can use Google Analytics and/or the server logs to see what requests those users were making. Pay close attention to requests that are expensive or are known to be problematic, such as file uploads.
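As a rough illustration of turning server logs into a request mix, the sketch below tallies requests by method and path. The log line format (`timestamp method path status`), the sample paths and the `request_mix` helper are all assumptions made for this example; adapt the parsing to whatever your logs actually contain.

```python
from collections import Counter


def request_mix(log_lines):
    """Return {(method, path): share of total requests} for the given log lines."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        _timestamp, method, path, _status = parts[:4]
        # Drop query strings so the same page is only counted under one key.
        counts[(method, path.split("?")[0])] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Hypothetical sample lines, not real Digital Marketplace traffic.
sample = [
    "2020-01-01T16:00:00Z GET /suppliers 200",
    "2020-01-01T16:00:01Z GET /suppliers?page=2 200",
    "2020-01-01T16:00:02Z POST /suppliers/upload 302",
    "2020-01-01T16:00:03Z GET /user/login 200",
]

mix = request_mix(sample)
for (method, path), share in sorted(mix.items(), key=lambda kv: -kv[1]):
    print(f"{method} {path}: {share:.0%}")
```

The resulting shares give you a starting point for deciding how much weight each user journey should carry in the test.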
Construct user journeys that between them match the pattern and level of requests you saw. You can use previous load test scenarios, user maps and other resources you can find to help guide you.
See the DOS 5 load testing plan (CCS only) for an example.
How much load
You now know what your traffic should look like. How much of it do you need to generate to be confident in the result?
Look at the number of submitted services for previous frameworks of the same type. If the trend continues, how much larger will this framework be than the previous one? For DOS 5, we estimated this as 20%.
On top of this, you need to account for unexpected spikes and underestimation. For DOS, we’ve seen previous spikes of up to 2x normal closing day load. So for DOS 5, we tested at 4x predicted load, which was 4.8x the load seen for DOS 4 (4 x 1.2).
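The arithmetic behind the DOS 5 figure can be sketched as follows. The `target_multiplier` helper and the peak request rate are hypothetical and only there to show how the growth estimate and the safety factor combine.

```python
def target_multiplier(growth, safety_factor):
    """Multiplier to apply to the previous framework's closing-day load."""
    return (1 + growth) * safety_factor


# Figures from the DOS 5 estimate: 20% growth, tested at 4x predicted load.
dos5 = target_multiplier(growth=0.20, safety_factor=4)
print(f"Test at {dos5:.1f}x the DOS 4 closing-day load")

# Hypothetical peak rate for the previous framework, for illustration only.
previous_peak_rps = 10
print(f"Target rate: {previous_peak_rps * dos5:.0f} requests per second")
```

Substitute the peak rate you measured for the previous framework of the same type, and your own growth and safety estimates.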
Writing the tests
You now know what user journeys you want to test, and at what rate. Now you need to configure the performance tests to match this.
Follow the readme and the previous examples in the repo, and refer to the Gatling docs.
Running the tests
You will want to run your tests against staging. This is a close approximation of production, but won’t affect real users. Before you run your test:
Inform PaaS, Notify and Cyber. If your test creates a large number of suppliers, also inform Dun & Bradstreet.
Scale up staging. You want it to match the scale you will use in production on the closing day.
Ensure your framework is in the correct state in staging.
Create any test users/suppliers needed by your test.
Make sure your internet connection is sufficient. The DOS 5 tests needed a 1/1 MB/s up/down connection.
Warn the team in the dm-release channel.
Now you can run the tests. Save a copy of the results and any interesting graphs in the Drive folder (CCS only).
Overload testing
Additionally, you may want to test how the Digital Marketplace behaves if it receives more traffic than you expect. For example, if there is an outage, users will retry their failed requests. This can cause a substantial increase in traffic, which can make the outage worse. For G-Cloud 12, we saw about 5x normal traffic during the outage.
Thus, for DOS 5, we ran an additional load test where we ramped the traffic up to 6x. This level of traffic indicates that there is probably already an outage in progress. So we relaxed our requirements for errors and response times.
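A ramp like this can be planned as a series of evenly spaced load multipliers. The sketch below is only an illustration: the `ramp_multipliers` helper, the 1x starting point, the number of steps and the step length are all assumptions, not the settings used for DOS 5.

```python
def ramp_multipliers(start, end, steps):
    """Evenly spaced load multipliers from start to end, inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps to ramp")
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


# Ramp up to the 6x peak used for DOS 5.
# Starting at 1x, using 6 steps and 5 minutes per step are assumptions.
STEP_MINUTES = 5
for i, multiplier in enumerate(ramp_multipliers(1, 6, steps=6)):
    print(f"minute {i * STEP_MINUTES:>2}: {multiplier:.1f}x")
```

Holding each step for a few minutes makes it easier to see at which level errors and response times start to degrade.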