Using Continuous Integration for Performance Testing. Is it worth the effort?
Short answer is a definitive yes. But as you may suspect there’s a little more to it.
During my #NeotysPAC talk, I described the long and winding road my team took developing our “Mark59” open-sourced solution which enables us to execute performance tests daily via a Jenkins server, and how we automated SLA results analysis.
In this blog, I’m going to discuss some of the experiences we’ve had running Performance Testing via CI/CD (Continuous Integration/Continuous Delivery). We’ve been running a CI server in one form or another for at least three years now, so I’d like to think we’ve learned some (not always easy) lessons along the way. A lot of this relates to our Mark59 framework, but the principles are more general, so I hope at least some of our ideas may be useful to someone thinking about a CI/CD approach in their workplace.
How my team runs a CI/CD pipeline
We run CI/CD using Jenkins, on which we deploy and run JMeter (and, until recently, LoadRunner) performance tests in a mix of daily and weekly runs. We also work with the development and automation test teams, particularly on Selenium script development and deployment. DevOps is the new buzzword for this, I hear; I’ve just read Stijn Scheper’s PAC blog, where he talks about principles similar to those we have adopted.
Selenium scripting targeted for JMeter has become a core component of our work. From a DevOps perspective, I’d like to add an extra dot point to Stijn’s framework suggestions:
keep the application (script) logic self-contained: that is, it can be run stand-alone
For example, this week I’ve been working with an application delivery team that is using our Selenium scripts in a CI run to verify production environments. They do have their own regression suite, but it’s complex and the application logic is difficult to extract. DevOps in action!
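The “self-contained script logic” dot point above can be sketched in code. This is a minimal illustration of the pattern, not the actual Mark59 API: the journey logic lives in one method that both a load-test harness and a stand-alone `main()` can invoke, so another team can reuse the same logic (for example, to verify a production environment) without pulling in the whole test framework. All names here are hypothetical, and a real script would drive a Selenium WebDriver inside `runJourney`.

```java
// Sketch of the "self-contained script logic" pattern (illustrative names,
// not the Mark59 API). The application logic is one method that can be
// called from the load-test harness OR run stand-alone via main().
public class LoginJourneyScript {

    /** The application (script) logic. In a real script this would drive a
     *  Selenium WebDriver through the journey; here it just returns a status. */
    public String runJourney(String targetUrl) {
        // ... open browser, log in, navigate key pages, log out ...
        return "PASSED: " + targetUrl;
    }

    /** Stand-alone entry point, so other teams (e.g. a production
     *  verification job) can run the same logic outside the load test. */
    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "https://example.internal/app";
        System.out.println(new LoginJourneyScript().runJourney(url));
    }
}
```

Because the logic is not entangled with the harness, extracting it for reuse is trivial, which is exactly what makes the regression-suite scenario above painful by comparison.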
Our Experiences: The Good Stuff
So, what are the pros of continuous testing we have found? I’d like to discuss something that happened to several of our tests this month. Have a look at this graph.
It shows injector CPU % utilization over the last 50 or so days for one of our most important tests. Test days run along the bottom, left to right, past to present, with CPU % on the dependent axis. As an aside, this graphic comes from the Mark59 Trending Analysis tooling – the ability to display historical run data graphically was a game-changer for us. Anyway, this is a test that runs Selenium scripts very heavily, so when CPU utilization on this injector went from the 40s to over 60%, it affected transaction times (over 50% is the point at which transaction times tend to get hit). In this case, the CPU hit wasn’t quite enough to break the transaction SLAs; we actually picked it up via our metric SLAs:
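A metric SLA of this kind is conceptually simple: compare an observed server metric average against a configured ceiling and flag a breach. The sketch below shows the idea only; the class and method names are illustrative, not Mark59’s actual SLA implementation, and the 50% threshold is the rule of thumb mentioned above.

```java
// Minimal sketch of a metric SLA check like the one that caught the
// injector CPU jump (illustrative names, not the Mark59 API).
public class MetricSla {

    private final String metricName;
    private final double maxAllowed;  // e.g. 50.0 (% CPU), past which txn times degrade

    public MetricSla(String metricName, double maxAllowed) {
        this.metricName = metricName;
        this.maxAllowed = maxAllowed;
    }

    /** Returns a failure message if the observed average breaches the SLA, else null. */
    public String check(double observedAverage) {
        if (observedAverage > maxAllowed) {
            return String.format("SLA FAILED: %s averaged %.1f, limit %.1f",
                    metricName, observedAverage, maxAllowed);
        }
        return null;  // within SLA
    }
}
```

Run daily against stored results, a check like this turns a slow drift (or a sudden jump) in an infrastructure metric into a visible pipeline failure on the very day it happens.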
So, we raised a problem request in which we were able to state the day the issue started. Even so, as happens at large corporate sites, it took a week or so to identify the team and change responsible. It turned out our injector, a Windows Virtual Server, had been moved off one physical cluster onto another. We asked for it to be moved back, but in fact the Server Admin team moved it to a newer physical cluster, with the result that CPU utilization dropped dramatically. We got the same CPU specs in terms of the number of cores and reported clock speed, but the newer CPU stack is much better at handling concurrent processes. That’s a critical factor when running multiple concurrent Chrome browsers being driven by Selenium tests (in case you were wondering, we’re on Xeon Gold servers now).
Key learning: The ability of a CPU to handle concurrent processes is an important requirement for the Selenium component of our framework.
But what if we hadn’t been running the test via CI? What if we’d been using a traditional project-by-project approach to testing? Well, there is a good chance we may not have tested this application until next year, so it would have been an extremely difficult task to track down this change. As we would most likely have been making script changes for the new project, we would very probably have been confused about what had happened, assumed it was a script thing – and missed our key learning.
The takeaway from this is that we can identify changes that impact performance as they occur. It’s proven critical several times now.
There are many other pros I could talk about. In my PAC talk, I discussed some, but you can really summarise them as the advantages you get from automating a process, reducing the risk of human error inherent in a more hit-and-miss manual approach.
Our Experiences: The Challenges
Of course, there can be cons to Continuous Integration testing as well. A judgement call needs to be made about the importance of an application, and about the nature of the application itself, to determine whether it should be included in a CI/CD testing pipeline. Is this application ever going to require more than a few performance test runs? If this application fails in production, what are the consequences? Can it be down for a few weeks while it’s being fixed, or is it mission-critical? What is the appetite of the application team or application owner for load testing? Do they just see it as a box that needs ticking, or is it of importance to them? How stable is the application data? Is it controllable, or too dynamic to keep running the same scripts against?
Probably the nastiest issues come when addressing the interfaces or dependencies of the application under question and determining the consequence to downstream systems of running the test continuously. We have largely mitigated these issues by using mocked responses, but it can be a complex problem.
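The mocked-responses approach mentioned above can be sketched as a simple substitution behind an interface. Everything here is illustrative (the interface, class names, and canned data are hypothetical, not from Mark59): the test is wired to a mock implementation so daily runs never hit the real downstream system, and the canned responses keep results deterministic run over run.

```java
import java.util.Map;

// Sketch of mocking a downstream dependency so a continuously-run test
// doesn't flood real interfacing systems (illustrative names throughout).
interface PaymentsGateway {
    String submit(String accountId, double amount);
}

/** A real implementation would call the downstream system over HTTP or MQ.
 *  This mock returns canned responses so repeated daily runs are
 *  deterministic and side-effect free. */
class MockPaymentsGateway implements PaymentsGateway {
    private static final Map<String, String> CANNED = Map.of(
            "ACC-001", "APPROVED",
            "ACC-002", "DECLINED");

    @Override
    public String submit(String accountId, double amount) {
        return CANNED.getOrDefault(accountId, "APPROVED");
    }
}
```

The hard part in practice is not the stub itself but deciding where the boundary sits and keeping the canned responses realistic as the downstream contract evolves, which is why this remains a complex problem even once mocked.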
Get any of these judgements wrong, and you can find yourself trying to run a test that no-one cares about, that is troublesome to maintain, and that is basically a big waste of time and money. On the other hand, if you don’t run a test in CI that you should have, and it incurs a spectacular performance failure one day, you will know there was a risk and a cost you could have avoided.
Mitigation: Where we have Improved
One question you may be wondering about is why we decided to use Selenium scripting. Originally, we used LoadRunner with VuGen Web scripting for our CI/CD pipeline. As this technology works at the HTTP level, we found that we were constantly having to update the scripts for even the most minor application changes. Script maintenance became a major headache. So, we made the call to use Selenium via JMeter. Both use Java, and our most critical systems are Java-based, so it was a natural fit for us. I won’t go into the implementation details; suffice to say that the maintenance effort for our scripts dropped dramatically. In fact, our team size has halved from its peak. But no need to stress about job losses – as people in our team have picked up extra skills outside pure performance testing, they’ve found work in all sorts of interesting areas.
By the way, our Mark59 framework can still cater for a CI/CD pipeline using LoadRunner as well as JMeter. Generally, if a performance test tool produces well-defined output, it should be possible to load and process results in a CI/CD pipeline. Hint, hint for anyone wanting to give it a go with NeoLoad.
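To illustrate the “well-defined output” point: if a tool emits, say, CSV lines of transaction name and response time, a pipeline job can load them and compute whatever statistic the SLAs need. This is a hedged sketch with an assumed two-column CSV format (`transaction,responseMillis`) and the nearest-rank percentile method; it is not how Mark59 itself parses any particular tool’s output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of loading well-defined tool output (assumed format:
// "transaction,responseMillis" per line) and computing a 90th percentile,
// the kind of processing a results-analysis job in the pipeline might do.
public class ResultsLoader {

    public static long percentile90(List<String> csvLines) {
        List<Long> times = new ArrayList<>();
        for (String line : csvLines) {
            times.add(Long.parseLong(line.split(",")[1].trim()));
        }
        Collections.sort(times);
        // nearest-rank method: index of the 90th-percentile observation
        int idx = (int) Math.ceil(0.90 * times.size()) - 1;
        return times.get(idx);
    }
}
```

Once a value like this is computed per run and stored, trending it over time (as in the CPU graph earlier) and checking it against SLAs falls out naturally.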
Bottom line, the problem with CI/CD is…
Complexity. It doesn’t really matter how much gloss, marketing, or whatever hype you want to put on it, with the current technology and tools available the building of a CI/CD DevOps pipeline is complex. Perhaps by its nature, it’s never going to be an easy thing.
In our CI/CD solution, we have adopted a few practices to keep it manageable. We try to create as few different types of jobs and job streams as possible, using a template approach to our jobs that allows easy parameterization. We break up our application streams into different Jenkins tabs, so we can see the state of our applications at a glance. We send out daily results emails with the appropriate job links, so we and the application teams are aware of issues without digging through Jenkins. Within our Mark59 framework, we have tried very hard to make problem resolution as easy as we can by having various options and types of logging available.
But at the end of the day, there is a steep learning curve to overcome with CI/CD. One way we hope to improve things for anyone using our framework is to document the major jobs and job steps involved, and as much as possible provide samples to help with setup.
Also, we hope to create a publicly available AWS AMI (Amazon Machine Image) on which we will place all our tooling and sample CI/CD pipelines. It’s so much easier to build things if you can start from a working example.
Anyway, I hope that gives you a few things to think about, good luck!