For quite some time, we’ve been testing our GUI with Selenium and Robot Framework using the PhantomJS headless browser. Over the years the number of tests steadily grew, which is generally considered a good thing. However, as test coverage increased, so did the total execution time of our test suites, and our developers became increasingly reluctant to tack on more Robot tests because of the impact they had on the deployment cycle.
When a single run of our GUI’s test jobs finally reached 40 minutes, we decided it was time to take action. We ventured out in search of a way to speed up our CI jobs and, just as importantly, to keep them stable.
Initial Situation
One problem was immediately clear. Our Robot tests are grouped according to so-called “test setups” which maintain the fixture data and configuration required to run each test suite. We were running each of these test setups and each of their test suites sequentially.
Since just about everything seems to have a multi-core processor these days – with some smartphones even having eight cores – taking advantage of this to execute our tests in parallel was a pretty obvious decision. Would it solve all of our problems? Probably not, but we figured as the first step it would be a significant improvement.
We began by simply starting tests simultaneously, and PhantomJS immediately started crashing in strange ways while evaluating various JavaScript snippets we injected with Selenium. Given these crashes, the fact that PhantomJS had reached its end of life and was falling behind on feature support, and our strong suspicion that none of our customers were using PhantomJS, we decided it was time to migrate to a different technology instead of delving deeper into the crashes.
We picked Chromium running in headless mode for the first experiments, and in order to keep our CI slaves clean from Chromium’s additional dependencies, we packed everything we needed into a Docker image.
Digging Deeper: You Can’t Go Wrong With “Default,” Can You?
Initially we had six test setups (as described above) and decided we should run at most four of them concurrently since the CI machines are quad-core.
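A minimal sketch of how such a cap could look in Python. This is an illustration, not the actual CI script: the function names and the idea of passing each test setup as a ready-made command line are assumptions made for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 4  # one worker per core on a quad-core CI machine


def run_setup(command):
    """Run one test setup as a child process and return its exit code."""
    return subprocess.run(command).returncode


def run_all(commands, max_parallel=MAX_PARALLEL):
    """Run every command, never more than max_parallel at a time.

    Returns the exit codes in the same order as the commands.
    """
    # Threads are enough here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_setup, commands))
```

Each entry in `commands` would be one test setup's command line, for example `["robot", "--outputdir", "out/setup1", "suites/setup1"]`, where the directory names are made up. The pool hands the next setup to whichever worker frees up first, so at most four run at once.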
We developers love to put things in “Default” spaces (or “Common” or “Shared”) whenever we are either too lazy to find the appropriate location or afraid of paying the bill for creating a new one. As it turned out, we had one big fat “Default” test suite that took 20 minutes to run all on its own.
We split up this dominant suite, ending up with eleven suites in total. In doing so, not only did we get smaller units to shuffle around, but as a bonus the scope, clarity, and description of each suite became more precise as well.
The next trial yielded positive results. Though measurable, they were not overwhelming and definitely did not meet our expectations. During the first three-quarters of the time the tests were executing, the machine was pretty busy running tests on each core, but the workload suddenly dropped when only a single test suite remained, wasting 75% of the available processing capacity!
What had happened?
Well, remember the ride back home from your last camping trip with your friends when everybody threw their luggage into the trunk all haphazard and catawampus, and not until you found yourself crammed between Boris and a moist sleeping bag, questioning the decisions you had made in your life that led you to that particular point in space and time, did you admit there was probably a better way to do things?
That’s pretty much what happened here. All of our tests were being shoved into the CI machine without any rhyme or reason, so one suite would always dominate.
The Result: Getting Feedback Three Times Faster
The solution was simple and, like most things, obvious in retrospect. We sorted the test suites according to their expected runtime and added them to the execution queue in descending order. With the longest suites starting first, the short ones fill the gaps at the end instead of leaving a single long suite running alone. This gained us an additional 20% speed-up.
As of this writing, we have reduced the total execution time of our test suites from 40 minutes to 12. That’s less than one-third of the original time! And we’re looking for even more speed-ups. The reluctance and guilt among our developers are gone, and we can finally feel good about writing Robot tests again.
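The effect of the ordering can be reproduced with a small simulation. The sketch below is an illustration with invented suite durations, not our real measurements, and the function names are hypothetical. It models four workers pulling suites from a shared queue and compares an arbitrary order against longest-first.

```python
import heapq


def makespan(durations, workers=4):
    """Simulate a shared queue: each free worker takes the next suite in order.

    Returns the wall-clock time at which the last suite finishes.
    """
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def longest_first(durations):
    """Return the durations sorted in descending order."""
    return sorted(durations, reverse=True)


# Hypothetical runtimes in minutes for eleven suites; the longest is queued last.
suites = [2, 3, 1, 4, 2, 3, 1, 2, 3, 2, 9]
unordered = makespan(suites)
ordered = makespan(longest_first(suites))
```

With these made-up numbers the arbitrary order finishes after 14 minutes, because the 9-minute suite starts last and runs alone at the end, while the longest-first order finishes after 9. Longest-first is a simple heuristic rather than a guaranteed optimum, but it avoids the idle tail described above.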
TL;DR
- Running tests sequentially took too long
- PhantomJS caused instability
- We switched to headless Chromium and Docker to run tests in parallel
- We sorted the tests according to their expected runtime and added them to the execution queue in descending order
- Total test execution time was reduced from 40 minutes to 12