Surefire plugin crash

Surefire maven plugin

I use it to run tests, as most people do. What is perhaps less typical is that I always run it separately in my pipeline, like so:
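A minimal sketch of such a two-step invocation (the exact goals and flags are illustrative; a real pipeline passes more options):

```
# build first, without executing any tests
mvn install -DskipTests

# test second, as a separate Surefire invocation
mvn surefire:test
```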

So: building first, testing second.
Of course, in real usage I run many more modules in parallel, and many other details are involved that are not relevant to the topic I want to present. What is important here is the separate Surefire invocation. I recently encountered a situation of randomly crashing tests.

Understand the background

So, after some pipeline tweaking, the tests started to crash randomly. My configuration was a Linux machine with the 3.0.0-M3 Surefire plugin. Many of the job runs were errors (a pass-fail-error situation showed up instead of the nice pass-fail messages…). How to handle such a situation? We need to really understand what is going on before we start fixing things randomly. When Surefire starts as in the example above, it creates a separate process. So Maven is the first process, which in turn starts a second, separate Surefire process.
We have concurrent Maven steps starting Surefire concurrently with the default fork settings: -DforkCount=1 -DreuseForks=true. This means all test classes will use the same process for a given Surefire instance, so in the pipeline above we are going to have 2 concurrent Surefire processes in total, lasting for the duration of all tests.

Crash test analysis

When dealing with crashed tests we need to group the crashes in some way. In my case there were 2 significant groups:

  • delayed crashes – some tests were executed; the crash was reported after a few or more minutes
  • instant crashes – no tests were executed; the crash was reported immediately

Delayed crashes

Delayed crashes most probably indicate a problem with RAM on the build machine or with the memory configuration for Surefire. In the log file one can see a Surefire invocation like this:
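The log line looks roughly like this (the "Forking command line" prefix is what Surefire prints; the paths and values below are illustrative):

```
Forking command line: /bin/sh -c cd /workspace/module && /usr/bin/java -Xmx512m -jar /workspace/module/target/surefire/surefirebooter1234567890.jar /workspace/module/target/surefire surefire5632314446045888006tmp surefire_08947138625012870304tmp
```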

To verify the Xmx hypothesis, one needs to look at the Surefire processes on the Linux machine:
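Something as simple as this (the grep pattern is an assumption – the forked processes are recognizable by their booter jar):

```
ps aux | grep surefirebooter
```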

This should be enough to see how much memory a single Surefire process uses. In my case the Xmx setting was fine and all Surefire instances were using only the initial amount of 512 MB of RAM.
To verify RAM it is convenient to have some kind of monitoring utility in place, although using top/htop could be enough as well. If memory consumption during the tests approaches 100% and our tests crash in the same time period, we can be sure this is the reason for the delayed crash.
That was the case on my machine as well, and it was enough to reduce the number of Jenkins executors so that the machine would not run into out-of-memory problems even with all of them busy at the same time.

Instant crashes

This type of crash is much more interesting. In my case, after adjusting the environment to use less memory, the delayed crashes were gone, but I was still getting instant crashes, which produced a lot of noise in my pipeline.
Again looking closer at the crash log, there was no meaningful information in the stack trace, but at the beginning I could observe this information:
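The message looks roughly like this (the exact wording varies between Surefire versions):

```
Error occurred in starting fork, check output in log
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
```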

So, yes, there are dumpstream files created in the workspace with much more specific information:
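In my case the dumpstream content pointed at a failed file creation; an illustrative excerpt (timestamp and frames invented for the example):

```
# Created at 2019-06-01T10:15:30
java.io.IOException: No such file or directory
	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createTempFile(File.java)
	...
```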

From there I could learn that Surefire couldn't create a file in some location. It turned out Surefire creates temporary files somewhere there. The line with the Surefire invocation actually indicates which paths are in use.
Still, it is unclear which argument is which – there are 3 of them:

  • /workspace/module/target/surefire
  • surefire5632314446045888006tmp
  • surefire_08947138625012870304tmp

How can we get more details? Well, Surefire is an open-source project available on GitHub: https://github.com/apache/maven-surefire
We can clone it and open it in an IDE. In my case I was interested in the 3.0.0-M3 version, so I checked out that tag (newer versions exist as branches and older ones as tags only). Let's use our code-reading skills now!
There is a ForkedBooter class holding the information on how the main method arguments are used:
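A simplified paraphrase of what ForkedBooter does with its arguments (variable names abridged; not the literal source):

```java
// Paraphrased from org.apache.maven.surefire.booter.ForkedBooter
public static void main(String[] args) {
    String tmpDir = args[0];                 // temp dir, e.g. target/surefire
    String dumpFileName = args[1];           // dump file name
    String surefirePropsFileName = args[2];  // properties file describing the run
    // ... sets up the booter and runs the test classes ...
}
```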

So, the arguments are named in the following way:

  • /workspace/module/target/surefire – temp dir
  • surefire5632314446045888006tmp – dump file name
  • surefire_08947138625012870304tmp – properties file

We can see here that there may potentially be a permission or path issue: when the user does not have the privilege to write to the target dir, or there is some problem with the path itself, the test will crash.
That was not the case for me.

The temp filenames look safe for concurrent execution, but the temp dir looks like it might cause problems – there are 2 separate Maven and Surefire invocations, and both must be using the same temporary directory!
Indeed, there is a tempDir parameter which sets a custom temp dir name (one can view all the available parameters in the AbstractSurefireMojo class!):
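For example, giving each concurrent invocation its own directory name (the tempDir user property comes from AbstractSurefireMojo; the suffix variable is an illustrative assumption):

```
mvn surefire:test -DtempDir=surefire_${EXECUTOR_NUMBER}
```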

Surprisingly, on Windows the temp dir is generated in the %temp% location with a number appended, so it is completely parallel-safe. So if you happen to get instant crashes on that OS, the tempDir parameter will not help.

Sum up

The lesson learnt is always more or less the same: know the problem, understand the background, and apply the knowledge to solve it.

Few recipes for declarative pipeline

It is often required to iterate over some set of names using sequential or parallel stages.
A good example is when we have a list of modules – possibly retrieved automatically – which we want to test, sequentially or in parallel.

1. Sequential example.

Let’s look at sequential example:
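A sketch of such a scripted pipeline (the node label, module names, and Maven goals are illustrative assumptions):

```groovy
node('linux') {
    stage('BUILD') {
        // build first, without executing any tests
        sh 'mvn install -DskipTests'
    }
    def modules = ['module-a', 'module-b', 'module-c'] // possibly retrieved automatically
    for (int i = 0; i < modules.size(); i++) {
        def module = modules[i]
        stage("TEST ${module}") {
            // do not abort the whole pipeline on the first failing module
            catchError {
                sh "mvn surefire:test -pl ${module}"
            }
            // preserve the TestNG result before the next iteration overwrites it
            sh "cp ${module}/target/surefire-reports/testng-results.xml testng-results-${module}.xml"
        }
    }
    archiveArtifacts artifacts: 'testng-results-*.xml'
}
```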

First we just build the application without executing any of the tests.

In my opinion tests should always be separated from the build process – even if they are unit tests. The task seems easy, but it requires some attention.
Firstly, we need to use the scripted pipeline approach. That makes it possible to use a for loop; each stage gets a generated test name.
Secondly, we need the catchError step. Without it, the pipeline would abort on the first unsuccessful iteration, while we want all iterations to execute no matter what the status is.
Thirdly, after each iteration we need to preserve the Surefire output result – TestNG in this example – so that it can be archived properly.
All of the stages of this pipeline are executed on the same node and with the same workspace path.

2. Parallel example.

Making the pipeline parallel is what I call flattening. Let's see the example:
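A sketch of the flattened variant (again, the labels and names are illustrative assumptions):

```groovy
node('linux') {
    stage('BUILD') {
        sh 'mvn install -DskipTests'
        // stash build artifacts so parallel stages (with their own workspaces) can use them
        stash name: 'build', includes: '**/target/**'
    }
    def modules = ['module-a', 'module-b', 'module-c']
    def branches = [:]
    for (m in modules) {
        def module = m // capture the loop variable for the closure
        branches["TEST ${module}"] = {
            node('linux') {
                unstash 'build'
                sh "mvn surefire:test -pl ${module}"
                // rename the report so the copies from different workspaces do not clash
                sh "cp ${module}/target/surefire-reports/testng-results.xml testng-results-${module}.xml"
                // archive logs as each stage finishes
                archiveArtifacts artifacts: '*.log', allowEmptyArchive: true
                stash name: "report-${module}", includes: "testng-results-${module}.xml"
            }
        }
    }
    parallel branches
    stage('REPORT') {
        // gather all reports in one place and report in one go
        for (m in modules) {
            unstash "report-${m}"
        }
        archiveArtifacts artifacts: 'testng-results-*.xml'
    }
}
```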


The key to understanding more complex pipelines is to understand where each stage is executed in terms of workspace path and machine. In the example here, all the generated stages are executed in parallel, as the parallel keyword is used. This means a different workspace will be used for each of them (they are executed on the same machine): workspace_path, workspace_path@1, workspace_path@2, etc.
(It is also possible to configure the stages so that they are executed on different Jenkins nodes.)
Thus the pipeline needs to first stash the build artifacts after the BUILD stage and then unstash them in each generated stage. Stashed files are stored on the Jenkins master, which makes them accessible to all of the stages no matter their location.
In this example we do not need the catchError step, as even when any of the stages fails, the rest will still be executed. The TestNG report file will not be overwritten (all of the reports reside in different workspaces), but it is good practice to rename it so that Jenkins can handle it properly when sending the reports to the Jenkins master for the report to be generated. In this example all the reports are gathered in one place and reported in one go, while log files are archived as each stage finishes.

3. Summary.

These were just a few simple pipeline recipes. There is of course much more: we can split the pipeline not only into stages but also into jobs – the BUILD stage could be a separate job in the above example. We can do more tasks in parallel than just tests, like multiple checkouts or multiple builds. The parallelism can be related to either stages or jobs.
The more things are flattened, the more attention we need to pay to the resources we have: the number of machines and the number of Jenkins nodes on each of them in comparison to the available memory, processors, and disk read/write speed.
This is required so that pipeline speed increases with the flattening process, and not the opposite.