Surefire plugin crash

Surefire Maven plugin

I use it to run tests, as most people do. What is maybe less typical is that I always run it as a separate step in my pipeline: building first, testing second.
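A minimal sketch of such a two-step run, written as plain shell functions rather than an actual pipeline definition. The function names and the MVN variable are made up for illustration; only the Maven arguments matter.

```shell
# Assumption: MVN points at the Maven launcher; set MVN=echo for a dry run.
MVN="${MVN:-mvn}"

# Step 1: compile and package everything, but do not execute any tests.
build_without_tests() {
    "$MVN" -B install -DskipTests
}

# Step 2: invoke the surefire goal directly on the already built module.
run_tests_only() {
    "$MVN" -B surefire:test
}
```

Calling build_without_tests && run_tests_only inside a module directory performs the two steps in order.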
Of course, in real usage I run many more modules in parallel, and many other details are involved that are not relevant to the topic I want to present. What is important here is the separate surefire invocation. I recently ran into a situation where tests were crashing randomly.

Understand the background

So, after some pipeline tweaking, the tests started to crash randomly. My configuration was a Linux machine with surefire plugin 3.0.0-M3. Many of the job runs ended in errors (well, a pass-fail-error situation showed up instead of nice pass-fail messages…). How to handle such a situation? We need to really understand what is going on before we start fixing things at random. When surefire starts as in the setup above, it creates a separate process. So Maven is the first process, which in turn starts a second, separate surefire process.
We have concurrent Maven steps, each starting surefire with the default fork settings: -DforkCount=1 -DreuseForks=true. This means that all test classes of a given surefire instance run in the same forked process, so in the pipeline described above we get 2 concurrent surefire processes in total, each lasting for the duration of all its tests.
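To make the process picture concrete, here is a rough sketch of two concurrent surefire invocations with the default fork settings spelled out. The two module directories are hypothetical arguments, and the MVN variable and function names are assumptions for illustration.

```shell
# Assumption: MVN points at the Maven launcher; set MVN=echo for a dry run.
MVN="${MVN:-mvn}"

# Run the surefire goal in one module directory ($1) with the default fork settings.
test_module() {
    ( cd "$1" && "$MVN" -B surefire:test -DforkCount=1 -DreuseForks=true )
}

# Start both modules at once; each Maven process forks one surefire process.
test_modules_concurrently() {
    test_module "$1" &
    pid_a=$!
    test_module "$2" &
    pid_b=$!
    wait "$pid_a"; status_a=$?
    wait "$pid_b"; status_b=$?
    # Succeed only if both invocations succeeded.
    [ "$status_a" -eq 0 ] && [ "$status_b" -eq 0 ]
}
```

With two modules this gives two Maven processes and two forked surefire processes running at the same time.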

Crash test analysis

When dealing with crashed tests we need to group them in some way. In my case there were 2 significant groups:

  • delayed crashes – a few tests were executed, and the crash was reported after a few minutes or more
  • instant crashes – no tests were executed, and the crash was reported immediately

Delayed crashes

Delayed crashes most probably indicate a problem with RAM on the build machine or with the memory configuration for surefire. The surefire invocation itself can be seen in the log file.

To check whether Xmx is the cause, one needs to look at the surefire processes on the Linux machine. A plain process listing should be enough to see how much memory a single surefire process uses. In my case the Xmx setting was OK and all surefire instances were using only the initial amount of 512 MB of RAM.
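One possible way to get such a listing is sketched below. It assumes the procps version of ps and that the forked test JVMs have "surefire" somewhere in their command line; the function names are illustrative.

```shell
# Reads "pid rss args..." lines on stdin (rss in KiB, as printed by procps ps)
# and prints the PID and resident memory in MiB for every surefire process.
surefire_memory() {
    awk '/surefire/ && !/awk/ { printf "%s %d MiB\n", $1, $2 / 1024 }'
}

# Feed the live process table into the filter above.
list_surefire_processes() {
    ps -eo pid,rss,args | surefire_memory
}
```

Running list_surefire_processes while the tests are executing prints one line per surefire process.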
To verify RAM it is convenient to have some kind of monitoring utility in place, although top/htop could be enough as well. If memory consumption during the tests is approaching 100% and our tests crash in the same time period, we can be fairly sure this is the reason for the delayed crash.
That was the case on my machine as well, and it was enough to reduce the number of Jenkins executors so that the machine does not run out of memory even when all of them are busy at the same time.
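If no monitoring utility is at hand, the memory check can be approximated with a few lines of shell. This is only a sketch, assuming a Linux kernel that exposes MemAvailable in /proc/meminfo; the function names are made up.

```shell
# Reads /proc/meminfo-formatted text on stdin and prints the used share of RAM in percent.
memory_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Print a timestamped sample that can be lined up with the time of a crash.
log_memory_sample() {
    printf '%s %s%%\n' "$(date '+%H:%M:%S')" "$(memory_used_percent < /proc/meminfo)"
}
```

Calling log_memory_sample every few seconds during a test run gives a simple timeline to compare against the crash time.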

Instant crashes

This type of crash is much more interesting. In my case, after adjusting the environment to use less memory, the delayed crashes were gone, but I was still getting instant crashes, which produced a lot of noise in my pipeline.
Looking closer at the crash log again, the stack trace contained no meaningful information, but the beginning of the log pointed to dump files. So, yes, dumpstream files with much more specific information are created in the workspace.
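A small sketch for locating those files, assuming only that their names end in .dumpstream; the function names and the default search directory are illustrative.

```shell
# List dumpstream files below a workspace directory ($1, defaults to the current directory).
find_surefire_dumps() {
    find "${1:-.}" -type f -name '*.dumpstream' | sort
}

# Print every dumpstream file with a header line showing its path.
show_surefire_dumps() {
    find_surefire_dumps "$1" | while IFS= read -r file; do
        printf '== %s\n' "$file"
        cat "$file"
    done
}
```

Running show_surefire_dumps on the job workspace right after a crash prints whatever surefire wrote there.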

From there I could learn that surefire couldn’t create a file in some location. It turned out that surefire creates temporary files somewhere there. The line with the surefire invocation actually indicates which paths are in use.
Still, it is unclear which argument is which. There are 3 of them:

  • /workspace/module/target/surefire
  • surefire5632314446045888006tmp
  • surefire_08947138625012870304tmp

How can we get more details? Well, surefire is an open source project available on GitHub.
We can clone it and open it in an IDE. In my case I was interested in version 3.0.0-M3, so I checked out that tag (newer versions exist as a branch and older ones as tags only). Let’s use code reading skills now!
The ForkedBooter class shows how the main method arguments are used. The arguments are named in the following way:

  • /workspace/module/target/surefire – temp dir
  • surefire5632314446045888006tmp – dump file name
  • surefire_08947138625012870304tmp – properties file

Here we can see a potential permission or path issue: when the user does not have the privilege to write to the target dir, or there is some problem with the path itself, the tests will crash.
That was not the case for me.

The temp file names look safe for concurrent execution, but the temp dir looks as if it might cause problems: there are 2 separate Maven and surefire invocations, and both must be using the same temporary directory!
Indeed, there is a tempDir parameter which sets a custom temp dir name (one can view all the available parameters in the AbstractSurefireMojo class). Giving each concurrent invocation its own tempDir value should keep them from sharing the directory.
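A sketch of how the parameter could be passed on the command line. The surefire-<suffix> naming scheme, the MVN variable and the function name are assumptions for illustration; any value that differs between concurrent invocations would do.

```shell
# Assumption: MVN points at the Maven launcher; set MVN=echo for a dry run.
MVN="${MVN:-mvn}"

# Run the surefire goal with a temp dir name of its own.
# $1 is a hypothetical suffix that differs between concurrent invocations,
# for example a module name or an executor number.
run_tests_with_own_temp_dir() {
    "$MVN" -B surefire:test -DtempDir="surefire-$1"
}
```

For example, run_tests_with_own_temp_dir moduleA passes -DtempDir=surefire-moduleA to Maven.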

Surprisingly, on Windows the temp dir is generated in the %temp% location with a number appended, so it is completely parallel safe. So if you happen to get instant crashes on this OS, the tempDir parameter will not help.

Sum up

The lesson learnt is always more or less the same: know the problem, understand the background and apply the knowledge to solve it.
