Surefire plugin crash

Surefire maven plugin

I use it to run tests, as most people do. What is maybe less typical is that I always run it as a separate step in my pipeline, like so:

So building first, testing second.
Of course, in real usage I run many more modules in parallel, and there are many other details which are not relevant to the topic I want to present. What is important here is the separate surefire invocation. I recently ran into tests that were crashing randomly.

Understand the background

So the tests started to crash randomly after some pipeline tweaking. My configuration was a Linux machine with the 3.0.0-M3 surefire plugin. Many of the job runs ended in errors (well, a pass-fail-error situation showed up instead of nice pass-fail messages…). How to handle such a situation? We need to really understand what is going on before we start fixing things at random. When surefire starts as in the example above, it creates a separate process. So Maven is the first process, which in turn starts a second, separate surefire process.
We have concurrent Maven steps starting surefire concurrently with the default fork settings: -DforkCount=1 -DreuseForks=true. It means all the test classes of a given surefire instance run in the same process, so in the above pipeline we are going to have 2 concurrent surefire processes in total, each lasting for the duration of all its tests.

Crash test analysis

When dealing with crashed tests we need to group them in some way. In my case there were 2 significant groups:

  • delayed crashes – a few tests were executed, the crash was reported after a few minutes or more
  • instant crashes – no tests were executed, the crash was reported immediately

Delayed crashes

Delayed crashes most probably indicate a problem with RAM on the build machine or with the memory configuration for surefire. In the log file one can see the surefire invocation like this:

To verify the Xmx reason one needs to look at the surefire processes on the Linux machine:

should be enough to see how much memory a single surefire process uses. In my case the Xmx setting was OK and all surefire instances were using only the initial amount of 512 MB of RAM.
To verify RAM it is convenient to have some kind of monitoring utility in place, although top/htop could be enough as well. If memory consumption during the tests approaches 100% and the tests crash in the same time period, this is very likely the reason for the delayed crash.
That was the case on my machine as well, and it was enough to reduce the number of Jenkins executors so that the machine did not run out of memory even with all of them busy at the same time.
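
If no monitoring utility is at hand, the same check can be scripted. The following is only a minimal sketch, not what I used: it assumes a Linux machine whose /proc/meminfo exposes the MemTotal and MemAvailable fields, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: computes memory usage from the text of /proc/meminfo. */
final class MemInfo {

    /** Parses lines such as "MemTotal:  16384 kB" into a map of field name to kilobytes. */
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> values = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                String key = parts[0].substring(0, parts[0].length() - 1);
                try {
                    values.put(key, Long.parseLong(parts[1]));
                } catch (NumberFormatException ignored) {
                    // Not a numeric field, skip it.
                }
            }
        }
        return values;
    }

    /** Percentage of RAM that is not available to new processes. */
    static double usedPercent(String meminfo) {
        Map<String, Long> values = parse(meminfo);
        Long total = values.get("MemTotal");
        Long available = values.get("MemAvailable");
        if (total == null || available == null || total == 0) {
            throw new IllegalArgumentException("MemTotal/MemAvailable not found");
        }
        return 100.0 * (total - available) / total;
    }
}
```

On the build machine the input could come from Files.readString(Path.of("/proc/meminfo")), sampled every few seconds while the tests are running and logged next to the crash time.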

Instant crashes

This type of crash is much more interesting. In my case, after adjusting the environment to use less memory, the delayed crashes were gone, but I was still getting instant crashes, which produced a lot of noise in my pipeline.
Looking closer at the crash log again, there was no meaningful information in the stack trace, but at the beginning I could see this information:

So, yes, there are dumpstream files with much more specific information created in the workspace:

From there I could learn that surefire couldn’t create a file in some location. It turned out that surefire creates temporary files somewhere there. The line with the surefire invocation actually indicates which paths are in use.
Still, it is unclear which argument is which – there are 3 of them:

  • /workspace/module/target/surefire
  • surefire5632314446045888006tmp
  • surefire_08947138625012870304tmp

How can we get more details? Well, surefire is an open source project available on GitHub:
We can clone it and open it in an IDE. In my case I was interested in the 3.0.0-M3 version, so I checked out that tag (newer versions exist as a branch and older as tags only). Let’s use our code reading skills now!
There is the ForkedBooter class, which shows how the main method arguments are used:

So, arguments are named in the following way:

  • /workspace/module/target/surefire – temp dir
  • surefire5632314446045888006tmp – dump file name
  • surefire_08947138625012870304tmp – properties file

We can see here that there may potentially be a permission or path issue. When the user does not have the privilege to write to the target dir, or there is some problem with the path itself, the tests will crash.
It was not the case for me.

The temp file names look safe for concurrent execution, but the temp dir looks as if it might cause problems – there are 2 separate Maven and surefire invocations, and both must be using the same temporary directory!
Indeed, there is a tempDir parameter which sets a custom temp dir name (one can view all the available parameters in the AbstractSurefireMojo.class file!):

Surprisingly, on Windows the temp dir is generated in the %temp% location with a number appended, so it is completely parallel safe. So if you happen to have instant crashes on this OS, the tempDir parameter will not help.
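
One way to keep concurrent invocations apart is to derive the temp dir name from something unique to each invocation. The sketch below only builds the command line. It assumes the parameter can be passed on the command line as -DtempDir, and the class name and the "surefire-" prefix are made up for illustration. The invocation id could be, for example, the Jenkins executor number.

```java
import java.util.List;

/** Hypothetical helper: one surefire temp dir name per concurrent invocation. */
final class SurefireTempDir {

    /** Builds a Maven command line whose tempDir is unique for the given invocation id. */
    static List<String> command(String invocationId) {
        // Keep only characters that are safe in a directory name.
        String safeId = invocationId.replaceAll("[^A-Za-z0-9_-]", "_");
        // Assumption: tempDir is settable as a user property with -DtempDir.
        return List.of("mvn", "surefire:test", "-DtempDir=surefire-" + safeId);
    }
}
```

The resulting list can be handed to a ProcessBuilder, or the same naming idea can be applied directly in the pipeline script.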

Sum up

The lesson learnt is always more or less the same: know the problem, understand the background and apply the knowledge to solve it.

Pipeline design trick

In the previous article I mentioned the attention we need to pay to the resources we have when flattening the tests. Still, the available resources run out at some point.

Let’s imagine we have a pipeline with quick/small tests launched after each commit. As new features are added, the tests take longer and longer. We want to be sure our software is working, so apart from unit tests we add integration/medium tests as well as a heavy/large part doing end to end stuff.

diagram 1: small, medium and large tests which last 1.5 hour

We try to flatten the tests and start the large ones as early as possible, but they are so slow that it doesn’t help. We try to run the large tests phase once a day in the evening, but then we only learn the next day that something is wrong with the software, which is not what we want.

diagram 2: flattened tests which last too long


diagram 3: separated large tests run in the evening once

What can we do? How can we quickly get feedback about the application under test while still executing all of the tests? Quickly means something between 20 and 30 minutes, not hours of waiting…

We can find a way to the solution by no longer analyzing the pipeline itself and looking at it from another point of view instead: the feedback point of view. Which group of people needs the feedback? What will happen if the feedback for some area of the application is delayed?
Most often there are teams in the company which deal with specific parts of the application: team 1 works on modules A and B, while team 2 works on C, D and E. Let’s imagine module B is the one involved in large tests. It is covered by many unit tests but then undergoes a long phase of end to end tests which lasts, for example, 1 hour.
We need to separate the large tests phase just as we did when running it once a day, but trigger it synchronously with the small tests, making sure it uses the same build artifact as the small and medium tests do. The large tests phase has to be triggered only when it is not already in progress, to prevent the test queue from growing indefinitely.

diagram 4: large tests as separate pipeline triggered synchronously

Thus, we get feedback as fast as we can for small and medium tests (every 30 minutes), which allows team 2 to keep the pace. Team 1 also benefits, as it gets feedback about module A very often. It gets feedback about module B as fast as possible, that is about every 75 minutes if the testing cycle is active all the time.
If our pipeline is a continuous deployment one, we need a trick that allows us to deploy only fully tested code, as some of the commits will not be tested by the large phase at all. This can be done by having each test phase (small/medium and large) tag the code individually and letting only a fully tagged commit id be deployed.
The only price we pay is that we do not know the full test result of every single commit. We get full feedback from the green and yellow tests, but the results of the red ones are known only for some commits. The runs with an uncertain large tests result in diagram 4 are runs number 2, 3, 5 and 6, while 1 and 4 are fully tested and may be safely released or deployed. I am sure this is a very small price for the advantages we get.
This idea may be used to divide the pipeline into even more phases if more of them do not fit into the 30 minute duration of the basic tests: more large tests or maybe some additional medium ones. All of them may be triggered when the small tests start, under the condition that they are not already in progress, and all of them may set individual tags to mark the commits they tested, so that we can always track the commits which are safe for release. We get separate feedback for each of the phases, as fast as possible.
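
The trigger-only-when-idle rule and the per-phase tagging can be modelled in a few lines. This is only a sketch under stated assumptions: a new run starts every 30 minutes, the large phase stays busy for 75 minutes once triggered (my reading of the figures above), every phase that runs passes, and the class, method and tag names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a slow phase that is triggered only when it is idle. */
final class PhaseTrigger {

    /** Returns, for each run number (starting at 1), the phase tags that run received. */
    static Map<Integer, Set<String>> simulate(int runs, int cycleMinutes, int largeBusyMinutes) {
        Map<Integer, Set<String>> tags = new LinkedHashMap<>();
        int largeFreeAt = 0; // minute at which the large phase is idle again
        for (int run = 1; run <= runs; run++) {
            int start = (run - 1) * cycleMinutes;
            Set<String> runTags = new LinkedHashSet<>();
            runTags.add("small-medium"); // executed for every run
            if (start >= largeFreeAt) {  // trigger only when not in progress
                runTags.add("large");
                largeFreeAt = start + largeBusyMinutes;
            }
            tags.put(run, runTags);
        }
        return tags;
    }

    /** Runs that carry every required tag and are therefore safe to deploy. */
    static List<Integer> deployable(Map<Integer, Set<String>> tags, Set<String> required) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> entry : tags.entrySet()) {
            if (entry.getValue().containsAll(required)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> tags = simulate(6, 30, 75);
        System.out.println(deployable(tags, Set.of("small-medium", "large")));
    }
}
```

With these numbers only runs 1 and 4 collect both tags, which matches the fully tested runs in diagram 4.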

It is better to have all the tests be very fast and finish after 20 minutes but if this is impossible the above example approach may help to keep both the rich and quick feedback in place.

Few recipes for declarative pipeline

It is often necessary to iterate over some set of names using sequential or parallel stages.
A good example is when we have a list of modules – possibly retrieved automatically – which we want to test, sequentially or in parallel.

1. Sequential example.

Let’s look at sequential example:

First we just build the application without executing any of the tests.

In my opinion tests should always be separated from the build process – even if they are unit tests. The task seems easy but it requires some attention.
Firstly, we need to use the scripted pipeline approach, which makes it possible to use a for loop. Each of the stages gets a generated test name.
Secondly, we need the catchError step. Without it, the pipeline would abort on the first unsuccessful iteration, while we want all iterations to execute no matter what their status is.
Thirdly, after each iteration we need to preserve the surefire output – a testng report in this example – so that it can be archived properly.
All of the stages of this pipeline are executed on the same node and in the same workspace path.

2. Parallel example.

Making the pipeline parallel is what I call flattening. Let’s see the example:


The key to understanding more complex pipelines is to understand where each stage is executed in terms of workspace path and machine. In the example here, all the generated stages are executed in parallel, as the parallel keyword is used. It means a different workspace will be used for each of them (they are executed on the same machine): workspace_path, workspace_path@2, workspace_path@3 etc.
(It is also possible to configure the stages so that they are executed on different Jenkins nodes.)
Thus the pipeline first needs to stash the build artifacts after the BUILD stage and then unstash them in each generated stage. Stashed files are stored on the Jenkins master so that they are accessible to all of the stages no matter where they run.
In this example we do not need the catchError step, as even when one of the stages fails the rest will still be executed. The testng report file will not be overwritten (all of the reports reside in different workspaces), but it is good practice to rename it so that Jenkins can handle the reports properly when they are sent to the Jenkins master for the report to be generated. In this example all the reports are gathered in one place and reported in one go, while log files are archived after each stage finishes.

3. Summary.

These were just a few simple pipeline recipes. There is of course much more: we can split the pipeline not only into stages but also into jobs. The BUILD stage could be a separate job in the above example. We can do more tasks in parallel than just tests, like multiple checkouts or multiple builds. The parallelism can apply to either stages or jobs.
The more things are flattened, the more attention we need to pay to the resources we have: the number of machines, the number of Jenkins nodes on each of them in relation to the available memory and processors, and the disk read/write speed.
This is required so that the pipeline speed increases with flattening and not the opposite.
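
How much flattening can help with a given number of executors can be estimated before touching the pipeline. The sketch below is a simplified model, not a measurement: it assumes each stage simply takes the executor that becomes free first, it ignores memory, CPU and disk contention entirely, and the class name and the sample durations are invented.

```java
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical estimate of the total duration of flattened stages. */
final class FlatteningEstimate {

    /** Minutes until all stages finish when each stage takes the executor that frees up first. */
    static int makespan(List<Integer> stageMinutes, int executors) {
        if (executors < 1) {
            throw new IllegalArgumentException("at least one executor is required");
        }
        PriorityQueue<Integer> freeAt = new PriorityQueue<>();
        for (int i = 0; i < executors; i++) {
            freeAt.add(0); // every executor is idle at minute 0
        }
        int finish = 0;
        for (int minutes : stageMinutes) {
            int end = freeAt.poll() + minutes; // earliest free executor runs this stage
            freeAt.add(end);
            finish = Math.max(finish, end);
        }
        return finish;
    }

    public static void main(String[] args) {
        List<Integer> stages = List.of(10, 20, 30);
        for (int executors = 1; executors <= 4; executors++) {
            System.out.println(executors + " executors: " + makespan(stages, executors) + " min");
        }
    }
}
```

In this model, adding executors beyond the number of stages changes nothing, and in reality the gain tends to be smaller still because the stages compete for memory and disk.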

Declarative pipeline


I am currently working on a pipeline improvement. This is an absolutely crucial part of the QA process in the project. A good pipeline is the core around which we can build all the things which constitute high quality.

When starting a new project, create the pipeline first!

But what does a “good” pipeline mean? It has to be reliable, informative and really, really fast. That’s it.

What’s that?

A pipeline is the sequence of actions which should start with the code build and end with tested build artifacts. It gives feedback about the code: is it OK or not OK? In case of an OK message, the artifacts may simply be deleted at the end together with the test environments (continuous integration), may be released (continuous delivery) or – the most interesting option – may be deployed to production (continuous deployment).

There are different strategies for code management: for me the most basic classification is branchless development vs the branching one (well, you know this one for sure). In the latter case (the most popular, I guess) a successful pipeline allows the code to be merged into the main branch, while an unsuccessful pipeline prevents the defective code from being merged anywhere. With the branchless approach immediate action needs to be taken, as the code which doesn’t work as expected is already there in your code repository.

The pipeline can be implemented using dedicated software such as Bamboo or TeamCity, or the open source Jenkins.


I will be describing Jenkins in this article as I have the most experience with this application. As usual, it is a piece of software which has advantages and disadvantages. If I were to name the single biggest of each:

  • biggest disadvantage – disastrous quality of plugin upgrades: one can expect anything after an upgrade: usability changes, features disappearing, old bugs coming back (regression) etc.
  • biggest advantage – great implementation of agents: each agent can have many executors which are very easy to configure; the scalability is therefore fantastic and allows a massive number of concurrent jobs to be run, as each job requires only one free executor; the commercial competition doesn’t use executors, so 1 agent always means only 1 job at a time: in my opinion this must cause much bigger overhead when designing a pipeline with many concurrent jobs on 1 machine, as many agents are needed, while Jenkins needs only 1 per machine.

The practical problems

I mentioned reliability, information and performance.

  • reliability – the pipeline has to work in the same way every time it is launched; no IFs, no unexpected stops, no deviations in the duration of the pipeline or in the amount of feedback information; the general goal is that we have to be confident about the pipeline itself
  • information – the pipeline has to provide clear information about what is working and what is not; clear information means we need no more than a few seconds to know whether the problem is related to module B of our Java application or is maybe a security problem with the “user management” feature
  • performance – the pipeline has to be as fast as possible; the larger the project, the harder it is to achieve; for me the nice result which I call “fast” is 10 minutes for the build and all unit tests and 20 minutes for all the heavier tests (integration, GUI etc); a total pipeline runtime of 15-20 minutes is fantastic performance, 20-30 is acceptable but improvements need to be found, >30 is poor and improvements just have to be found; there is psychology behind this: let’s do it 3 times per hour or at least twice: if we are waiting 35 minutes the magic disappears

It all sounds simple but it is really hard to achieve. In practice we often see the opposite of the list above: the pipeline gets stuck at random places (the application doesn’t install on 1 environment?), it gives red/yellow/green statuses but no one knows what they are tied to (can I release the software or not? is it OK?) and finally it lasts for 2 hours, which feels like forever (“let’s commit only a few times a day, otherwise we will not be able to see the result”, developers think).

There are many reasons for this:

  • people’s mindset – they discuss every character in the code but forget about the pipeline
  • pipeline software specific problems – in Jenkins there can be 2 kinds of jobs: created in the GUI with mouse clicks and stored inside the Jenkins application, or written as a so called declarative (or the older: scripted) pipeline stored in files in a version control system (VCS). If the jobs are GUI based, they become impossible to maintain over time as their number grows and the relations between them get more complicated. They start living a life of their own.
  • traditional approach to the sequence of stages in the pipeline (all the stages are run in sequence)
  • doing one universal job which does many things
  • relying only on a software project management tool (like Maven) to implement test concurrency and reporting, without using the features of the pipeline software
  • physical environments availability
  • physical environments configuration

Fixing it (in details)

In this section I would like to show in some detail how to try to fix all these problems.

Squashing pattern

Let’s start with the general picture of the pipeline. If there are stages/jobs run in sequence like this:

stages in the sequence

We need to squash it as much as we can, that is:

squashed pipeline

We need a separate build stage to be able to use the build artifacts in all the testing stages. We need to build only once and use the output everywhere we need it.

For Maven, it means the transition from the notorious:

into a 2-step one:

The 1st line is then placed inside the Build stage and the 2nd line is the Unit tests stage.

All other tests start at the same point in time as the Unit tests. Of course, heavy tests like GUI tests require application redeployment. So the stage has to have a separate section for redeployment, like this:


gui stage

But what is the redeployment step? Very often we have numerous problems with this step. To achieve stability we have to switch to a virtual solution – Docker. Only then can we have control over this step. If you do not know it yet for some reason, read about it; otherwise start using it. It greatly improves the stability of the pipeline:

docker job

But what if we need numerous redeployments, e.g. for different application configurations? Then we just take advantage of the parameters each job has. In Jenkins, each job consists of stages. Each job can also receive and pass parameters. Again we modify the pipeline by introducing a new job:

create container

Now, for each of the required application configurations we have an instance of create_container.job running.

As you can see, we now have a stage for each logical step we make in our pipeline.

In general, the pattern is to drill down to the most logical atomic steps, visualize them in the pipeline software and squash them as much as possible to gain performance.

Pipeline as code

In Jenkins, as I mentioned above, it is possible to click jobs together in the GUI. Just do not do that! It is a dead end. The jobs become impossible to maintain very quickly and you are completely stuck. Use a declarative pipeline instead.

In Jenkins you use Groovy to create scripts in a simple way like this:

The point is to have the pipeline expressed in some language and stored in a version control system (VCS). This also applies to any other software used for the pipeline: if there is a possibility to store the pipeline as code, do it; otherwise change the software!

Moving away from a project management tool like Maven

Let’s come back to the unit tests step. Unit tests are an important stage but often a long lasting one. Although we already have a separate stage for unit tests, there is still room for improvement.

The usual attempt to increase the unit tests speed is:

However, I think this is the wrong approach, as it moves the concurrency functionality away from the pipeline software to Maven. Also, the performance gain is very questionable. The better approach, in my opinion, is to ask the pipeline software to do the job. All that is required is the list of modules we need to test. We can get it using Maven:

We need to create a job where the 1st stage collects the module list and the 2nd stage generates further stages, one for each of the individual modules. Here is a full example:

This is a mixture of declarative pipeline and scripted pipeline which allows dynamic stage generation.

In the GET_MODULES stage we extract all the modules from our application and store them in the env.MODULES variable (it is saved as a string and then tokenized to create an array). In the INVOKE_CONCURRENT_TESTS stage we generate the stages using the runConcurrent method. Each stage is named after its module.

There are 2 things we need to pay attention to.

The 1st one is the workspace. Each generated stage has its own workspace, so it cannot see the project which should be tested. To point each stage to the right one, the dir directive has to be used, represented here by env.BUILD_ARTIFACTS_PATH. I think the best pattern – used here – is to keep the workspace with the build artifacts available to each job (so that it can be accessed from any agent by the dir directive). If this is not possible, the stage which does the build needs to archive the artifacts to the Jenkins master and then each of the generated stages has to copy them from there. This affects the performance of the pipeline very significantly.

The 2nd important thing is the Maven repo set by the -Dmaven.repo.local parameter. It is easy to confuse things and run into trouble when the default Maven repository is used. Especially when building different software versions at the same time, or building 2 different applications, it is a good thing to have a dedicated Maven repository for each of them.

Anyway, each of the modules is now tested by a separate stage, and the duration of the tests is equal to the duration of the longest lasting module:

unit tests job

We can do it this way as we are dealing with unit tests – a unit is something independent, so our tests must run properly this way. The huge performance gain is one point, the other is very clear feedback about the point of failure if one occurs.

Speaking about the clear feedback…

Clear feedback

The goal here is to get to the point where the pipeline provides clear information about what is working and what is not. We already have it in the unit tests job. If module B fails, we will see it immediately on the screen, as there is a separate stage dedicated to each of the modules.

But what about GUI tests? When the job fails, which feature is not working? We need to know that immediately as well, no time to search the logs! The answer is to group the GUI tests so that each group represents one of the business functionalities. Let’s imagine the application under test has 3 significant features tested: security, processing and reporting. After the tests are grouped (for example in separate packages on the code side), it is possible to create a job like this:

gui tests with features

In this job all the tests are run concurrently, but when any of the groups fails we know exactly which business area is affected and we can make decisions about the quality of the current build.

Known defects or unknown?

Very often – I would say too often – some defects exist but, due to their priority or for other reasons, they are not fixed immediately. They are present for very many builds and mark their presence with the red colour. This completely blurs the picture of the state of the application: is it OK? Or not? Does the red colour mean we have new defects, or maybe these are the known ones?

The solution to this problem is to introduce a mechanism that marks the known failures in the pipeline software and sets the status accordingly. In Jenkins, it is possible to set the status to yellow (unstable) or red (failed). We need a piece of software which scans through the reports (either JUnit or testng report files) and, on the one hand, puts the defect number inside the file and, on the other, sets the status accordingly: when all the failures are known the status becomes yellow, and if there is at least 1 unknown failure the status is red. Because the defect number is put inside the report file, we can see it on the report page in Jenkins.

I didn’t find any out of the box solution for this, so I created a small application myself.

In detail, let’s say we have a JUnit report – we need to parse it, for example with the help of the great JSoup library. We need a file containing the mapping information, that is the class/method which is already reported and its ticket number. The code parses the JUnit XML report and substitutes all the known class/method occurrences with the same name but with the ticket number appended:

Because of this, the ticket number can be displayed in the Jenkins GUI. As the number of known failures is known, the status turns green if no failures are found, yellow if only reported failures are found and red if there is at least 1 unknown failure in the report.
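
The status rule itself is small enough to sketch. The following is not the application mentioned above, only a minimal illustration of the rule. It leaves the XML parsing out and works on a list of failed test names, and the class name, the label format and the ticket ids used in the usage note are all made up.

```java
import java.util.Collection;
import java.util.Map;

/** Hypothetical illustration of the green/yellow/red rule for known failures. */
final class KnownDefects {

    enum Status { GREEN, YELLOW, RED }

    /** Appends the ticket number to a failed test name when the failure is already reported. */
    static String label(String failedTest, Map<String, String> knownTickets) {
        String ticket = knownTickets.get(failedTest);
        return ticket == null ? failedTest : failedTest + " [" + ticket + "]";
    }

    /** GREEN: no failures, YELLOW: only known failures, RED: at least one unknown failure. */
    static Status status(Collection<String> failedTests, Map<String, String> knownTickets) {
        if (failedTests.isEmpty()) {
            return Status.GREEN;
        }
        for (String test : failedTests) {
            if (!knownTickets.containsKey(test)) {
                return Status.RED;
            }
        }
        return Status.YELLOW;
    }
}
```

With a mapping such as "LoginTest.expiredPassword" to "PROJ-101", a report in which only that test fails gives YELLOW, and the test shows up with the ticket number appended to its name.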

The final pipeline

After all these ideas are implemented we can see our final pipeline:

final pipeline

It is now squashed. It consists of atomic jobs which can be run as separate units (for example for testing purposes). It gives fast and clear feedback. It is reliable and significantly improves the quality of the application under test.


I have described here the actions I take when dealing with the pipeline problem. I am sure this approach is universal: try to make the pipeline reliable, clear in its feedback and as fast as possible.

JSoup defence for Selenium

Selenium problems

It may happen that Selenium causes problems. There can be at least 2 major ones:

  • a Selenium test case takes a very long time

A Selenium request is really expensive, especially when the tests are executed through hub-node (Selenium Grid) infrastructure. The request is sent from the build machine to the Selenium hub, then to the Selenium node, then on the node to the browser. The response travels all the way back. When a performance problem shows up during test case execution, the cause is often that Selenium sends too many requests. We are often not aware how often requests are sent.

  • the page which is to be automated is refreshed often

Sometimes we need to assert on a page which is automatically refreshed at a specific time interval. This causes the notorious StaleElementReferenceException. When the refresh happens after Selenium grabs a WebElement but before it invokes a method on it, the exception surfaces.

The real life problem

Recently I was dealing with such problems and was trying to think of a solution.

In my case I was iterating through a table to assert on a specific cell in each row.

So the code was more or less like this:


The performance was very poor and the staleness problem was present in almost every run.

Notice that Java is sending requests to the web browser at all the marked lines. Surprisingly, this is the case for every iteration of the loop as well!

When the page refreshes during a driver method chain, you will get a stale element reference exception. The same thing will happen when the page gets refreshed at any point during loop execution. The page holding the list of web elements used by the test must not be refreshed until the loop is completed!

How to solve such a problem? The solution is either to catch the exception so that processing restarts at the beginning of the refresh interval, or to reduce the number of Selenium requests to a minimum and move as much processing as possible into memory. The first solution turned out to be impossible, as the loop took 3 times longer than the page refresh interval…

Here comes the cavalry

JSoup is the ultimate solution for all such problems. Not only is it a great library, extremely easy to use, with great documentation and intuitive methods, but it also allows extremely smooth code refactoring thanks to a fantastic feature it supports: CSS selectors.

Just take a look:

The table is extracted using Selenium and then, for the duration of the loop, the processing is handed over to JSoup completely:

  • JSoup creates a document from the HTML table, which is a kind of snapshot of the data present at the time the document was created; this ensures data consistency
  • the document is then queried using CSS selectors – completely offline from Selenium’s point of view and entirely in memory
  • the result is converted back to a Selenium WebElement to continue with Selenium methods

Now, the web browser interaction is reduced to only 2 places.
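
Why the snapshot pays off can be shown without a browser. The sketch below is a plain Java model, not Selenium or JSoup code: the CountingPage class is an invented stand-in for a page where every call is an expensive remote request, and it only counts how many requests each approach needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model comparing per-cell requests with a single snapshot request. */
final class SnapshotDemo {

    /** Stand-in for a remote page: every method call counts as one request. */
    static final class CountingPage {
        private final String[][] table;
        int requests = 0;

        CountingPage(String[][] table) {
            this.table = table;
        }

        int rowCount() {
            requests++;
            return table.length;
        }

        String cell(int row, int column) {
            requests++;
            return table[row][column];
        }

        /** One request returning a copy of the table, unaffected by later page changes. */
        String[][] snapshot() {
            requests++;
            String[][] copy = new String[table.length][];
            for (int i = 0; i < table.length; i++) {
                copy[i] = table[i].clone();
            }
            return copy;
        }
    }

    /** Reads one column with a request per row, as in the slow loop. */
    static List<String> columnPerCell(CountingPage page, int column) {
        List<String> values = new ArrayList<>();
        int rows = page.rowCount();
        for (int row = 0; row < rows; row++) {
            values.add(page.cell(row, column));
        }
        return values;
    }

    /** Reads one column from a snapshot, processing everything in memory. */
    static List<String> columnFromSnapshot(CountingPage page, int column) {
        List<String> values = new ArrayList<>();
        for (String[] row : page.snapshot()) {
            values.add(row[column]);
        }
        return values;
    }
}
```

For a table with n rows the first method issues n + 1 requests and the second issues 1, and because the second works on a copy, a refresh in the middle of the loop cannot invalidate the data being read.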

The solution is staleness proof and significantly improves execution performance: one just needs to catch the StaleElementReferenceException where Selenium is in play:

The only thing to consider in this specific example is whether we can accept the situation where the page was refreshed after grabbing the table but before sending the getLocation request. Notice that it is perfectly safe if there is no page refresh at all.

As for performance, even with a local web browser and a very small table the difference is noticeable (on Selenium Grid the difference is really huge, believe me!):

Sum up

If multiple Selenium requests cause performance issues or make tests unreliable because of StaleElementReferenceException – switch to offline processing with JSoup. Just remember, you need to understand the number of Selenium requests in your code, the exact cause of the staleness and the impact offline processing has on the consistency of your test case.


The Great 4 Variables and how to use them to tame the heavy tests instability

The Great 4 Variables

In the world of testing we just need to take care of the 4 Variables. If we could test all the combinations of values they can take, we could be 100% sure our software works. Unfortunately, they are really big ones:

  • code
  • configuration
  • data
  • environment

The code is where we focus most often: unit tests, integration tests, system tests. We know the stuff.

The configuration is a less obvious variable. Most often the system under test uses the default configuration and we miss this important aspect. We should definitely create and use configuration tests to learn whether the configuration actually works and how it affects the system.

The data is the real nightmare. The infinite number of possible combinations of both internal data (the application’s database) and external data (coming from outside), which can be spoiled, corrupted or just unsupported, turns the risk of the system under test failing into a sure thing.

The environment is a very much underestimated thing: there are really strange OS configurations, versions and patches which can magically render our fantastic application unusable.

This is the real source of the whole variety of defects we encounter when assuring quality!

I do not want to go into more detail about the 4 Variables; I can recommend a great book which discusses the topic in detail (and other topics as well): Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation

The actual topic I want to cover in this post is the problem of heavy tests instability, and I want to use the Great 4 Variables to solve the problem.

The problem of heavy tests – pass, fail, error

We all know heavy tests. We often refer to them as system or large tests. They need application runtime. They start slowly, work slowly and most often after long period of time instead of saying pass or fail they just say ERROR. Welcome to the world of passfailerror 🙂

Everything related to a heavy test is slow: it is slow to create, it takes a long time to run and finally it takes a very long time to find the reason for an error or failure. If only we could reduce the error rate to a minimum…

The solution

Let’s imagine we are in a project with many heavy tests which often produce an error result. We need to take small steps to reduce the unwanted status. Firstly, we need to know the reason for the test errors, and we need to be able to get this information as quickly as possible when looking at the test result.

Let’s use the 4 Variables for that purpose.

We just need to realize that our testing code is an application just like any other. It is also affected by the Great 4 Variables. To execute a heavy test we need the testing code, the testing configuration, the test data and the test environment. We are going to assert things in some way, and the assertions may contain a defect. We may configure the test in a way that doesn’t work or produces error results. Our test data may have been changed by a previous test run and thus produce errors or, even worse, pass/fail results that are not credible. Finally, our test environment may be malfunctioning or not working at all (the simplest example is a docker container whose setup at the beginning of the tests was unsuccessful, so it doesn’t work when the test starts).

But how can we apply that knowledge? We just need to test all the Variables BEFORE the test is started, to have confidence that it will produce a credible pass or fail result and will not produce any error.

  1. test code – this is a real-life scenario: I was using the Cucumber testing framework as a DSL in a project with a nice Selenium back end. The tests were quite stable and the results were clear. What a surprise: for a few days they were completely wrong. Because of a defect in the Cucumber layer they were reporting PASS no matter what the real result was. The lesson learnt is: always create unit tests for the testing code!
  2. test configuration – when the testing code should enter some special state for a specific test suite, we can test whether the state is correct. For example, if we have a DSL which has a configuration for the application end point it uses (REST, custom client, GUI), it is possible to test whether the test code is in GUI mode when a GUI test starts.
  3. test data – when there is any chance the test data may be corrupted before testing starts, it is necessary to test whether it has changed. Again, a real-life scenario: I was working with heavy tests which, in certain circumstances when the application under test was interrupted abnormally, were corrupting the test data. So, if the test data is not safe for some reason, it is a good idea to check whether it is correct before the test starts.
  4. test environment – this is a very well known issue: the test environment is down. Either a docker container failed to start, or maybe the problem is caused by the fact that the test environment is shared with other teams which planned a machine restart just when our tests start… But I would say we can not only test whether it is up, but also do more precise checks, like whether it has sufficient resources to run the test (memory, processor, disk) or whether the test user has the permissions needed for the test to start.
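
The test data check from point 3 can be as simple as comparing a digest of the data with one recorded when the data was known to be good. This is only a minimal sketch under my own assumptions: the class name TestDataGuard and the sample data are made up for illustration, and a real project would read the bytes from its own data store.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical guard that detects test data changed since a known-good baseline. */
final class TestDataGuard {

    /** Returns the SHA-256 digest of the given bytes as lowercase hex. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    /** True when the current data still matches the digest recorded for the baseline. */
    static boolean isUnchanged(byte[] currentData, String baselineDigest) {
        return sha256Hex(currentData).equals(baselineDigest);
    }

    public static void main(String[] args) {
        // Made-up sample data: the second array simulates data truncated by an abnormal interruption
        byte[] baseline = "user=tester;balance=100".getBytes(StandardCharsets.UTF_8);
        String recorded = sha256Hex(baseline);
        byte[] afterCrash = "user=tester;balance=".getBytes(StandardCharsets.UTF_8);
        System.out.println("baseline intact: " + isUnchanged(baseline, recorded));
        System.out.println("after crash intact: " + isUnchanged(afterCrash, recorded));
    }
}
```

A digest only tells us that something changed, not what changed, but that is enough to stop the heavy test before it produces a result nobody can trust.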

Once we have the checks in place they should be visible as separate items in our pipeline so that we know whether our heavy test started at all and, if not, why it failed to start: was it a code, configuration, data or environment failure?
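
One possible way to make the four checks separately visible is a small gate that runs one check per Variable and reports the first one that fails. This is a sketch, not the code from my project: PreflightChecks, Variable and the always-true placeholder checks are hypothetical names and values chosen for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

/** Hypothetical pre-flight gate: runs one check per Variable before a heavy test starts. */
final class PreflightChecks {

    enum Variable { CODE, CONFIGURATION, DATA, ENVIRONMENT }

    // LinkedHashMap keeps registration order, so checks run in the order they were added
    private final Map<Variable, BooleanSupplier> checks = new LinkedHashMap<>();

    PreflightChecks register(Variable variable, BooleanSupplier check) {
        checks.put(variable, check);
        return this;
    }

    /** Returns the first Variable whose check fails, or empty when all checks pass. */
    Optional<Variable> firstFailure() {
        for (Map.Entry<Variable, BooleanSupplier> entry : checks.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                ok = false; // a check that throws counts as a failed Variable
            }
            if (!ok) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PreflightChecks preflight = new PreflightChecks()
            .register(Variable.CODE, () -> true)          // placeholder: unit tests of the test code passed
            .register(Variable.CONFIGURATION, () -> true) // placeholder: DSL is in the expected mode
            .register(Variable.DATA, () -> true)          // placeholder: test data digest matches
            .register(Variable.ENVIRONMENT,               // at least 100 MB of usable disk space
                () -> new java.io.File(".").getUsableSpace() > 100L * 1024 * 1024);
        System.out.println(preflight.firstFailure()
            .map(v -> "Heavy test not started, failed Variable: " + v)
            .orElse("All Variables verified, heavy test can start"));
    }
}
```

In a real pipeline each registered check would more likely be its own step, so that the red item itself names the Variable which failed.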

After this we can take improvement actions so that it doesn’t happen again. We do not lose any time analysing long log files just to realize after 15 minutes that the red status doesn’t mean our application under test has a defect in a critical area but that the test environment just ran out of space.

Of course, when any of the Variables is not problematic in our pipeline and never causes any problems, we can safely skip it. We just need to monitor the Variables which are causing the instability of the heavy tests.


When creating a pipeline to have continuous delivery, or at least continuous integration, in place we cannot afford to lose time on understanding the results. The message has to be clear for small, medium and large tests, at least about whether we actually have a defect or not. We need to know it instantly.

It is hard to achieve, especially for large/heavy tests. In my opinion the best way to solve the problem is to test the Great 4 Variables as they apply to our tests, to filter out any failures which are not related to the application under test.

Let’s use the saved time for more important tasks.





Improving coverage – automating state transition approach 2.0

Approach 2.0

(This is an improved approach compared to the idea of state transition testing described HERE.)

In my past articles I presented an approach which allows generating test cases for combinatorial problems. This is a very large group of the problems we encounter when assuring quality.

Still, there is an area where we need a more general approach. Let’s think about a simple GUI application which allows you to log in and fill in a form which can be saved. There is a combinatorial aspect to filling in and saving the form, as we can do it in many ways.
But what about the situation where we just cancel filling in the form? Or fill in many forms in a row? As you most probably know, this is where the state transition diagram comes in handy. It is a test design technique focusing on the most general aspect of the application, which is the application state and transition context.
In this article I would like to show how to practically model an application using this technique and, most importantly, how to automatically generate test cases with a specific coverage which are instantly executable.

State transition diagram coverage

Speaking about coverage: in my approach the coverage for diagrams is based on how many times each transition is used. I call it N-tn coverage. So when I say 2-tn coverage it means each transition present in the diagram is used at least twice. It is worth noticing that this differs from what you can find in QA literature, which describes N-switch coverage. As you probably know, 0-switch coverage means you test a single transition (no states), 1-switch coverage means you test 2 transitions (a piece of the diagram with 2 transitions and 1 state) and so on. This is nice but I think hard to use in practice. Why? Because to test a specific part of the diagram you have to bring the application into a specific state: you have to execute all the states and transitions which lead to the state you choose as the starting point (setting the state of the application without executing the path, for example by updating the database, caches and other stuff manually, is very risky in my opinion and should be avoided). It is just better to avoid a complex setup process.

The complexity

Unfortunately the complexity hidden behind the diagram is enormous: it is actually infinite. Let’s imagine an application which has only 2 states and 2 transitions (T1 from A to B and T2 from B back to A):

simplest diagram

How many test cases can we have? Infinite…
1. A-B-A (1-switch coverage, 1-tn coverage)
2. A-B-A-B (2-switch coverage, 1-tn coverage as T2 is used only once)
3. A-B-A-B-A (3-switch coverage, 2-tn coverage)
4. A-B-A-B-A-B (4-switch coverage, 2-tn coverage as T2 is used only twice)
5. A-B-A-B-A-B-A (5-switch coverage, 3-tn coverage)

and so on until infinity is reached, which is never of course…
Repeating a transition once, twice and a thousand times are all different test cases. Here you can clearly see how many test cases you miss on the way to 100% confidence that your application is working as expected.
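
To make the N-tn numbers in the list above easy to verify, here is a minimal sketch which counts how many times each transition is used by a path and returns the smallest count. The class and method names are mine and exist only for this illustration, and transitions are identified simply as "from->to" strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: computes the N-tn coverage reached by a path through a diagram. */
final class TnCoverage {

    /**
     * Returns the largest N such that every transition of the diagram is used
     * at least N times. The path is a list of state names; each pair of
     * neighbouring states counts as one use of the transition between them.
     */
    static int tnCoverage(List<String> allTransitions, List<String> path) {
        Map<String, Integer> uses = new LinkedHashMap<>();
        for (String transition : allTransitions) {
            uses.put(transition, 0);
        }
        for (int i = 0; i + 1 < path.size(); i++) {
            String transition = path.get(i) + "->" + path.get(i + 1);
            if (!uses.containsKey(transition)) {
                throw new IllegalArgumentException("Not in the diagram: " + transition);
            }
            uses.merge(transition, 1, Integer::sum);
        }
        // The least used transition decides the coverage
        return uses.isEmpty() ? 0 : Collections.min(uses.values());
    }

    public static void main(String[] args) {
        List<String> diagram = Arrays.asList("A->B", "B->A"); // T1 and T2
        for (String path : Arrays.asList("A-B-A", "A-B-A-B", "A-B-A-B-A")) {
            int n = tnCoverage(diagram, Arrays.asList(path.split("-")));
            System.out.println(path + " gives " + n + "-tn coverage");
        }
    }
}
```

For A-B-A-B the transition from A to B is used twice but the one from B to A only once, so the result is still 1-tn coverage, just as in case 2.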
Anyway, the theory is very nice but let’s apply it in practice to finally make it useful.

Practical example

In general, we need the same things as for combinatorial problems: a model, generated test cases in XML format and generated test cases in the domain language.
Let’s assume we would like to test Notepad’s functionality related to tabs and text direction. Let’s start with a plain old diagram:

state transition diagram example

It looks very nice but we cannot do anything useful with it right now. Let’s write it in XML format together with the domain language part:

Now it is becoming unreadable for humans but it is much better for a machine…

Please note that the expected result for each transition is “notepad GUI is visible”, which is quite trivial. This should be more meaningful in a real diagram model.
At this point we need some software to find valid paths through the diagram with a given N-tn coverage. I couldn’t find anything useful on the internet so I wrote a piece of software myself. You can view the code under the automatic-tc-generation-from-diagram-another-approach branch HERE. This is: src/test/java/com/passfailerror/diagram2sequence_generator/ class.

The algorithm is quite simple:
– the diagram is converted into a state transition table
– the starting state row is chosen as the first item
– the state transition table is shuffled and scanned; when a matching row is found (according to the diagram logic) it is appended to the valid path and the transition table is reshuffled
– this process repeats until a diagram path is built with the specific N-tn coverage (each transition is visited at least N times)
– note that it makes sense to generate more than one diagram case, as each time a different path is chosen for the same N-tn coverage.
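
The steps above can be sketched roughly as follows. This is not the Diagram2SequenceGenerator source, only a simplified illustration under my own assumptions: the names RandomPathSketch and Row, the maxSteps safety limit and the three-state table in main are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Simplified illustration of shuffle-and-scan path generation with N-tn coverage. */
final class RandomPathSketch {

    /** One row of the state transition table: from state, transition name, to state. */
    static final class Row {
        final String from;
        final String name;
        final String to;

        Row(String from, String name, String to) {
            this.from = from;
            this.name = name;
            this.to = to;
        }
    }

    static List<Row> generate(List<Row> table, String startState, int n,
                              Random random, int maxSteps) {
        Map<Row, Integer> visits = new HashMap<>();
        for (Row row : table) {
            visits.put(row, 0);
        }
        List<Row> shuffled = new ArrayList<>(table);
        List<Row> path = new ArrayList<>();
        String current = startState;
        // Repeat until every transition was visited at least n times
        while (!visits.values().stream().allMatch(count -> count >= n)) {
            if (path.size() >= maxSteps) {
                throw new IllegalStateException("Coverage not reached in " + maxSteps + " steps");
            }
            Collections.shuffle(shuffled, random); // reshuffle before every scan
            Row next = null;
            for (Row row : shuffled) {
                if (row.from.equals(current)) { // matching row: it leaves the current state
                    next = row;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalStateException("Dead end in state " + current);
            }
            path.add(next);
            visits.merge(next, 1, Integer::sum);
            current = next.to;
        }
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical three-state diagram, not the Notepad model from this article
        List<Row> table = new ArrayList<>();
        table.add(new Row("A", "T1", "B"));
        table.add(new Row("B", "T2", "A"));
        table.add(new Row("B", "T3", "C"));
        table.add(new Row("C", "T4", "A"));
        StringBuilder out = new StringBuilder("A");
        for (Row row : generate(table, "A", 2, new Random(), 10_000)) {
            out.append(" -").append(row.name).append("-> ").append(row.to);
        }
        System.out.println(out);
    }
}
```

Because the table is reshuffled before every scan, each run usually produces a different path for the same coverage. The step limit is there only so that a diagram with an unreachable transition fails fast instead of looping forever.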

After running Diagram2SequenceGenerator there is a result XML file which I call generated diagram cases. It is not executable yet, so we need to convert it into executable diagram cases:

We can do the conversion with an enhanced version of the test case generator used in my previous articles, which dealt with combinatorial problems. You can see the source in:
src/test/java/com/passfailerror/testcases_generator/ class, which receives an extra parameter TestcaseSourceType that in turn allows generating test cases both for the TCases output file and for the diagram cases output file.

The executable test case is:

The sequence of WHENs and THENs is a diagram case, while a single WHEN-THEN pair would be a test case in my terminology. Just to repeat: it is valid to have more than one diagram case for a given N-tn coverage, as the generated sequence of transitions differs from run to run.
Now it is just a matter of running the output, as it is directly executable:

Approach 2.0 sum up

First we draw a diagram:



Then we translate it into XML (unfortunately manually):

diagram as XML

Then diagram cases are generated (automatically):

generated diagram cases

Finally executable test cases are generated (automatically):


generated DSL executable test cases

The DSL used here (an internal domain language implemented in Java) as well as the framework (Sikuli) don’t really matter. They are used only as an example. Most often it is Selenium, or maybe some stranger thing like Protractor, which will be used in practice, with Cucumber or another behaviour driven development library on top. The only important thing when using approach 2.0 is to use some kind of domain language, so that it can be used in HAS elements in the model file in order to generate diagram cases automatically.


The state transition diagram test design technique finally starts to be useful. I have never seen anybody applying it in practice, which is weird as it applies to all applications which have at least 2 transitions. Or maybe I haven’t seen much?
There are a few important points behind all this. The problem illustrated here was very simple, yet only a few states and a few transitions resulted in so many actions. It means the complexity hidden behind a simple application is very large, so when modelling more complex applications we have to settle for low coverage or choose only part of the application to be tested in this way. Also, I didn’t say anything about invalid paths through the diagram: we should also check whether invalid paths are really invalid and how the system behaves in such situations.

Anyway, I am sure this is a very useful technique for dealing with problems which are modelled by state transition diagrams.


Automatic test case generation for state transition diagrams (approach 1.0)

Approach 1.0

This article is left here for historical reasons. Please read the newest version of the idea, which is described HERE.

Increase automatic test case generation

In the past I wrote about 2 things: state transition based testing and automatic testcase generation. These are actually 2 complementary test design techniques: state transition diagrams and decision tables respectively (I do not want to write about the details of these techniques now; this is a subject for a separate post I hope to write in the future). In the latter post I showed how to automate test case generation for decision tables; the goal for today is to show how to start automation when a diagram is the starting point.

The combinatorial nature of a problem can be expressed as a decision table and translated into XML for the TCases application, which processes it and produces output containing an optimal set of test cases (automatic testcase generation). However, the most general way to analyze the application under test is the state transition diagram. I already showed how to use this technique in order to achieve coverage, but I showed only the manual approach. Still, we need automatic test case generation!

When a diagram is in use, the trouble begins: how to process it automatically? How to generate a set of test cases from a diagram? It took quite a while until I came up with a reasonable solution.

I recently thought I could try TCases for this purpose. Although it is meant to identify variables and their values, if the transitions of the diagram were treated as variables and their dependencies were described in the TCases XML input file, I could get a valid set of transitions, and each transition would be used at least once in the basic coverage setting.

Practical example

Create model

Let’s use the same problem as in state transition based testing. We want to test whether Notepad is working when switching between tabs, changing text direction inside each of them and writing text in each of them. This is a very simplified model but it is enough to illustrate the concept. The state transition diagram looks like this:


Now it is required to translate it into an XML representation which is parsable by TCases (I wrote about TCases HERE). This is it:

INPUT is the state name, VAR is the transition.

COMMAND in HAS elements contains domain language sentences which are executable after simple processing by the domain language generator.

WHEN elements describe the dependencies needed to allow only valid combinations of transitions.

EXPECTED in HAS elements shows we just assert that the Notepad GUI is visible after each set of transitions is run.

There is one problem with this file: in line 16 we need to give the whole sequence of transitions needed to reach SELECT2TAB, as the TAB_1_IS_SELECTED state has 2 outgoing transitions. This shows a disadvantage of modelling the diagram in this way when there are states with very many transitions.

Generate executable test case

After generator is run, the set of test cases is produced. Generator reference is

The link to the source code is shown at the end of this post if you are interested.

When using basic coverage, which is 1-tuple coverage, each transition is used at least once. Because each transition is marked as TRUE or FALSE (a transition is valid when its dependencies are met), the set of transitions will contain both TRUE and FALSE values: it means a generated test case can contain all the valid transitions but also only part of them. This is 1-tuple coverage:

With generated test cases (tc3 is missing as it consists of FALSE values only and the generator wisely skips such test cases):

Now that the creation of test cases is automated, it is very easy to increase the coverage. This is 2-tuple coverage:

With generated test cases: