Write your own pipeline testing framework

Introduction

I recently needed a pipeline testing framework in order to test the numerous pipeline files I use on a daily basis. I tell you, it is a bad feeling when the number of pipeline scripts keeps growing and the only testing you can do is some sandboxing inside Jenkins…
So I started to look around, and at first I found this testing framework, which looked quite promising:
https://github.com/jenkinsci/JenkinsPipelineUnit
Unfortunately, almost immediately I stumbled upon a problem with malfunctioning mocks for local functions: I was not able to mock an internal function – it was only possible to mock steps, so I would have to mock all of the atomic steps of all the auxiliary functions I was using. Imagine a function with 50 steps which you just need to mock but cannot, and the only workaround is to mock most of those 50 steps…
For me the lesson learned is: an open source solution is good when it works properly or when all defects can be worked around easily. The project also has to be very active, I guess, with a lot of developers involved. This is not the case here – I would guess it may even be abandoned in the future.
Anyway, because the workaround was hard and I found yet another problem with that framework, I decided to create my own solution.

Understanding the problem

Looking at a typical declarative or scripted pipeline, one can see it resembles Groovy. Let’s look at the example:
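The original post embeds the example as a gist; a minimal simplePipeline.groovy along the lines the text describes could look like this (a reconstruction – the stage names and echo parameters are my assumptions, not the original listing):

```groovy
// simplePipeline.groovy -- a minimal declarative pipeline (reconstruction)
pipeline {
    agent any
    stages {
        stage('First stage') {
            steps {
                echo 'test1'
            }
        }
        stage('Second stage') {
            steps {
                echo 'test2'
            }
        }
    }
}
```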

Trying to launch simplePipeline.groovy outside of Jenkins as a regular Groovy script ends up with an error like this:
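With no mocks in place, plain Groovy cannot resolve the pipeline call, so the failure is a MissingMethodException along these lines (the exact message depends on the Groovy version):

```text
Caught: groovy.lang.MissingMethodException: No signature of method:
simplePipeline.pipeline() is applicable for argument types:
(simplePipeline$_run_closure1) values: [...]
```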

So the problem with running it with standard Groovy is, for sure, pipeline. But what does that mean in terms of syntax? Well, it is just a function named pipeline which accepts a closure as its parameter:

pipeline { ... } 

The other items are Groovy functions as well; they may just have 2 arguments, like:

stage('Second stage') { ... } 

This one is the stage function, which takes a String as its first argument and a closure as the second.

It is clear now that our goal is simply to mock all pipeline-specific functions so that they are runnable by Groovy. This will allow us to run the whole pipeline file outside of Jenkins! It sounds complicated but it is not. Read on.

Making it runnable

Let’s see how we can write the first version of code which will run the pipeline example:
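A first version might look like the following sketch; the pipeline text is inlined as a string for brevity, and every syntax item from the example pipeline is mocked by hand (a reconstruction, not the original gist):

```groovy
// First version of the runner: every pipeline syntax item is mocked by hand
// with an ExpandoMetaClass (runtime metaprogramming)
def pipelineText = '''
pipeline {
    agent any
    stages {
        stage('First stage') {
            steps {
                echo 'test1'
            }
        }
        stage('Second stage') {
            steps {
                echo 'test2'
            }
        }
    }
}
'''

def log = []
def script = new GroovyShell().parse(pipelineText)

// "agent any": "any" is a plain property, so it must exist in the binding
script.getBinding().setVariable('any', 'any')

// whenever a function named "pipeline" with a matching signature (one or
// more objects) is met, run the assigned closure instead
script.metaClass.pipeline = { Object... args ->
    log << 'pipeline'
    args[0]()                       // first parameter is a closure: run it
}
script.metaClass.agent = { Object... args -> log << 'agent' }
script.metaClass.stages = { Closure body ->
    log << 'stages'
    body()
}
script.metaClass.stage = { String name, Closure body ->
    log << "stage:$name".toString()
    body()
}
script.metaClass.steps = { Closure body ->
    log << 'steps'
    body()
}
// echo takes a String only: its mock just records it
script.metaClass.echo = { String message -> log << "echo:$message".toString() }

script.run()
println log
```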

If you run it, you will see it works now! All Jenkins-related syntax items are mocked, so Groovy can execute the file. How does it work?

I use the ExpandoMetaClass mechanism here, which is part of runtime metaprogramming in Groovy (you can read about it here: https://www.groovy-lang.org/metaprogramming.html#metaprogramming_emc).

In the highlighted lines you can observe how the pipeline function is mocked. It says: whenever you meet a function named pipeline with a matching signature (an array of one or more objects), run the assigned closure (the code in curly braces).

So at runtime the code is executed, which means that the closure:

  • logs the word “pipeline”
  • treats the first parameter as a closure and runs it (which means, in this example, that the “agent” function will be run and so the next mock will be executed)

This is the core of the pipeline testing framework: mocking Jenkins syntax, which is expressed in the pipeline file as closures in curly braces.

Of course, not all of the syntax items can be mocked this way; for example, there are functions like echo which take only a String parameter. Their mock just prints it out.

Improving the solution

The first version contains a lot of redundancy. We can make some improvements here:
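An improved version might register the mocks by iterating over name lists, using Groovy’s dynamic method names (again a sketch, not the original code; the name lists are assumptions):

```groovy
// Improved version: sections and steps are mocked by iterating over their
// names -- dynamic method names remove the redundancy
def pipelineText = '''
pipeline {
    stages {
        stage('First stage') {
            steps {
                echo 'test1'
            }
        }
    }
}
'''

def log = []
def script = new GroovyShell().parse(pipelineText)

// sections: the last argument is always the closure in curly braces
['pipeline', 'stages', 'stage', 'steps'].each { name ->
    script.metaClass."$name" = { Object... args ->
        log << name
        args[-1]()
    }
}

// steps: no closure to run, just record the invocation with its parameters
['echo', 'sh', 'error'].each { name ->
    script.metaClass."$name" = { Object... args ->
        log << "$name $args".toString()
    }
}

script.run()
println log
```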

It looks much better: we have separated steps from sections, and only one piece of code is responsible for mocking each of them. We can also easily add new mocks now. We even get the crazy feature of dynamic method names. It is awesome!

However, we still cannot make any assertions. Let’s solve this problem as well. First, we need to find a way to trace the script at runtime and store its states in some memory structure. For each step in the pipeline we need to know:

  1. the caller hierarchy (which syntax item called which one: “did stage 1 call echo?”)
  2. what the state of global variables was (“was env.TEST_GLOBAL_VAR set in “First stage” and did it have the value “global_value” in the “Second stage”?)
  3. what exactly was called (“was the echo step called with the “test1” parameter?”)

Point 1 can be achieved by extracting information from the stack trace, for example using these 2 methods:
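A possible shape of this extraction, shown as a small self-contained demo (createStackLine is the name the article uses later; its implementation here is my assumption):

```groovy
// The pipeline is parsed under a known file name, so stack trace frames
// coming from the pipeline file can be filtered out by that name
def pipelineText = '''pipeline {
    stage('First stage') {
        echo 'test1'
    }
}'''

def scriptLines = pipelineText.readLines()
def script = new GroovyShell().parse(pipelineText, 'simplePipeline.groovy')

def createStackLine = {
    def frames = new Throwable().stackTrace.findAll {
        it.fileName == 'simplePipeline.groovy'
    }
    // caught line numbers are translated back into pipeline text,
    // outermost caller first -- this is the caller hierarchy
    frames.reverse().collect { scriptLines[it.lineNumber - 1].trim() }.join(' -> ')
}

def stackLines = []
script.metaClass.pipeline = { Closure body -> body() }
script.metaClass.stage = { String name, Closure body -> body() }
script.metaClass.echo = { String msg -> stackLines << createStackLine() }

script.run()
println stackLines[0]
```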

Line numbers from the pipeline file are caught and translated into text – now we know the caller hierarchy.

Point 2: we can get information about the state of global variables when mocking, by using: pipelineScript.getBinding().getVariables()

Point 3: to get the exact syntax item with its parameters, we can report them during mocking as well.

This is an example for points 2 and 3:
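A self-contained sketch of points 2 and 3 (storeInvocation and ResultStackEntry are the names from the article; their bodies are my reconstruction, and point 1, the stack line, is stubbed out here):

```groovy
class ResultStackEntry {
    String stackLine
    Map invocations
    Map runtimeVariables
}

def pipelineText = '''
env.TEST_GLOBAL_VAR = 'global_value'
echo 'test1'
'''

def binding = new Binding()
binding.setVariable('env', [:])
def script = new GroovyShell(binding).parse(pipelineText)

def resultStack = []
def createStackLine = { '' }   // point 1 -- omitted in this sketch

def storeInvocation = { String name, List args ->
    resultStack << new ResultStackEntry(
        stackLine: createStackLine(),
        invocations: [(name): args],
        // point 2: snapshot of the current state of global variables
        runtimeVariables: new HashMap(script.getBinding().getVariables())
    )
}

// point 3: the mock reports the exact syntax item with its parameters
script.metaClass.echo = { String msg -> storeInvocation('echo', [msg]) }

script.run()
println resultStack[0].invocations
```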

The “storeInvocation” method is called with 3 parameters:

  • syntax item name
  • its parameters
  • current state of variables

In this method, items 2 and 3 come in as parameters, while item 1 is created by the “createStackLine” function. I store them in the ResultStackEntry class and name them: 1 – stackLine, 2 – invocations and 3 – runtimeVariables.

Take a look at GitHub for the details.

Putting it all together, this code:

produces a stack trace in the output:
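The output would be a textual stack trace roughly like this (my reconstruction; the exact format depends on createStackLine):

```text
pipeline { -> stage('First stage') { -> echo 'test1'
pipeline { -> stage('Second stage') { -> echo 'test2'
```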

The last important thing here is global environment handling. In Jenkins there is an environment map named env, whose keys can be set by assigning values like this: env.TEST_VARIABLE = "test1"

Groovy by default doesn’t know anything about such a map, so it is necessary to prepare it so that the pipeline can populate it at runtime:
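The preparation is a one-liner on the binding (a minimal sketch):

```groovy
// prepare the env map through the binding so the pipeline can populate it
def binding = new Binding()
binding.setVariable('env', [:])

def script = new GroovyShell(binding).parse('env.TEST_VARIABLE = "test1"')
script.run()

println binding.getVariable('env')   // [TEST_VARIABLE:test1]
```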

Assertions

We have a working framework with a result stack, but we still cannot make any assertions. We have to add this ability to be able to test anything, don’t we?

Let’s add a result stack validator (see GitHub for the full code):

And an assertion class to serve as a mini DSL for asserting on the result stack when writing tests:

Only now can we start making some assertions:
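A hypothetical mini DSL shaped after the article’s description (the real validator and assertion classes are on GitHub; the class and method names here are made up):

```groovy
class ResultStackValidator {
    List<Map> stack = []

    ResultStackValidator stepCalledWith(String name, Object param) {
        assert stack.any { it.name == name && it.params == [param] }
        this
    }

    ResultStackValidator variableHadValue(String stage, String var, Object value) {
        assert stack.any { it.stackLine.contains(stage) && it.variables.env[var] == value }
        this
    }
}

// usage with a hand-built result stack (recorded at runtime in the framework)
def validator = new ResultStackValidator(stack: [[
    name     : 'echo',
    params   : ['test1'],
    stackLine: "stage('First stage')",
    variables: [env: [TEST_GLOBAL_VAR: 'global_value']]
]])

validator.stepCalledWith('echo', 'test1').variableHadValue('First stage', 'TEST_GLOBAL_VAR', 'global_value')
println 'assertions passed'
```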

Et voilà. We finally have a very simple working example which can run a simple pipeline and assert simple things.

We can use this as a starting point for a real framework which mocks all aspects of a pipeline file: sections, steps, functions, objects and properties.
All of this is possible using just 2 aspects of metaprogramming in Groovy: expando metaclasses and binding.

Full code:

https://github.com/grzegorzgrzegorz/pipeline-testing/tree/master

Interesting links:
https://www.groovy-lang.org/metaprogramming.html
http://events17.linuxfoundation.org/sites/events/files/slides/MakeTestingGroovy_PaulKing_Nov2016.pdf

Pipeline design trick

In the previous article I mentioned the attention we need to pay to the resources we have when flattening the tests. Still, the available resources end at some point.

Let’s imagine we have a pipeline with quick/small tests launched after each commit. As new features are added, the tests take longer and longer. We want to be sure our software is working, so apart from unit tests we add integration/medium tests as well as a heavy/large part doing end-to-end stuff.

diagram 1: small, medium and large tests which last 1.5 hour

We try to flatten the tests and start the large ones as fast as possible, but they are so slow it doesn’t help. We try to run the large tests phase once every day in the evening, but then we only learn the next day that something is wrong with the software, which is not what we want.

diagram 2: flattened tests which last too long

 

diagram 3: separated large tests run in the evening once

What can we do? How can we quickly get feedback about the application under test while still having all of the tests executed? Quickly means something between 20 and 30 minutes, not hours of waiting…

We can find a way to the solution by no longer analyzing the pipeline itself but looking at it from another point of view: the feedback point of view. Which group of people needs the feedback? What will happen if the feedback for some area of the application is delayed?
Most often there are teams in the company which deal with specific parts of the application: team 1 does modules A and B while team 2 does C, D and E. Let’s imagine module B is the one involved in the large tests. It is tested by many unit tests but then undergoes a long phase of end-to-end tests which lasts, for example, 1 hour.
We need to separate the large tests phase just as we did when running it once a day, but trigger it synchronously with the small tests, making sure it uses the same build artifact as the small and medium tests do. The large tests phase has to be triggered only when it is not already in progress, to prevent the tests queue from growing indefinitely.
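One way to wire this in Jenkins (a sketch; the job name and parameter are made up) is a fire-and-forget build step in the main pipeline, with concurrency disabled on the large-tests side:

```groovy
// in the main (small/medium) pipeline: trigger without waiting
stage('TRIGGER_LARGE_TESTS') {
    steps {
        build job: 'large-tests', wait: false,
              parameters: [string(name: 'ARTIFACT_ID', value: env.BUILD_TAG)]
    }
}

// in the large-tests pipeline: never run two instances at once, which
// also keeps the queue from growing (Jenkins collapses duplicate
// queued requests):
//
//   options { disableConcurrentBuilds() }
```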

diagram 4: large tests as separate pipeline triggered synchronously

Thus we get feedback as fast as we can for the small and medium tests (every 30 minutes), which allows team 2 to keep the pace. Team 1 also benefits, as it gets feedback about module A very often. It gets feedback about module B as fast as possible, that is, about every 75 minutes if the testing cycle is active all the time.
If ours is a continuous deployment pipeline, we need a trick to allow us to deploy only fully tested code, as some of the commits will not be tested by the large phase at all. This can be done by having each test phase (small/medium and large) tag the code individually and letting only a fully tagged commit id be deployed.
The only price we pay is that we do not know the full test result of every single commit. We get full feedback about green and yellow tests, but red ones are only known sometimes. In diagram 4, the runs with an uncertain large tests result are runs 2, 3, 5 and 6, while 1 and 4 are fully tested and may be safely released or deployed. I am sure this is a very small price for the advantages we get.
This idea may be used to divide the pipeline into even more phases if there are more that do not fit into the 30-minute basic tests duration: more large tests or maybe some additional medium ones. All of them may be triggered when the small tests start, under the condition that they are not already in progress, and all of them may set individual tags to mark the commits they were testing, so we can always track the commits which are safe for release. We get separate feedback for each of the phases, delivered as fast as possible.
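The tagging trick might be sketched like this (the tag naming and git usage are my assumptions):

```groovy
// at the end of each green phase, mark the commit that was tested;
// the large-tests pipeline would use 'large-ok' instead of 'small-ok'
stage('TAG_COMMIT') {
    steps {
        sh "git tag small-ok-${env.GIT_COMMIT}"
        sh "git push origin small-ok-${env.GIT_COMMIT}"
    }
}

// deploy gate: only a commit carrying both the small-ok and the large-ok
// tag is allowed to be released or deployed
```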

It is better to have all the tests be very fast and finish within 20 minutes, but if this is impossible, the approach above may help to keep feedback both rich and quick.

Few recipes for declarative pipeline

It is often required to iterate over some set of names using sequential or parallel stages.
A good example is when we have a list of modules – possibly retrieved automatically – which we want to test, sequentially or in parallel.

1. Sequential example.

Let’s look at a sequential example:
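A sketch of such a sequential scripted pipeline (the module names, paths and report locations are assumptions; the original listing is not shown in the post):

```groovy
node {
    stage('BUILD') {
        // build first, without executing any tests
        sh 'mvn clean install -DskipTests'
    }
    def modules = ['moduleA', 'moduleB', 'moduleC']   // possibly retrieved automatically
    for (int i = 0; i < modules.size(); i++) {
        def module = modules[i]
        stage("TEST ${module}") {
            // without catchError the pipeline would abort on the first failure
            catchError {
                sh "mvn test -pl ${module}"
            }
            // preserve the testng output before the next iteration overwrites it
            sh "mv target/surefire-reports/testng-results.xml testng-results-${module}.xml"
        }
    }
    stage('ARCHIVE') {
        archiveArtifacts artifacts: 'testng-results-*.xml'
    }
}
```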

First we just build the application without executing any of the tests.

In my opinion, tests should always be separated from the build process – even if they are unit tests. The task seems easy, but it requires some attention.
Firstly, we need to use the scripted pipeline approach; this makes it possible to use a for loop. Each stage gets a generated test name.
Secondly, we need the catchError step. Without it, the pipeline would abort on the first unsuccessful iteration, while we want all iterations to execute no matter what the status is.
Thirdly, after each iteration we need to preserve the surefire output – testng in this example – so that it can be archived properly.
All of the stages of this pipeline are executed on the same node and with the same workspace path.

2. Parallel example.

I call making a pipeline parallel “flattening”. Let’s see the example:
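A sketch of the flattened version (again with assumed module names; the original listing is not shown in the post):

```groovy
def modules = ['moduleA', 'moduleB', 'moduleC']
def branches = [:]

node {
    stage('BUILD') {
        sh 'mvn clean install -DskipTests'
        // stashed files go to the Jenkins master, reachable from any stage
        stash name: 'build-artifacts', includes: '**/target/**'
    }
}

modules.each { module ->
    branches[module] = {
        node {
            stage("TEST ${module}") {
                unstash 'build-artifacts'
                sh "mvn test -pl ${module}"
                // rename so reports from different workspaces do not clash
                sh "mv target/surefire-reports/testng-results.xml testng-results-${module}.xml"
                archiveArtifacts artifacts: "testng-results-${module}.xml"
            }
        }
    }
}

stage('TESTS') {
    parallel branches
}
```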

 

The key to understanding more complex pipelines is to understand where each stage is executed, in terms of workspace path and machine. In this example all the generated stages are executed in parallel, as the parallel keyword is used. This means a different workspace is used for each of them (while they are executed on the same machine): workspace_path, workspace_path@1, workspace_path@2 etc.
(It is also possible to configure the stages so that they are executed on different Jenkins nodes.)
Thus the pipeline needs to stash the build artifacts after the BUILD stage and then unstash them in each generated stage. Stashed files are stored on the Jenkins master so that all stages can access them no matter where they run.
In this example we do not need the catchError step, as even when one of the stages fails, the rest will still be executed. The testng report file will not be overwritten (all of the reports reside in different workspaces), but it is good practice to rename it so that Jenkins can handle it properly when sending the reports to the Jenkins master for report generation. In this example all the reports are gathered in one place and reported in one go, while log files are archived as each stage finishes.

3. Summary.

These were just a few simple pipeline recipes. There is of course much more: we can split the pipeline not only into stages but also into jobs – the BUILD stage could be a separate job in the example above. We can also do more tasks in parallel than just tests, such as multiple checkouts or multiple builds. The parallelism can be related to either stages or jobs.
The more things are flattened, the more attention we need to pay to the resources we have: the number of machines, the number of Jenkins nodes on each of them compared to the available memory and processors, and disk read/write speed.
This is required so that the pipeline speed increases with the flattening process, not the opposite.

Declarative pipeline

Pipeline

I am working nowadays on pipeline improvements. The pipeline is an absolutely crucial part of the QA process in a project. A good pipeline is the core around which we can build all the things that constitute high quality.

When starting new project, create pipeline first!

But what does a “good” pipeline mean? It has to be reliable, informative and really, really fast. That’s it.

What’s that?

A pipeline is the sequence of actions which should start with the code build and end with tested build artifacts. It gives feedback about the code: is it OK or not OK? In case of an OK message, the artifacts may simply be deleted at the end, together with the test environments (continuous integration), may be released (continuous delivery) or – the most interesting option – may be deployed to production (continuous deployment).

There are strategies for code management: for me the most basic classification is branchless development (some info HERE) vs the branching one (well, you know this one for sure). In the case of the latter (the most popular, I guess), a successful pipeline allows the code to be merged into the main branch, while an unsuccessful pipeline prevents the defective code from being merged anywhere. In the case of the branchless approach, immediate action needs to be taken, as code which doesn’t work as expected is already there in your code repository.

The pipeline can be implemented using dedicated software such as Bamboo or TeamCity, or the open source Jenkins.

Jenkins

I will be describing Jenkins in this article, as I have the most experience with this application. As usual, it is a piece of software with advantages and disadvantages. If I were to name the biggest item of each:

  • biggest disadvantage – the disastrous quality of plugin upgrades: one can expect anything after an upgrade: usability changes, features disappearing, old bugs coming back (regressions) etc.
  • biggest advantage – the great implementation of agents: each agent can have many executors which are very easy to configure; the scalability is therefore fantastic and allows a massive number of concurrent jobs to be run, as each job requires only one free executor; commercial competitors don’t use executors, so 1 agent always means only 1 job at a time: in my opinion this must cause a much bigger overhead when designing a pipeline with many concurrent jobs on 1 machine, as many agents are needed while Jenkins can use just 1 per machine.

The practical problems

I mentioned reliability, information and performance.

  • reliability – the pipeline has to work the same way every time it is launched; no IFs, no unexpected stops, no deviations in the duration of the pipeline or in the amount of feedback information; the general goal is that we can be confident about the pipeline itself
  • information – the pipeline has to provide clear information about what is working and what is not; clear information means we need no more than a few seconds to know whether the problem is related to module B of our Java application or maybe to a security problem with the “user management” feature
  • performance – the pipeline has to be as fast as possible; the larger the project, the harder this is to achieve; for me a nice result, which I call “fast”, is 10 minutes for the build and all unit tests and 20 minutes for all the heavier tests (integration, GUI etc); a total pipeline runtime of 15-20 minutes is fantastic performance, 20-30 is acceptable but improvements need to be found, and >30 is poor and improvements simply have to be found; there is psychology behind this: let’s run it 3 times per hour, or at least twice – if we are waiting 35 minutes, the magic disappears

It all sounds simple, but it is really hard to achieve. In practice there is often a counterpart to the list above: the pipeline gets stuck at random places (the application doesn’t install on 1 environment?), it gives red/yellow/green statuses but no one knows what they are tied to (can I release the software or not? is it ok?) and finally it lasts for 2 hours, which feels like forever (“let’s commit only a few times a day, otherwise we will not be able to see the result”, developers think).

There are many reasons for this:

  • people’s mindset – they discuss every character in the code but forget about the pipeline
  • pipeline software specific problems – in Jenkins there can be 2 kinds of jobs: those created in the GUI using mouse clicks and stored inside the Jenkins application, and those written in the so-called declarative (or the older scripted) pipeline, stored in files in a version control system. GUI-based jobs are impossible to maintain over a longer time, as their number grows and the relations between them get more complicated. They start living their own life.
  • traditional approach to the sequence of stages in the pipeline (all the stages are run in sequence)
  • doing one universal job which does many things
  • relying only on a software project management tool (like Maven) to implement test concurrency and reporting, without using the features of the pipeline software
  • physical environments availability
  • physical environments configuration

Fixing it (in details)

In this section I would like to show, in some detail, how to fix all these problems.

Squashing pattern

Let’s start with the general image of the pipeline. Suppose there are stages/jobs run in sequence like this:

stages in the sequence

We need to squash it as much as we can, that is:

squashed pipeline

We need a separate build stage to be able to use the build artifacts in all the testing stages. We build only once and use the output everywhere we need it.

For Maven, it means the transition from the notorious:

into 2-step:
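If the notorious one-liner is mvn clean install (build and all tests in one go – my assumption of what the post showed here), the 2-step version split across stages looks like this:

```groovy
stage('Build') {
    steps {
        sh 'mvn clean install -DskipTests'   // 1. build only
    }
}
stage('Unit tests') {
    steps {
        sh 'mvn test'                        // 2. unit tests only
    }
}
```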

Line 1 is then placed inside the Build stage, and line 2 becomes the Unit tests stage.

All the other tests start at the same point in time as the unit tests. Of course, heavy tests like GUI tests require application redeployment, so the stage has to have a separate section for redeployment, like this:

 

gui stage

But what is the redeployment step? We often have numerous problems when dealing with this step. To achieve stability we have to switch to a virtualized solution – Docker. Only then do we have control over this step. If you do not know it yet for some reason, read about it; otherwise start using it. It greatly improves the stability of the pipeline:

docker job

But what about situations where we need numerous redeployments, e.g. for different application configurations? Then we just take advantage of the parameters each job has. In Jenkins, each job consists of stages, and each job can also receive and pass parameters. Again we modify the pipeline by introducing a new job:

create container

Now, for each of the required application configurations, we have an instance of create_container.job running.

As you can see, we now have a stage for each logical step in our pipeline.

In general, the pattern is to drill down to the most atomic logical steps, visualize them in the pipeline software, and squash them as much as possible to gain performance.

Pipeline as code

In Jenkins, as I mentioned above, it is possible to click the jobs together in the GUI. Just do not do that! It is a dead end. The jobs become impossible to maintain very quickly and you are completely stuck. Use declarative pipeline instead.

In Jenkins you use Groovy to create scripts in a simple way, like this:
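A minimal declarative pipeline script (a generic illustration, not the post’s original listing):

```groovy
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'mvn clean install -DskipTests'
            }
        }
        stage('Unit tests') {
            steps {
                sh 'mvn test'
            }
        }
    }
}
```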

The point is to have the pipeline expressed in some language and stored in a version control system (VCS). This applies to any other software used for pipelines as well: if there is a possibility to store the pipeline as code, do it; otherwise change the software!

Moving away from a project management tool like Maven

Let’s come back to the unit tests step. Unit tests are an important stage, but often a long-lasting one. Although we have a separate stage for unit tests so far, we still have room for improvement there.

The usual attempt to increase the unit tests speed is:
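Typically this means delegating the parallelism to Maven itself; the exact flags are my assumption here:

```groovy
stage('Unit tests') {
    steps {
        // concurrency pushed down into Maven instead of the pipeline
        // (-T 1C: one Maven build thread per CPU core)
        sh 'mvn test -T 1C'
    }
}
```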

However, I think this is the wrong approach, as it moves the concurrency functionality away from the pipeline software to Maven. Also, the performance gain is very questionable. The better approach, in my opinion, is to ask the pipeline software to do the job. We just need to get the list of modules we want to test. We can do it using Maven:
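One possible way to collect the list (an assumption on my part – the post’s original command is not shown) is to ask the Maven help plugin and strip the XML decoration:

```groovy
stage('GET_MODULES') {
    steps {
        script {
            // help:evaluate prints the module list as <strings><string>...</string></strings>
            def out = sh(
                script: 'mvn -q help:evaluate -Dexpression=project.modules -DforceStdout',
                returnStdout: true
            )
            env.MODULES = (out =~ /<string>(.*?)<\/string>/)
                    .collect { it[1] }
                    .join(' ')
        }
    }
}
```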

We need to create a job where the first stage collects the module list and the second stage generates further stages for each of the individual modules. Here is a full example:
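A full sketch of such a job (runConcurrent, env.MODULES and env.BUILD_ARTIFACTS_PATH are the names the text below uses; the stage bodies are my reconstruction):

```groovy
pipeline {
    agent any
    stages {
        stage('GET_MODULES') {
            steps {
                script {
                    // collected from Maven in practice; hardcoded here for brevity
                    env.MODULES = 'moduleA moduleB moduleC'
                }
            }
        }
        stage('INVOKE_CONCURRENT_TESTS') {
            steps {
                script {
                    // the string is tokenized to create an array of module names
                    runConcurrent(env.MODULES.tokenize(' '))
                }
            }
        }
    }
}

def runConcurrent(List modules) {
    def branches = [:]
    modules.each { module ->
        branches[module] = {
            stage(module) {
                // point the generated stage at the workspace with build artifacts
                dir(env.BUILD_ARTIFACTS_PATH) {
                    sh "mvn test -pl ${module} -Dmaven.repo.local=${env.WORKSPACE}/.m2-repo"
                }
            }
        }
    }
    parallel branches
}
```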

This is a mixture of declarative and scripted pipeline, which allows dynamic stage generation.

In the GET_MODULES stage we extract all the modules from our application and store them as a string in the env.MODULES variable (it is saved as a string and then tokenized to create an array). In the INVOKE_CONCURRENT_TESTS stage we generate the stages using the runConcurrent method. Each stage is named after its module.

There are 2 things we need to pay attention to.

The first one is the workspace. Each generated stage has its own workspace, so it cannot see the project which should be tested. To point each stage to the right one, the dir directive has to be used, represented here by env.BUILD_ARTIFACTS_PATH. I think the best pattern – used here – is to keep the workspace with build artifacts available to each job (so that it can be accessed from any agent via the dir directive). If this is not possible, the stage which does the build needs to archive the artifacts to the Jenkins master, and each of the generated stages would then have to copy them from there. This affects the performance of the pipeline very significantly.

The second important thing is the Maven repository, set by the -Dmaven.repo.local parameter. It is easy to confuse things and run into trouble when the default Maven repository is used. Especially when building different software versions at the same time, or building 2 different applications, it is good to have a dedicated Maven repository for each of them.

Anyway, we now have each module tested by a separate stage, and the duration of the tests equals the duration of the longest-lasting module:

unit tests job

We can do it this way because we are dealing with unit tests – a unit is something independent, so our tests must run properly this way. The huge performance gain is one point; the other is very clear feedback about the point of failure if one occurs.

Speaking about the clear feedback…

Clear feedback

The goal here is to get to the point where the pipeline provides clear information about what is working and what is not. We have it in the unit tests job already: if module B fails, we know it immediately on the screen, as there is a separate stage dedicated to each of the modules.

But what about GUI tests? When the job fails, which feature is not working? We need to know this immediately as well – no time to search the logs! The answer is to group the GUI tests so that each group represents one business functionality. Let’s imagine the application under test has 3 significant features being tested: security, processing and reporting. After the tests are grouped (for example in separate packages on the code side), it is possible to create a job like this:
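Such a job might be shaped like this (a sketch; the way each group is selected – here via Surefire’s groups property – is my assumption):

```groovy
stage('GUI_TESTS') {
    parallel {
        stage('security') {
            steps { sh 'mvn test -Dgroups=security' }
        }
        stage('processing') {
            steps { sh 'mvn test -Dgroups=processing' }
        }
        stage('reporting') {
            steps { sh 'mvn test -Dgroups=reporting' }
        }
    }
}
```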

gui tests with features

In this job all the tests run concurrently, but when any of the groups fails we know exactly which business area is affected, and we can make decisions about the quality of the current build.

Known defects or unknown?

Very often – I would say too often – there is a situation where some defects exist, but due to their priority or some other reason they are not fixed immediately. They are present for very many builds and mark their presence with the red colour. This completely blurs the picture of the state of the application: is it ok or not? Does the red colour mean we have new defects, or are these the known ones?

The solution to this problem is to introduce a mechanism that marks the known failures in the pipeline software and sets the status accordingly. In Jenkins it is possible to set the status to yellow (unstable) or red (error). We need a piece of software which scans through the reports (either JUnit or testng report files), puts the defect number inside the file on the one hand, and sets the status accordingly on the other: when all the failures are known, the status becomes yellow, and if there is at least 1 unknown failure, the status is red. Because the defect number is put inside the report file, we can view it on the report page in Jenkins.

I didn’t find any out-of-the-box solution for this, so I created a small application myself.

In detail: let’s say we have a JUnit report – we need to parse it, for example with the help of the great library JSoup which I described here. We need a file containing mapping information, that is, the class/method which is already reported and the ticket number. The code parses the JUnit XML report and substitutes all the known class/method occurrences with the same name but with the ticket number appended:
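A sketch of that substitution with JSoup (the mapping format, file names and report structure are assumptions; the real application is more complete):

```groovy
import org.jsoup.Jsoup
import org.jsoup.parser.Parser

// mapping: already-reported class.method -> ticket number (assumed format)
def knownFailures = ['com.example.LoginTest.testLogout': 'PROJ-123']

def reportFile = new File('TEST-results.xml')
def doc = Jsoup.parse(reportFile, 'UTF-8', '', Parser.xmlParser())

doc.select('testcase').each { testcase ->
    def key = testcase.attr('classname') + '.' + testcase.attr('name')
    def ticket = knownFailures[key]
    // append the ticket number only to known failing tests
    if (ticket && !testcase.select('failure').isEmpty()) {
        testcase.attr('name', testcase.attr('name') + ' [' + ticket + ']')
    }
}

reportFile.text = doc.outerHtml()
```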

Because of this, the ticket number can be displayed in the Jenkins GUI. As the number of known failures is known, the status turns green if no failures are found, yellow if only reported failures are found, and red if there is at least 1 unknown failure in the report.

The final pipeline

After all these ideas are implemented, we can see our final pipeline:

final pipeline

It is now squashed. It consists of atomic jobs which can be run as separate units (for example for testing purposes). It gives fast and clear feedback. It is reliable and significantly improves the quality of the application under test.

Summary

I have described here the actions I take when dealing with pipeline problems. I am sure this approach is universal: try to make the pipeline reliable, clear in its feedback, and as fast as possible.