Another approach to generate test cases

I described a way to generate test cases some time ago HERE. Recently, I tried another approach, which I put on GitHub as a real example: https://github.com/grzegorzgrzegorz/testcase-generation. It also uses TCases to generate non-executable test cases from a model. More about TCases HERE

The concept of the approach is:

  • create a model file
  • generate non-executable test cases (with TCases)
  • prepare the variable-related classes for deserialization
  • generate executable test cases using code which, under the hood, deserializes the non-executable test cases
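The steps above start from a model file. Its exact contents depend on the project, but a TCases system input definition has roughly this shape (the system, function and variable names here are hypothetical, not taken from the repository):

```xml
<System name="Converter">
  <Function name="convertToSentence">
    <Input>
      <!-- each Var lists the abstract values TCases combines into test cases -->
      <Var name="Capitals">
        <Value name="FirstLetterCapitalized"/>
        <Value name="NotFirstLetterCapitalized"/>
      </Var>
      <Var name="Dot">
        <Value name="Present"/>
        <Value name="Missing"/>
      </Var>
    </Input>
  </Function>
</System>
```

TCases then generates the combinations of these values as non-executable test case definitions.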

For example: there is an application which converts text into a valid sentence, that is, it capitalizes the first letter and adds a dot at the end. If there is a variable called Capitals in the model, a class named Capitals has to be prepared. This requires some work, but it separates the executable code from the model. The code snippet below shows one of the non-executable test cases, which has the Capitals variable set to “NotFirstLetterCapitalized”:

Information about how to prepare the string with regard to capitalization and how to assert on it is put into the Capitals class:

The generated, executable test case looks like this:

To sum up: the main advantage of this approach is the separation of DSL syntax and model data. There is much more info on GitHub: https://github.com/grzegorzgrzegorz/testcase-generation.

3 stages of pipeline tests

Developing a pipeline beyond a certain level of complexity is not easy. I described a pipeline testing framework in one of my earlier posts. It is a great tool for catching many defects very fast. However, in my opinion it catches only about 50% of them. The rest leak to production. Why is this happening?

In the past I wrote a post dealing with the problem of the 4 great variables. Here the problem hits us in practice: using a pipeline testing framework it is only possible to catch most of the defects related to CODE, but none related to ENVIRONMENT, CONFIGURATION or DATA.

What can we do? We need to add extra testing stages which will fill in the gaps.

Stage 1 – pipeline testing framework

It can catch most of the problems related to code: logic problems, syntax problems and so on. Multiple tests can be written to verify logic paths, variable values, overall syntax correctness and the expected communication with other jobs or libraries (names, input parameters etc.).

Stage 2 – Jenkins validation

In this stage we send the pipeline under test to Jenkins for validation. Two steps are required:

More information on this is here: https://www.jenkins.io/doc/book/managing/cli/ and here: https://www.jenkins.io/doc/book/pipeline/development/.

The same code which passed stage 1 should be sent to the Jenkins application to run the declarative-linter command. It can find code problems which the pipeline testing framework cannot: defects related to Jenkins-specific things like the mandatory sections required by declarative pipelines. The best example for me is the STEPS section, which cannot be missing, but stage 1 is not able to validate that properly.
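Following the Jenkins documentation linked above, the linting can be invoked over SSH via the Jenkins CLI or over the HTTP endpoint (the host, port and URL variables below are placeholders for your installation):

```shell
# lint a Jenkinsfile through the Jenkins CLI over SSH
ssh -p "$JENKINS_SSHD_PORT" "$JENKINS_HOST" declarative-linter < Jenkinsfile

# or through the HTTP validation endpoint
curl -X POST -F "jenkinsfile=<Jenkinsfile" "$JENKINS_URL/pipeline-model-converter/validate"
```

Both variants report either a success message or the Jenkins-specific syntax errors that stage 1 cannot see.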

Stage 3 – pipeline draft run

This stage is meant to catch the rest of the defects. Since it should detect configuration, environment and data problems, it needs to be as similar to a production run as possible. To achieve that, in my opinion a draft run needs the following to be useful:

  • it should use the same files as production
  • it cannot influence external systems in any way
  • it should be possible to run it fast

The first two points can be addressed by creating a draft launcher job which sets the configuration so that no interaction with the outside world is made (Jira communication, repository communication and user communication are all switched off). Draft jobs are created in Jenkins which use the same files as the production jobs, but with their configuration modified by the draft launcher.
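A draft launcher along these lines might simply trigger the pipeline with the external integrations disabled. This is only a sketch: the job name and all parameter names are hypothetical, not the ones from my setup:

```groovy
// hypothetical draft launcher step: same pipeline files, "offline" configuration
build job: 'production-pipeline', parameters: [
    booleanParam(name: 'DRAFT_RUN',    value: true),
    booleanParam(name: 'JIRA_ENABLED', value: false),   // no ticket updates
    booleanParam(name: 'PUSH_ENABLED', value: false),   // no repository writes
    booleanParam(name: 'NOTIFY_USERS', value: false)    // no mails or chat messages
]
```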

At this point a pipeline draft run is possible, but it will take the same amount of time as a production one. This is certainly not the desired behaviour, which brings us to the third point.

This point is hard to achieve but really important. To make the pipeline run fast, all time-consuming operations need to be replaced with dummy operations. For the pipeline to still run successfully, real data from a previous run has to be used.

I am solving this problem in this way:

  • the pipeline puts build artifacts on a shared disk so that they are accessible to all Jenkins nodes
  • it reuses them in consecutive stages/jobs
  • at the same time, I can reach the build artifacts location from a past run and copy it aside to a draft workspace, to be used in the draft pipeline
  • during a draft run, the job which builds the application reconfigures its workspace (a configuration parameter) to the draft workspace just after cloning the repository
  • the building phase can be skipped completely and a few unit tests can be run just to check reporting in Jenkins (a configuration parameter)
  • etc.
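The skip-and-substitute idea from the list above could be sketched in a declarative pipeline like this (the DRAFT_RUN parameter and the Maven commands are assumptions for illustration):

```groovy
stage('Build') {
    // in a draft run the real, time-consuming build is skipped entirely
    when { expression { !params.DRAFT_RUN } }
    steps { sh 'mvn -DskipTests clean install' }
}
stage('Unit tests') {
    steps {
        // a draft run executes only a tiny subset, just to exercise Jenkins reporting
        sh(params.DRAFT_RUN ? 'mvn test -Dtest=SmokeTest' : 'mvn test')
    }
}
```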

Conclusion

Having all 3 stages in place gives me high confidence that the pipeline works properly. Moreover, it is possible to develop it rapidly without testing it in production. I get feedback about the current code, configuration, environment and data. I can also change any of these variables' values and retest quickly. Fast feedback is the essence of development. We all need it.

Great blog to read and podcast to listen to

I have recently discovered an excellent source of information about computer science in general. It is practical knowledge shared by many experts who either work for big, well-known tech companies or are known in the industry for their valuable contributions. One can read real-life stories, solved problems and general thoughts about where we are with computer science and application development today and where we are heading. Valuable links, important books and practical tips are presented.

I mean: https://techleadjournal.dev/

It says tech lead, but it is not all about leading. There are different areas presented, such as very technical aspects of testing, development methodologies, architectural techniques or, to name the opposite end of the spectrum, producing content and building an audience for a blog.

Because of the interview format, we can also learn about the personality of the guest, which is especially interesting in the case of famous people. I personally read a book by one of the interviewees and it was interesting to hear the story of his professional career. In every episode, each person gets a few questions about their career journey and is asked for 3 tips for the audience from their area of expertise.

Actually, it is a podcast first and foremost, but it has the magic feature of transcripts. I suggest grabbing one and reading it offline on an ebook reader or tablet, as you can make notes for future reference.

I really recommend this one. I have seen a few such sources, but this is the first one I am writing about.

Do not be fanatic, be flexible.

I once met a professional fanatic. Who is a fanatic?

  • he always knows everything (and he has a good explanation for everything)
  • he never modifies his way of thinking
  • he has zero flexibility no matter the circumstances

Imagine an asteroid coming to hit the Earth and cause an ELE (extinction-level event). Fanatics work on the program which should save the Earth. It will operate a high-energy laser which will hit the asteroid and change its course. There are 12 hours before the impact.
Fanatics start their TDD approach. They are relaxed and self-confident, as they know they create the highest quality code. Somewhere between the 2nd and 3rd version of the feature branch, after 20 tiny tickets and a total of about 35 reviews of even tinier mini-branches, the asteroid hits the Earth and turns it into a desert with no life on the surface.
Had the fanatics been on the Moon, they would have been happy, because just as the asteroid was hitting the Earth they finally understood the application domain, and the 3rd version of the feature branch was meant to be the production version. That wonderful feeling of being on the perfect path is worth any cost. It doesn't matter to them that their work is useless from that point on.
Here comes the only bright side of an ELE: there are no fanatics anymore.

How not to be a fanatic?

It is simple: when a question contains the word “always”, never answer yes:

  • should you always work using TDD? No! Only when it makes sense. I totally agree with this interesting episode: https://techleadjournal.dev/episodes/58/ stating that TDD should be applied at the right moment. While you are still trying to understand the application domain, you need to write some code first and only some tests afterwards. Working TDD-style from scratch slows you down by a large factor and is a waste of time, as at this stage your work goes to the junk anyway. Only once you know you are ready to proceed with a given feature branch should you start TDD.
  • should you always have pyramid-like test levels? No! Only when it makes sense. If your application is, for example, some REST API thing with a few functions which do not interact with external systems, you can even have a square-like test level shape: the number of integration tests could be the same as the number of unit tests. The only disadvantage of integration tests in such a case is slow feedback: if they fail, you do not immediately know where the point of failure is. But you have your unit test set, which will tell you that, and the large number of integration tests will save you from regression.
  • should you always create the highest quality code possible? No! Only when it makes sense. When you create a helper tool for analyzing/filtering/reporting some stuff: make a ticket, make a feature branch and do a decent review. Check whether the code follows good practices, whether it is clear and, most importantly, whether it has a good set of tests. And that is good enough! Do not spend hours on the review torturing the details; you really can stop at some point before reaching perfection. It is a tool, like a car wrench. If it is not useful anymore, you can just throw it away and create a new one.

Do not waste your time on:

  • making multiple tickets for simple application
  • making multiple branches for simple application
  • making each commit be atomically consistent

Do not turn a simple task into a bureaucratic horror. I am not afraid to say it out loud: in general, you can afford some technical debt. It just needs to be calculated well. Very often the cost of delivering late is much higher.
It is better to have something decent on schedule than something perfect when it is not needed anymore.

Write your own pipeline testing framework

Introduction

I recently had a need to start using a pipeline testing framework in order to be able to test a number of pipeline files I use on a daily basis. I tell you, it is a bad feeling when the number of pipeline scripts keeps growing and the only testing you can do is some sandboxing inside Jenkins…
So I started to look around, and at first I found this testing framework, which looked quite promising:
https://github.com/jenkinsci/JenkinsPipelineUnit
Unfortunately, I almost immediately stumbled upon the problem of malfunctioning mocks for local functions: I was not able to mock an internal function; it was only possible to mock steps, so I would have had to mock all of the atomic steps of all the auxiliary functions I was using. Imagine a function with 50 steps which you just need to mock but cannot, and the only workaround is to mock most of those 50 steps…
For me the lesson learned is: an open source solution is good when it works properly or when all defects can be worked around easily. The project also has to be very active, I guess, with a lot of developers involved. That is not the case here; I would even guess it may be abandoned in the future.
Anyway, because the workaround was hard and I found yet another problem with that framework, I decided to try to create my own solution.

Understanding the problem

When looking at a typical declarative or scripted pipeline, one can see it resembles Groovy. Let's look at the example:
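The original example is shown as a screenshot; a simplePipeline.groovy of roughly this shape is assumed in what follows:

```groovy
// simplePipeline.groovy -- assumed shape of the example
pipeline {
    agent any
    stages {
        stage('First stage') {
            steps { echo 'test1' }
        }
        stage('Second stage') {
            steps { echo 'test2' }
        }
    }
}
```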

Trying to launch simplePipeline.groovy outside of Jenkins as a regular Groovy script ends with an error like this:

So the problem with running it with standard Groovy is certainly pipeline. But what does that mean in terms of syntax? Well, it is just a function named pipeline which accepts a closure as a parameter:

pipeline { ... } 

The other items are Groovy functions as well; some just have 2 arguments, like:

stage('Second stage') { ... } 

This is the stage function, which takes a String as its first argument and a closure as the second.

It is clear now that our goal is just to mock all the pipeline-specific functions so that they are runnable by Groovy. This will allow us to run the whole pipeline file outside of Jenkins! It sounds complicated, but it is not. Read on.

Making it runnable

Let's see how we can write a first version of the code which will run the pipeline example:
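The original listing is a screenshot; a minimal sketch of such a first version (the file name and the exact set of mocked items are assumptions) could look like this:

```groovy
// parse the pipeline file as a plain Groovy script, without running it yet
def script = new GroovyShell().parse(new File('simplePipeline.groovy'))

// mock each Jenkins syntax item: log its name, then run any closure argument
script.metaClass.pipeline = { Object... args ->
    println 'pipeline'
    args.each { if (it instanceof Closure) it() }
}
script.metaClass.stages = { Object... args ->
    println 'stages'
    args.each { if (it instanceof Closure) it() }
}
script.metaClass.steps = { Object... args ->
    println 'steps'
    args.each { if (it instanceof Closure) it() }
}
script.metaClass.agent = { Object... args -> println 'agent' }
script.metaClass.stage = { String name, Closure body ->
    println "stage: $name"
    body()
}
script.metaClass.echo = { String msg -> println "echo: $msg" }
script.metaClass.getAny = { -> 'any' }   // lets the bare word in "agent any" resolve

script.run()
```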

If you run it, you will see it works now! All Jenkins-related syntax items are mocked, so Groovy can execute the file. How does it work?

I use the ExpandoMetaClass mechanism here, which is part of runtime metaprogramming in Groovy (you can read about it here: https://www.groovy-lang.org/metaprogramming.html#metaprogramming_emc).

In the highlighted lines you can observe how the “pipeline” function is mocked. It says: whenever you meet a function named pipeline with a matching signature (an array of one or more objects), run the assigned closure (the code in the curly braces).

So, at runtime the closure:

  • logs the word “pipeline”
  • treats the first parameter as a closure and runs it (which in this example means the “agent” function will be run, and so the next mock will be executed)

This is the core of the pipeline testing framework: mocking the Jenkins syntax, which is expressed in the pipeline file as closures in curly braces.

Of course, not all of the syntax items can be mocked this way; for example, there are functions like echo which take only a String parameter. Their mock just prints it out.

Improving the solution

The first version contains a lot of redundancy. We can make some improvements here:

It looks much better: we separated steps and sections, and only one piece of code is responsible for mocking each of them. We can also easily add new mocks now. We even use the crazy feature of dynamic method names here. It is awesome!
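The improved listing is also a screenshot; the dynamic-method-name idea can be sketched like this (the lists of sections and steps are assumptions, and `script` is the parsed pipeline script from before):

```groovy
// one generic mock per kind of syntax item, registered under dynamic names
def sections = ['pipeline', 'agent', 'stages', 'steps']
def steps    = ['echo', 'sh']

sections.each { name ->
    script.metaClass."$name" = { Object... args ->
        println name
        args.each { arg -> if (arg instanceof Closure) arg.call() }   // descend into the block
    }
}
steps.each { name ->
    script.metaClass."$name" = { String param -> println "$name: $param" }
}
```

Adding a new mock is now just a matter of appending a name to one of the lists.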

However, we still cannot do any assertions. Let's solve this problem as well. First, we need to find a way to trace the script at runtime and store its state in some memory structure. For each step in the pipeline we need to know:

  1. the caller hierarchy (which syntax item called which one: “did stage 1 call echo?”)
  2. the state of the global variables (“was env.TEST_GLOBAL_VAR set in “First stage” and did it have the value “global_value” in the “Second stage”?”)
  3. what exactly was called (“was the echo step called with the “test1” parameter?”)

Ad 1. We can achieve this by extracting information from the stack trace, using for example these 2 methods:

Line numbers from the pipeline file are caught and translated into text: now we know what the caller hierarchy is.

Ad 2. We can get information about the state of the global variables when mocking, by using pipelineScript.getBinding().getVariables().

Ad 3. To get the exact syntax item with its parameters, we can report them during mocking as well.

Here is an example for 2 and 3:

The “storeInvocation” method is called with 3 parameters:

  • syntax item name
  • its parameters
  • current state of variables

In this method, items 2 and 3 come in as parameters, and item 1 is created by the “createStackLine” function. I store them in the ResultStackEntry class under the names: 1 – stackLine, 2 – invocations and 3 – runtimeVariables.

Take a look at GitHub for the details.

Putting it all together, this code:

produces stack trace in the output:

The last important thing here is global environment handling. In Jenkins, there is an environment map named env whose keys can be set by assigning values like this: env.TEST_VARIABLE="test1"

Groovy by default doesn't know anything about such a map, so it is necessary to prepare it so that the pipeline can populate it at runtime:
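A sketch of that preparation, using Groovy's Binding (the pipeline file name is an assumption):

```groovy
// seed the script's binding with a mutable "env" map before running it
def binding = new Binding()
binding.setVariable('env', [:])

def script = new GroovyShell(binding).parse(new File('simplePipeline.groovy'))
// now assignments like env.TEST_VARIABLE = "test1" inside the pipeline just work,
// and the values can be inspected later via binding.getVariable('env')
```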

Assertions

We have a working framework with a result stack, but we still cannot do any assertions. We have to add this ability to be able to test anything, don't we?

Let's add a result stack validator (look at GitHub for the full code):

And an assertion class to serve as a mini DSL for asserting on the result stack when writing tests:

Only now can we start doing some assertions:

Et voilà. Finally we have a very simple working example which can run a simple pipeline and assert simple things.

We can use it as a starting point for a real framework mocking all aspects of a pipeline file: sections, steps, functions, objects and properties.
All of this is possible using just 2 aspects of metaprogramming in Groovy: expando metaclasses and binding.

Full code:

https://github.com/grzegorzgrzegorz/pipeline-testing/tree/master

Interesting links:
https://www.groovy-lang.org/metaprogramming.html
http://events17.linuxfoundation.org/sites/events/files/slides/MakeTestingGroovy_PaulKing_Nov2016.pdf

Surefire plugin crash

Surefire maven plugin

I use it to run tests, as most people do. What is maybe not so typical is that I always run it separately in my pipeline, like so:

So building first, testing second.
Of course, in real usage I run many more modules in parallel, and many other details are involved which are not relevant to the topic I want to present. What is important here is the separate Surefire invocation. I recently encountered a situation of randomly crashing tests.
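The build-then-test split above can be sketched in a scripted pipeline like this (the module names and Maven goals are hypothetical):

```groovy
node {
    stage('BUILD') {
        sh 'mvn -DskipTests clean install'      // build everything, run no tests
    }
    stage('TEST') {
        parallel(
            'module A': { sh 'mvn -pl moduleA surefire:test' },
            'module B': { sh 'mvn -pl moduleB surefire:test' }
        )
    }
}
```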

Understand the background

After some pipeline tweaking, the tests started to crash randomly. My configuration was: a Linux machine with the 3.0.0-M3 Surefire plugin. Many of the job runs ended in errors (well, a pass-fail-error situation showed up instead of nice pass-fail messages…). How do we handle such a situation? We need to really understand what is going on before we start fixing things randomly. When Surefire starts like in the example above, it creates a separate process. So Maven is the first process, which in turn starts a second, separate Surefire process.
We have concurrent Maven steps starting Surefire concurrently with the default fork settings: -DforkCount=1 -DreuseForks=true. This means all the test classes will use the same process for a given Surefire instance, so in the above pipeline we are going to have 2 concurrent Surefire processes in total, lasting for the duration of all the tests.

Crash test analysis

When dealing with crashed tests, we need to group the crashes in some way. In my case there were 2 significant groups:

  • delayed crashes – a few tests were executed; the crash was reported after a few minutes or more
  • instant crashes – no tests were executed; an immediate crash was reported

Delayed crashes

Delayed crashes most probably indicate some problem with the RAM on the build machine or with the memory configuration for Surefire. In the log file one can see a Surefire invocation like this:

To verify the Xmx hypothesis, one needs to look at the Surefire processes on the Linux machine:

should be enough to see how much memory a single Surefire process uses. In my case the Xmx setting was OK and all Surefire instances were using only the initial amount of 512 MB of RAM.
To verify the RAM, it is convenient to have some kind of monitoring utility in place, although using top/htop could be enough as well. If memory consumption during the tests approaches 100% and our tests crash in the same time period, we can be sure this is the reason for the delayed crashes.
That was the case on my machine as well, and it was enough to reduce the number of Jenkins executors so that the machine would not run into out-of-memory problems even with all of them busy at the same time.

Instant crashes

This type of crash is much more interesting. In my case, after adjusting the environment to use less memory, the delayed crashes were gone, but I was still getting instant crashes, which produced a lot of noise in my pipeline.
Looking closer at the crash log again, there was no meaningful information in the stack trace, but at the beginning I could observe this information:

So, yes, there are dumpstream files created in the workspace with much more specific information:

From there I could learn that Surefire couldn't create a file in some location. It turned out that Surefire creates temporary files there. The line with the Surefire invocation actually indicates which paths are in use.
Still, it is unclear which argument is which; there are 3 of them:

  • /workspace/module/target/surefire
  • surefire5632314446045888006tmp
  • surefire_08947138625012870304tmp

How can we get more details? Well, Surefire is an open source project available on GitHub:
https://github.com/apache/maven-surefire.git
We can clone it and open it in an IDE. In my case I was interested in the 3.0.0-M3 version, so I checked out that tag (newer versions exist as branches, older ones as tags only). Let's use our code reading skills now!
There is a ForkedBooter class holding the information about how the main method arguments are used:

So, the arguments are named in the following way:

  • /workspace/module/target/surefire – temp dir
  • surefire5632314446045888006tmp – dump file name
  • surefire_08947138625012870304tmp – properties file

We can see here that potentially there may be a permission or path issue. When the user does not have the privilege to write to the target dir, or there is some problem with the path itself, the test will crash.
That was not the case for me.

The temp filenames look safe for concurrent execution, but the temp dir looks like it might cause problems: there are 2 separate Maven and Surefire invocations, and both must be using the same temporary directory!
Indeed, there is a tempDir parameter which sets a custom temp dir name (one can view all the available parameters in the AbstractSurefireMojo class!):
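A sketch of such a configuration in the POM; the directory naming scheme below is an assumption for illustration (check AbstractSurefireMojo for the parameter details):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.0.0-M3</version>
  <configuration>
    <!-- give each concurrent invocation its own temp dir name -->
    <tempDir>surefire_${env.EXECUTOR_NUMBER}</tempDir>
  </configuration>
</plugin>
```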

Surprisingly, on Windows the temp dir is generated in the %temp% location with a number appended, so it is completely parallel-safe. So if you happen to get instant crashes on this OS, the tempDir parameter will not help.

Sum up

The lesson learnt is always more or less the same: know the problem, understand the background and apply the knowledge to solve it.

Pipeline design trick

In the previous article I mentioned the attention we need to pay to the resources we have when flattening the tests. Still, the available resources run out at some point.

Let's imagine we have a pipeline with quick/small tests launched after each commit. As new features are added, the tests get longer and longer. We want to be sure our software is working, so apart from unit tests we add integration/medium tests as well as a heavy/large part doing end-to-end stuff.

diagram 1: small, medium and large tests which last 1.5 hour

We try to flatten the tests and start the large ones as fast as possible, but they are so slow it doesn't help. We try to run the large tests phase once every day in the evening, but then we only learn the next day that something is wrong with the software, which is not what we want.

diagram 2: flattened tests which last too long

 

diagram 3: separated large tests run in the evening once

What can we do? How can we quickly get feedback about the application under test while still having all of the tests executed? Quickly means something like 20 or 30 minutes, not hours of waiting…

We can find a way to the solution by no longer analyzing the pipeline itself, but looking at it from another point of view: the feedback point of view. Which group of people needs the feedback? What will happen if the feedback for some area of the application is delayed?
Most often there are teams in the company which deal with specific parts of the application: team 1 will do modules A and B while team 2 will do C, D and E. Let's imagine module B is the one involved in the large tests. It is tested by many unit tests, but then undergoes a long phase of end-to-end tests which lasts, for example, 1 hour.
We need to separate the large tests phase, just as we did when running it once a day, but trigger it together with the small tests, making sure it uses the same build artifact as the small and medium tests do. The large tests phase has to be triggered only when it is not already in progress, to prevent the test queue from growing indefinitely.

diagram 4: large tests as separate pipeline triggered synchronously

Thus, we get feedback as fast as we can for the small and medium tests (every 30 minutes), which allows team 2 to keep up the pace. Team 1 also benefits, as it gets feedback about module A very often. It gets feedback about module B as fast as possible, that is, about every 75 minutes if the testing cycle is active all the time.
If ours is a continuous deployment pipeline, we need a trick to allow us to deploy only fully tested code, as some of the commits will not be tested by the large phase at all. This can be done by having each test phase (small/medium and large) tag the code individually and letting only a fully tagged commit id be deployed.
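The tagging trick could be sketched like this in a scripted pipeline (the tag names and the git workflow are assumptions):

```groovy
// after a phase's tests pass: mark the commit this phase has tested
sh "git tag -f small-ok-${env.GIT_COMMIT} && git push origin --tags"   // 'large-ok-…' in the large phase

// deploy gate: release only commits carrying both tags
def tags = sh(script: "git tag --points-at ${env.GIT_COMMIT}", returnStdout: true)
if (tags.contains('small-ok-') && tags.contains('large-ok-')) {
    build job: 'deploy', parameters: [string(name: 'COMMIT', value: env.GIT_COMMIT)]
}
```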
The only price we pay is that we do not know the full test result of every single commit. We get full feedback about green and yellow tests, but red ones are only known sometimes. The runs with an uncertain large tests result in diagram 4 are runs 2, 3, 5 and 6, while 1 and 4 are fully tested and may be safely released or deployed. I am sure this is a very small price for the advantages we get.
This idea may be used to divide the pipeline into even more phases, if there are more of them which do not fit into the 30-minute basic tests duration: more large tests, or maybe some additional medium ones. All of them may be triggered when the small tests start, under the condition that they are not already in progress, and all of them may set individual tags to mark the commits they were testing, so that it is always possible to track which commits are safe for release. We get separate feedback for all of the phases, delivered as fast as possible.

It is better to have all the tests very fast and finished after 20 minutes, but if that is impossible, the approach above may help to keep both rich and quick feedback in place.

Few recipes for declarative pipeline

It is often required to iterate over some set of names using sequential or parallel stages.
A good example is when we have a list of modules, possibly retrieved automatically, which we want to test sequentially or in parallel.

1. Sequential example.

Let’s look at sequential example:

First we just build the application without executing any of the tests.

In my opinion, tests should always be separated from the build process, even if they are unit tests. The task seems easy, but it requires some attention.
Firstly, we need to use the scripted pipeline approach. This makes it possible to use a for loop, and each stage gets a generated test name.
Secondly, we need the catchError step. Without it, the pipeline would abort on the first unsuccessful iteration, while we want all iterations to execute no matter what the status is.
Thirdly, after each iteration we need to preserve the Surefire output (TestNG in this example) so that it can be archived properly.
All of the stages of this pipeline are executed on the same node and with the same workspace path.
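The original listing is an image; the three points above could be sketched like this (the module names, Maven goals and report paths are assumptions):

```groovy
node {
    stage('BUILD') {
        sh 'mvn -DskipTests clean install'                 // build only, no tests
    }
    def modules = ['moduleA', 'moduleB', 'moduleC']        // possibly retrieved automatically
    for (m in modules) {
        def module = m
        stage("TEST ${module}") {
            // keep iterating even if this module's tests fail
            catchError {
                sh "mvn -pl ${module} surefire:test"
            }
            // preserve the TestNG report before the next iteration overwrites it
            sh "cp ${module}/target/surefire-reports/testng-results.xml testng-${module}.xml"
        }
    }
    archiveArtifacts artifacts: 'testng-*.xml'
}
```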

2. Parallel example.

Making the pipeline parallel is what I call flattening. Let's see the example:

 

The key to understanding more complex pipelines is to understand where each stage is executed, in terms of workspace path and machine. In this example, all of the generated stages are executed in parallel, as the parallel keyword is used. This means a different workspace will be used for each of them (they are executed on the same machine): workspace_path, workspace_path@1, workspace_path@2 etc.
(It is also possible to configure the stages so that they are executed on different Jenkins nodes.)
Thus the pipeline needs to first stash the build artifacts after the BUILD stage and then unstash them in each generated stage. Stashed files are stored on the Jenkins master, so they are accessible to all of the stages no matter their location.
In this example we do not need the catchError step, as even when any of the stages fails, the rest will still be executed. The TestNG report file will not be overwritten (all of the reports reside in different workspaces), but it is good practice to rename it so that Jenkins can handle the reports properly when sending them to the Jenkins master for the report to be generated. In this example all the reports are gathered in one place and reported in one go, while log files are archived after each stage finishes.
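The stash/unstash flow described above could be sketched like this (again, module names, goals and report paths are assumptions):

```groovy
node {
    stage('BUILD') {
        sh 'mvn -DskipTests clean install'
        stash name: 'build', includes: '**/target/**'      // make artifacts available to all stages
    }
    def branches = [:]
    for (m in ['moduleA', 'moduleB', 'moduleC']) {
        def module = m
        branches[module] = {
            node {                                         // separate workspace (possibly @1, @2 …)
                checkout scm
                unstash 'build'
                sh "mvn -pl ${module} surefire:test"
                // rename so reports from different modules do not clash on the master
                sh "cp ${module}/target/surefire-reports/testng-results.xml testng-${module}.xml"
                archiveArtifacts artifacts: "testng-${module}.xml"
            }
        }
    }
    stage('TEST') { parallel branches }
}
```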

3. Summary.

These were just a few simple pipeline recipes. There is of course much more: we can split the pipeline not only into stages but also into jobs; the BUILD stage could be a separate job in the above example. We can run more tasks in parallel than just tests, like multiple checkouts or multiple builds. The parallelism can apply to either stages or jobs.
The more things are flattened, the more attention we need to pay to the resources we have: the number of machines, the number of Jenkins nodes on each of them compared to the available memory and processors, and disk read/write speed.
This is required so that pipeline speed increases with the flattening process, and not the opposite.

Declarative pipeline

Pipeline

I am working nowadays on a pipeline improvement. This is an absolutely crucial part of the QA process in the project. A good pipeline is the core around which we can build all the things which constitute high quality.

When starting new project, create pipeline first!

But what does a “good” pipeline mean? It has to be reliable, informative and really, really fast. That's it.

What’s that?

A pipeline is a sequence of actions which should start with the code build and end with tested build artifacts. It gives feedback about the code: is it OK or not OK? In the OK case, the artifacts may simply be deleted at the end together with the test environments (continuous integration), may be released (continuous delivery) or, the most interesting option, may be deployed to production (continuous deployment).

There are strategies for code management: for me the most basic classification is branchless development (some info HERE) vs the branching one (well, you know this one for sure). In the case of the latter (the most popular, I guess), a successful pipeline allows the code to be merged into the main branch, while an unsuccessful pipeline prevents the defective code from being merged anywhere. In the case of the branchless approach, immediate action needs to be taken, as the code which doesn't work as expected is already there in your code repository.

The pipeline can be implemented using dedicated software such as Bamboo, TeamCity or the open source Jenkins.

Jenkins

I will be describing Jenkins in this article, as it is the application I have the most experience with. As usual, it is a piece of software which has advantages and disadvantages. If I were to name the biggest item on each side:

  • biggest disadvantage – the disastrous quality of plugin upgrades: one can expect anything after an upgrade: usability changes, features disappearing, old bugs coming back (regressions) etc.
  • biggest advantage – the great implementation of agents: each agent can have many executors, which are very easy to configure; the scalability is therefore fantastic and allows a massive number of concurrent jobs to be run, as each job requires only one free executor; the commercial competition doesn't use executors, so 1 agent always means only 1 job at a time; in my opinion this must cause much bigger overhead when designing a pipeline with many concurrent jobs on 1 machine, as many agents are needed, while Jenkins can use just 1 per machine.

The practical problems

I mentioned reliability, information and performance.

  • reliability – the pipeline has to work the same way every time it is launched: no IFs, no unexpected stops, no deviations in the duration of the pipeline or in the amount of feedback information; the general goal is that we have to be confident about the pipeline itself
  • information – the pipeline has to provide clear information about what is working and what is not; clear information means we need no more than a few seconds to know whether the problem is related to module B of our Java application or maybe to a security problem with the “user management” feature
  • performance – the pipeline has to be as fast as possible; the larger the project, the harder this is to achieve; for me a nice result which I call “fast” is 10 minutes for the build and all unit tests, and 20 minutes for all the heavier tests (integration, GUI etc.); a total pipeline runtime of 15-20 minutes is fantastic performance, 20-30 is acceptable but improvements need to be found, >30 is poor and improvements just have to be found; there is psychology behind this: let's run it 3 times per hour, or at least twice; if we are waiting 35 minutes, the magic disappears

It all sounds simple but it is really hard to achieve. In practice there is often a counterpart to the list above: the pipeline gets stuck at random places (the application doesn’t install on 1 environment?), it gives red/yellow/green statuses but no one knows what they are tied to (can I release the software or not? is it OK?) and finally it lasts for 2 hours, which feels like forever (“let’s commit only a few times a day, otherwise we will not be able to see the result” – developers think).

There are many reasons for this:

  • people’s mindset – they discuss every character in the code but forget about the pipeline
  • pipeline-software-specific problems – in Jenkins there can be 2 kinds of jobs: created in the GUI using mouse clicks and stored inside the Jenkins application, or written in the so-called declarative (or the older scripted) pipeline syntax and stored as files in the version control system. GUI-based jobs are impossible to maintain in the long run as their number grows and the relations between them get more complicated. They start living their own life.
  • a traditional approach to the sequence of stages in the pipeline (all the stages are run in sequence)
  • having one universal job which does many things
  • relying on the project management tool only (like Maven) to implement test concurrency and reporting, without using the features of the pipeline software
  • physical environment availability
  • physical environment configuration

Fixing it (in detail)

In this section I would like to show in detail how to try to fix all of these problems.

Squashing pattern

Let’s start with the general shape of the pipeline. If the stages/jobs are run in a sequence like this:

stages in the sequence

We need to squash it as much as we can, that is:

squashed pipeline

We need a separate build stage to be able to use the build artifacts in all the testing stages. We build only once and use the output everywhere we need it.

For Maven, it means the transition from the notorious all-in-one `mvn clean install` into 2 steps: `mvn clean install -DskipTests` followed by `mvn test`.

The first command is then placed inside the Build stage and the second one becomes the Unit tests stage.
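In the pipeline, this split might be expressed like the sketch below. The stage names follow the article; the exact Maven goals are an assumption, as the original snippet was an image:

```groovy
stage('Build') {
    steps {
        // build once, skipping the tests here
        sh 'mvn clean install -DskipTests'
    }
}
stage('Unit tests') {
    steps {
        // run the tests against the already built artifacts
        sh 'mvn test'
    }
}
```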

All the other tests start at the same point in time as the unit tests. Of course, heavy tests like GUI tests require application redeployment, so the stage has to have a separate section for redeployment, like this:


gui stage

But what is the redeployment step? We very often have numerous problems when dealing with this step. To achieve stability we have to switch to a virtual solution – Docker. Only then do we have control over this step. If for some reason you do not know it yet, read about it; otherwise start using it. It greatly improves the stability of the pipeline:

docker job
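A Docker-based redeployment step might look like the sketch below. The container name, image, port and health endpoint are illustrative assumptions, not the author's exact setup:

```shell
# replace any previous instance of the application container
docker rm -f app-under-test 2>/dev/null || true

# start a fresh, identical environment for every pipeline run
docker run -d --name app-under-test -p 8080:8080 my-registry/my-app:${BUILD_NUMBER}

# wait until the application responds before the tests start
until curl -sf http://localhost:8080/health > /dev/null; do sleep 2; done
```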

But what about the situation where we need numerous redeployments, e.g. for different application configurations? Then we just take advantage of the parameters each job has. In Jenkins, each job consists of stages, and each job can also receive and pass parameters. Again we modify the pipeline by introducing a new job:

create container

Now, for each of the required application configurations we have an instance of the create_container job running.
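A sketch of such a parameterized job; the parameter name and the Docker invocation are illustrative assumptions:

```groovy
// create_container job: receives the configuration to deploy as a parameter
pipeline {
    agent any
    parameters {
        string(name: 'APP_CONFIG', defaultValue: 'default',
               description: 'application configuration to deploy')
    }
    stages {
        stage('Create container') {
            steps {
                // one container per configuration, named after the parameter
                sh "docker run -d --name app-${params.APP_CONFIG} my-app:latest"
            }
        }
    }
}
```

Another job can then start one instance per configuration, e.g. `build job: 'create_container', parameters: [string(name: 'APP_CONFIG', value: 'secure')]`.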

As you can see, we now have a stage for each logical step we make in our pipeline.

In general, the pattern is to drill down to the most atomic logical steps to visualize them in the pipeline software, and to squash them as much as possible to gain performance.

Pipeline as code

In Jenkins, as I mentioned above, it is possible to build the jobs by clicking in the GUI. Just do not do that! It is a dead end. The jobs become impossible to maintain very quickly and you are completely stuck. Use the declarative pipeline instead.

In Jenkins you use Groovy to create scripts in a simple way like this:
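The original snippet was an image; a minimal declarative pipeline looks roughly like this (the stage names and echo steps are just placeholders):

```groovy
// Jenkinsfile stored in the version control system next to the code
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                echo 'Building...'
            }
        }
        stage('Test') {
            steps {
                echo 'Testing...'
            }
        }
    }
}
```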

The point is to have the pipeline expressed in some language and stored in the version control system (VCS). This also applies to any other software used for the pipeline: if there is a possibility to store the pipeline as code, do it; otherwise change the software!

Moving away from a project management tool like Maven

Let’s come back to the unit tests step. Unit tests are an important stage but often a long-lasting one. Although we already have a separate stage for unit tests, we still have room for improvement there.

The usual attempt to increase the unit test speed is:
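The snippet here was originally an image; the usual attempt amounts to pushing the concurrency into Maven itself, e.g. via its -T flag or the Surefire parallel settings (a sketch; the exact flags depend on the project):

```shell
# build-level parallelism: one Maven thread per CPU core
mvn -T 1C test

# or test-level parallelism inside Surefire
mvn test -Dparallel=classes -DthreadCount=4
```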

However, I think it is the wrong approach, as it moves the concurrency functionality away from the pipeline software into Maven. Also, the performance gain is very much questionable. The better approach, in my opinion, is to ask the pipeline software to do the job. All that is required is to get the list of modules we need to test. We can do it using Maven:
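One known way of getting that list (the author's exact command was an image, so this is an assumption) is the exec-plugin trick, which runs echo in every module of the reactor:

```shell
# print the artifactId of every module in the multi-module build, one per line
# (note: the parent/aggregator itself is listed too and may need filtering out)
mvn -q exec:exec -Dexec.executable=echo -Dexec.args='${project.artifactId}'
```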

We need to create a job where the first stage collects the module list and the second stage generates another stage for each individual module. Here is a full example:
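The full example was an image in the original post; the sketch below reconstructs it from the surrounding description. The stage names (GET_MODULES, INVOKE_CONCURRENT_TESTS), the runConcurrent method, env.MODULES, env.BUILD_ARTIFACTS_PATH and the module-listing command match the article's text, but the exact code is my reconstruction, not the author's:

```groovy
// scripted part: dynamically generate one parallel stage per module
def runConcurrent(modules) {
    def stages = [:]
    modules.each { module ->
        stages[module] = {
            stage(module) {
                // every generated stage starts in its own workspace, so point
                // it at the shared workspace holding the build artifacts
                dir(env.BUILD_ARTIFACTS_PATH) {
                    // dedicated local repo avoids clashes between concurrent builds
                    sh "mvn test -pl ${module} -Dmaven.repo.local=${env.WORKSPACE}/.repo"
                }
            }
        }
    }
    parallel stages
}

pipeline {
    agent any
    stages {
        stage('GET_MODULES') {
            steps {
                script {
                    // store the module list as a string; tokenized later into an array
                    env.MODULES = sh(
                        returnStdout: true,
                        script: "mvn -q exec:exec -Dexec.executable=echo -Dexec.args='\${project.artifactId}'"
                    ).trim()
                }
            }
        }
        stage('INVOKE_CONCURRENT_TESTS') {
            steps {
                script {
                    runConcurrent(env.MODULES.tokenize())
                }
            }
        }
    }
}
```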

This is a mixture of declarative pipeline and scripted pipeline which allows dynamic stage generation.

In the GET_MODULES stage we extract all the modules from our application and store them as a string in the env.MODULES variable (it is saved as a string and then tokenized to create an array). In the INVOKE_CONCURRENT_TESTS stage we generate the stages using the runConcurrent method. Each stage is named after its module.

There are 2 things we need to pay attention to.

The first one is the workspace. Each generated stage has its own workspace, so it cannot see the project which should be tested. To point each stage to the right one, the dir directive has to be used, represented here by env.BUILD_ARTIFACTS_PATH. I think the best pattern – used here – is to keep the workspace with the build artifacts available to each job (so that it can be accessed from any agent via the dir directive). If this is not possible, the stage which does the build needs to archive the artifacts to the Jenkins master, and each generated stage then has to copy them from there. This affects the performance of the pipeline very significantly.

The second important thing is the Maven repository set by the -Dmaven.repo.local parameter. It is easy to confuse things and run into trouble when the default Maven repository is used. Especially when building different software versions at the same time, or building 2 different applications, it is a good thing to have a dedicated Maven repository for each of them.

Anyway, we now have each module tested by a separate stage, and the duration of the tests equals the duration of the longest-running module:

unit tests job

We can do it this way because we are dealing with unit tests – a unit is something independent, so our tests must run properly this way. The huge performance gain is one point; the other is very clear feedback about the point of failure if one occurs.

Speaking about the clear feedback…

Clear feedback

The goal is to get to the point where the pipeline provides clear information about what is working and what is not. We already have it in the unit tests job: if module B fails we will know it immediately on the screen, as there is a separate stage dedicated to each module.

But what about GUI tests? When the job fails, which feature is not working? We need to know this immediately as well – no time to search the logs! The answer is to group the GUI tests so that each group represents one business functionality. Let’s imagine the application under test has 3 significant features to be tested: security, processing and reporting. After the tests are grouped (for example in separate packages on the code side), it is possible to create a job like this:

gui tests with features
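The job pictured above might be sketched like this, assuming the tests are tagged per feature (TestNG groups or JUnit categories); the group names follow the example in the text:

```groovy
stage('GUI tests') {
    parallel {
        // one stage per business feature: a red stage immediately
        // tells us which business area is affected
        stage('security') {
            steps { sh 'mvn verify -Dgroups=security' }
        }
        stage('processing') {
            steps { sh 'mvn verify -Dgroups=processing' }
        }
        stage('reporting') {
            steps { sh 'mvn verify -Dgroups=reporting' }
        }
    }
}
```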

In this job, all the tests run concurrently, but when any of the groups fails we know exactly which business area is affected and we can make decisions about the quality of the current build.

Known defects or unknown?

Very often – I would say too often – there is a situation where some defects exist but, due to their priority or some other reason, they are not fixed immediately. They are present for very many builds and mark their presence with a red colour. This completely blurs the picture of the state of the application: is it OK or not? Does the red colour mean we have new defects, or are these the known ones?

The solution to this problem is to introduce a mechanism to mark the known failures in the pipeline software and to set the status accordingly. In Jenkins, it is possible to set the status to yellow (unstable) or red (error). We need a piece of software which scans through the reports (either JUnit or TestNG report files), puts the defect number inside the file on one side, and sets the status accordingly on the other: when all the failures are known the status becomes yellow, and if there is at least 1 unknown failure the status is red. Because the defect number is put inside the report file, we can view it on the report page in Jenkins.

I didn’t find any out-of-the-box solution for this, so I created a small application myself.

In detail: let’s say we have a JUnit report – we need to parse it, for example with the help of the great JSoup library I described here. We need a file containing the mapping information, that is, the class/method which is already reported and the ticket number. The code parses the JUnit XML report and substitutes all the known class/method occurrences with the same name but with the ticket number appended:
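The author's application uses JSoup; the sketch below shows the same idea with the JDK's built-in DOM parser so that it is self-contained. The class name, the `class#method -> ticket` mapping format and the attribute names are assumptions:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class KnownFailureMarker {

    // appends the ticket number to every known failing testcase name
    public static String markKnown(String junitXml, Map<String, String> knownFailures) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(junitXml)));
        NodeList cases = doc.getElementsByTagName("testcase");
        for (int i = 0; i < cases.getLength(); i++) {
            Element tc = (Element) cases.item(i);
            // only real failures are candidates for marking
            if (tc.getElementsByTagName("failure").getLength() == 0) {
                continue;
            }
            String key = tc.getAttribute("classname") + "#" + tc.getAttribute("name");
            String ticket = knownFailures.get(key);
            if (ticket != null) {
                // the ticket number becomes part of the test name,
                // so it shows up on the Jenkins report page
                tc.setAttribute("name", tc.getAttribute("name") + "_" + ticket);
            }
        }
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String report = "<testsuite>"
                + "<testcase classname='ModuleBTest' name='testLogin'><failure/></testcase>"
                + "<testcase classname='ModuleBTest' name='testLogout'/>"
                + "</testsuite>";
        String marked = markKnown(report, Map.of("ModuleBTest#testLogin", "TICKET-123"));
        System.out.println(marked.contains("testLogin_TICKET-123")); // true: known failure marked
        System.out.println(marked.contains("testLogout_"));          // false: passing test untouched
    }
}
```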

Because of this, the ticket number can be displayed in the Jenkins GUI. As the number of known failures is known, the status turns green if no failures are found, yellow if only reported failures are found, and red if there is at least 1 unknown failure in the report.

The final pipeline

After all these ideas are implemented we can see our final pipeline:

final pipeline

It is now squashed. It consists of atomic jobs which can be run as separate units (for example for testing purposes). It gives fast and clear feedback. It is reliable and significantly improves the quality of the application under test.

Summary

I described here the actions I take when dealing with the pipeline problem. I am sure this approach is universal: try to make the pipeline reliable, clear in its feedback and as fast as possible.

JSoup defence for Selenium

Selenium problems

It may happen that Selenium causes problems. There are at least 2 major ones:

  • a Selenium test case lasts very long

A Selenium request is really expensive, especially when executing the tests through a hub-node (Selenium Grid) infrastructure. The request is sent from the build machine to the Selenium hub, then to a Selenium node, and then on the node to the browser. The response travels all the way back. When a performance problem shows up during test case execution, the cause is often that Selenium sends too many requests. We are often not aware of how many requests are sent.

  • the page being automated is refreshed often

Sometimes we need to assert on a page which is automatically refreshed at a specific time interval. This causes notorious StaleElementReferenceException errors: when the refresh happens after Selenium grabs a WebElement but before it invokes a method on it, the exception surfaces.

The real-life problem

Recently I was dealing with such problems and was trying to think of a solution.

In my case I was iterating through a table to assert on a specific cell in each row.

So the code was more or less like this:
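The original snippet was an image; below is a reconstruction under assumed locators (a dataTable id, a location column, a hypothetical expectedLocation value), not the author's exact code:

```java
// each call into the WebDriver API below is a round trip to the browser
// (and travels through the hub and node when running on a grid)
List<WebElement> rows = driver.findElement(By.id("dataTable"))      // request
        .findElements(By.tagName("tr"));                            // request
for (WebElement row : rows) {
    List<WebElement> cells = row.findElements(By.tagName("td"));    // request, every iteration!
    String location = cells.get(2).getText();                       // request, every iteration!
    assertEquals(expectedLocation, location);
}
```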


The performance was very poor and the staleness problem was present in almost every run.

Notice that Java sends a request to the web browser at every WebDriver call. Surprisingly, this is the case for every iteration of the loop as well!

When the page refreshes during a chained driver call, you will get a StaleElementReferenceException. The same thing happens when the page gets refreshed anywhere during loop execution: the list of web elements used by the test cannot be refreshed until the loop is completed!

How to solve such a problem? The solution is either to catch the exception and retry, so that processing restarts at the beginning of the refresh interval, or to decrease the number of Selenium requests to a minimum and move the processing into memory as much as possible. The first solution on its own turned out to be impossible, as the loop took 3 times longer than the page refresh interval…

Here comes the cavalry

JSoup (https://jsoup.org) is the ultimate solution for all such problems. Not only is it a great library, extremely easy to use, with great documentation and intuitive methods, but it also enables an extremely smooth code refactor thanks to a fantastic feature it supports: CSS selectors.

Just take a look:
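The original snippet was an image; the sketch below shows the idea under the same assumed locators as before (it requires the jsoup dependency and a running browser, so it is illustrative rather than runnable as-is):

```java
// one Selenium call grabs the whole table's HTML; everything else is offline
WebElement table = driver.findElement(By.id("dataTable"));               // request
org.jsoup.nodes.Document snapshot =
        org.jsoup.Jsoup.parse(table.getAttribute("outerHTML"));          // request

// from here on, pure in-memory work on the snapshot: no browser involved
for (org.jsoup.nodes.Element row : snapshot.select("tr:has(td)")) {
    String location = row.select("td").get(2).text();
    assertEquals(expectedLocation, location);
}
```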

The table is extracted using Selenium and then the processing is handed over to JSoup completely for the duration of the loop:

  • JSoup creates a document from the table’s HTML, which is a kind of snapshot of the data present at the time the document was created; this ensures data consistency
  • the document is then queried using CSS selectors – completely offline from Selenium’s point of view and entirely in memory
  • the result is converted back to a Selenium WebElement so that Selenium methods can continue

Now the web browser interaction is reduced to only 2 places.

The solution is staleness-proof and significantly improves execution performance; one just needs to catch StaleElementReferenceException where Selenium is still in play:
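A sketch of that remaining catch, with an assumed retry count; if the page refreshes between grabbing the table and the Selenium call, we simply take a fresh snapshot and try again:

```java
for (int attempt = 0; attempt < 3; attempt++) {
    try {
        WebElement table = driver.findElement(By.id("dataTable"));   // Selenium call
        org.jsoup.nodes.Document snapshot =
                org.jsoup.Jsoup.parse(table.getAttribute("outerHTML"));
        // ... offline JSoup processing as shown above ...
        break;                                                       // success: stop retrying
    } catch (StaleElementReferenceException e) {
        // the page refreshed mid-call: loop around and grab a fresh table
    }
}
```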

The only thing to consider in this specific example is whether we can accept the situation where the page is refreshed after grabbing the table but before sending the getLocation request. Notice it is perfectly safe if there is no page refresh at all.

As for performance, even using a local web browser and a very small table the difference is noticeable (on Selenium Grid the difference is really huge, believe me!):

Sum up

If there is a problem with multiple Selenium requests causing a performance issue, or making tests unreliable because of StaleElementReferenceException – switch to offline processing with JSoup. Just remember: you need to understand the number of Selenium requests in your code, the exact cause of the staleness, and the impact offline processing has on your test case consistency.