The Great 4 Variables and how to use them to tame the heavy tests instability

The Great 4 Variables

In the world of testing we only need to care about the 4 Variables. If we could test all the combinations of values they can take, we could be 100% sure our software works. Unfortunately, each of them is really big:

  • code
  • configuration
  • data
  • environment

The code is where we focus most often: unit tests, integration tests, system tests. We know the stuff.

The configuration is a less obvious variable. Most often the system under test uses its default configuration, so we miss an important aspect of it. We definitely should create and use configuration tests to learn whether the configuration is actually working and how it affects the system.

The data is the real nightmare. There is an infinite number of possible combinations, and both internal data (the application’s database) and external data (coming from outside) can be spoiled, corrupted or just unsupported, which turns the risk of the system under test failing into a sure thing.

The environment is a very much underestimated thing: there are really strange OS configurations, versions and patches which can magically render our fantastic application unusable.

Together, these are the real source of the whole variety of defects we encounter when assuring quality!

I do not want to go into more detail about the 4 Variables. I can recommend a great book which discusses the topic in detail (and other topics as well): Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation.

The actual topic I want to cover in this post is the instability of heavy tests, and I want to use the Great 4 Variables to solve it.

The problem of heavy tests – pass, fail, error

We all know heavy tests. We often refer to them as system or large tests. They need the application runtime. They start slowly, run slowly and, most often, after a long period of time, instead of saying pass or fail they just say ERROR. Welcome to the world of passfailerror :)

Everything related to a heavy test is slow: it is slow to create, it takes a long time to run and, finally, it takes a very long time to find the reason for an error or failure. If only we could reduce the error rate to a minimum…

The solution

Let’s imagine we are in a project with many heavy tests which often produce an error result. We need to take small steps to reduce this unwanted status. First, we need to know the reason for the test errors, and we need to be able to get this information as quickly as possible when looking at the test result.

Let’s use the 4 Variables for that purpose.

We just need to realize that our testing code is an application just like any other. It is also affected by the Great 4 Variables. To execute a heavy test we need the testing code, testing configuration, test data and test environment. We assert things in some way, and the assertions may contain a defect. We may configure the test in a way that doesn’t work or produces error results. Our test data may have been changed by a previous test run and thus produce errors or, even worse, pass/fail results that are not credible. Finally, our test environment may be malfunctioning or not working at all (the simplest example is a Docker container which fails to set up at the beginning of the tests and thus doesn’t work when the test starts).

But how can we apply that knowledge? We just need to test all the Variables BEFORE the test is started, to be confident it will produce a credible pass or fail result and will not produce an error.

  1. test code: this is a real life scenario. I was using the Cucumber testing framework as a DSL in a project with a nice Selenium back end. The tests were quite stable and the results were clear. What a surprise: for a few days they were completely wrong. Because of a defect in the Cucumber layer they were reporting PASS no matter what the real result was. The lesson learnt is: always create unit tests for the testing code!
  2. test configuration: when the testing code should enter some special state for a specific test suite, we can test whether that state is correct. For example, if we have a DSL which is configured with the application end point it uses (REST, custom client, GUI), it is possible to test whether the test code is in GUI mode when a GUI test starts.
  3. test data: when there is any chance the test data may be corrupted before testing starts, it is necessary to test that it didn’t change. Again, a real life scenario: I was working with heavy tests which, in certain circumstances when the application under test was interrupted abnormally, were corrupting the test data. So, if the test data is not safe for some reason, it is a good idea to check that it is correct before the test starts.
  4. test environment: this is the very well known issue of the test environment being down. Either a Docker container failed to start, or maybe the test environment is shared with other teams which planned a machine restart just when our tests start… But I would say we can not only test whether it is up, but also do more precise checks, for example whether it has sufficient resources to run the test (memory, processor, disk) or whether the test user has the permissions needed for the test to start.

Once we have the checks in place they should be visible as separate items in our pipeline so that we know if our heavy test started at all and if not, what was the reason it failed to start: was it the code, configuration, data or environment failure?
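The checks described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: every name in it (CheckResult, check_configuration, check_data, check_environment, run_preflight, failed) is hypothetical, the "mode" setting stands in for whatever your DSL really exposes, and a checksum comparison is just one possible way to detect changed test data. The test code variable is left out because it is covered by ordinary unit tests of the testing code.

```python
import hashlib
import shutil
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    variable: str  # "configuration", "data" or "environment"
    ok: bool
    detail: str


def check_configuration(settings: Dict[str, str], expected_mode: str) -> CheckResult:
    """Configuration check: the test code must be in the mode the suite expects."""
    actual = settings.get("mode")  # "mode" is an assumed setting name
    return CheckResult("configuration", actual == expected_mode,
                       f"mode={actual!r}, expected {expected_mode!r}")


def check_data(payload: bytes, expected_sha256: str) -> CheckResult:
    """Data check: the fixture must be byte-identical to the known-good copy."""
    ok = hashlib.sha256(payload).hexdigest() == expected_sha256
    return CheckResult("data", ok, "fixture unchanged" if ok else "fixture was modified")


def check_environment(path: str, min_free_bytes: int) -> CheckResult:
    """Environment check: enough free disk space to run the heavy test."""
    free = shutil.disk_usage(path).free
    return CheckResult("environment", free >= min_free_bytes,
                       f"{free} bytes free, {min_free_bytes} required")


def run_preflight(checks: List[Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every check so that each Variable is reported as a separate item."""
    return [check() for check in checks]


def failed(results: List[CheckResult]) -> List[CheckResult]:
    """Return only the checks which should block the heavy test from starting."""
    return [result for result in results if not result.ok]


# Example run with made-up values.
fixture = b"id,name\n1,alice\n"
results = run_preflight([
    lambda: check_configuration({"mode": "GUI"}, "GUI"),
    lambda: check_data(fixture, hashlib.sha256(fixture).hexdigest()),
    lambda: check_environment(".", 1),
])
for result in results:
    print(f"{result.variable}: {'OK' if result.ok else 'FAILED'} ({result.detail})")
```

Returning one result per Variable, instead of stopping at the first failure, is what lets the pipeline show the four checks as separate items.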

After this we can take improvement actions so that it doesn’t happen again. We do not lose any time analysing long log files just to realize after 15 minutes that the red status doesn’t mean our application under test has a defect in a critical area, but that the test environment just ran out of space.

Of course, when any of the Variables is not problematic in our pipeline and never causes any problems, we can safely skip it. We just need to monitor the Variables which are causing the instability of the heavy tests.


When creating a pipeline to have continuous delivery, or at least continuous integration, in place, we cannot afford to lose time on understanding the results. The message has to be clear for small, medium and large tests, at least as to whether we actually have a defect or not. We need to know it instantly.

This is hard to achieve, especially for large/heavy tests. In my opinion the best way to solve the problem is to test the Great 4 Variables as they apply to our tests, to filter out any failures which are not related to the application under test.

Let’s use the saved time for more important tasks.





Improving coverage – automating state transition approach 2.0

Approach 2.0

(This is an improved approach compared to the idea of state transition testing described HERE.)

In my past articles I presented an approach which allows us to generate test cases for combinatorial problems. This is a very large group of aspects we encounter when dealing with the problem of assuring quality.

Still, there is an area where we need a more general approach. Let’s think about a simple GUI application which allows you to log in and fill in a form which can be saved. There is a combinatorial aspect to filling it in and saving it, as we can do it in many ways.
But what about the situation where we just cancel filling in the form? Or where we fill in many forms in a row? As you most probably know, the state transition diagram comes in handy here. This is a test design technique focusing on the most general aspect of the application, which is application state and transition context.
In this article I would like to show how to practically model an application using this technique and, most importantly, how to automatically generate test cases with a specific coverage which will be instantly executable.

State transition diagram coverage

Speaking about coverage: in my idea, the coverage for diagrams is based on how many times each transition is used. I call it N-tn coverage. So when I say 2-tn coverage, it means each transition present in the diagram will be used at least twice. It is worth noticing this is different from what you can find in QA literature, which uses N-switch coverage. As you probably know, 0-switch coverage means you test a single transition (no states in between), 1-switch coverage means you test 2 consecutive transitions (a piece of the diagram with 2 transitions and 1 state between them) and so on. This is nice but, I think, hard to use in practice. Why? Because to test a specific part of the diagram you have to bring the application into a specific state: you have to execute all the states and transitions which lead to the state you chose as the starting point (setting the state of the application without executing the path, for example by updating the database, caches and other stuff manually, is very risky in my opinion and should be avoided). It is just better to avoid a complex setup process.

The complexity

Unfortunately the complexity hidden behind the diagram is enormous: it is actually infinite. Let’s imagine an application which has only 2 states (A and B) and 2 transitions (T1 from A to B and T2 from B back to A):

simplest diagram

How many test cases can we have? Infinite…
1. A-B-A (1-switch coverage, 1-tn coverage)
2. A-B-A-B (2-switch coverage, 1-tn coverage as T2 is used only once)
3. A-B-A-B-A (3-switch coverage, 2-tn coverage)
4. A-B-A-B-A-B (4-switch coverage, 2-tn coverage as T2 is used only twice)
5. A-B-A-B-A-B-A (5-switch coverage, 3-tn coverage)

and so on until infinity is reached, which of course never happens…
Repeating a transition once, twice or a thousand times gives a different test case each time. Here you can clearly see how many test cases you miss on the way to 100% confidence that your application works as expected.
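The N-tn numbers in the list above can be checked with a few lines of Python. This is a minimal sketch, not the code from my repository; the function name tn_coverage and the representation of a transition as a (source, target) pair are assumptions made for the illustration.

```python
from collections import Counter
from typing import List, Tuple

Transition = Tuple[str, str]  # (source state, target state)


def tn_coverage(path: List[str], transitions: List[Transition]) -> int:
    """Return the largest N such that every transition of the diagram is
    used at least N times along the path (0 if some transition is unused)."""
    steps = list(zip(path, path[1:]))
    invalid = [step for step in steps if step not in transitions]
    if invalid:
        raise ValueError(f"path uses transitions missing from the diagram: {invalid}")
    used = Counter(steps)
    # The least used transition decides the coverage level.
    return min(used[transition] for transition in transitions)


diagram = [("A", "B"), ("B", "A")]  # T1: A -> B, T2: B -> A

for text in ["A-B-A", "A-B-A-B", "A-B-A-B-A", "A-B-A-B-A-B"]:
    print(f"{text}: {tn_coverage(text.split('-'), diagram)}-tn coverage")
```

The coverage level only goes up when the least used transition is repeated, which is why A-B-A-B stays at 1-tn even though T1 is already used twice.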
Anyway, theory is very nice but let’s apply it in practice to make it useful finally.

Practical example

In general, we need just the same things as for combinatorial problems: a model, generated test cases in XML format and generated test cases in a domain language.
Let’s assume we would like to test Notepad’s functionality related to tabs and text direction. Let’s start with a plain old diagram:

state transition diagram example

It looks very nice but we cannot do anything useful with it right now. Let’s write it in XML format with domain language part:

Now it is becoming unreadable for humans but it is much better for a machine…

Please note that the expected result for each transition is “notepad GUI is visible”, which is quite trivial. It should be more meaningful in a real diagram model.
At this point we need some software to find valid paths through the diagram with a given N-tn coverage. I couldn’t find anything useful on the internet, so I wrote a piece of software myself. You can view the code under the automatic-tc-generation-from-diagram-another-approach branch HERE. This is: src/test/java/com/passfailerror/diagram2sequence_generator/ class.

The algorithm is quite simple:
– the diagram is converted into a state transition table
– a row for the starting state is chosen as the 1st item
– the state transition table is shuffled and scanned; when a matching row is found (according to the diagram logic) it is appended to the valid path and the transition table is reshuffled
– this process repeats until a diagram path is built with the specific N-tn coverage (each transition is visited at least N times)
– notice that it makes sense to generate more than one diagram case, as each time a path is generated for a specific N-tn coverage a different path is chosen.
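The steps above can be sketched in Python as follows. This is not the Java code from the repository, only an illustration of the same idea. The function name generate_path, the row layout (source state, transition name, target state), the max_steps guard and the sample table are all assumptions made for this example.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

Row = Tuple[str, str, str]  # (source state, transition name, target state)


def generate_path(table: List[Row], start_state: str, n: int,
                  rng: Optional[random.Random] = None,
                  max_steps: int = 10_000) -> List[Row]:
    """Build a path in which every row of the table is used at least n times."""
    if rng is None:
        rng = random.Random()
    rows = list(table)
    # The first item of the path must leave the starting state.
    starters = [row for row in rows if row[0] == start_state]
    if not starters:
        raise ValueError(f"no transition leaves the starting state {start_state!r}")
    path = [rng.choice(starters)]
    used = Counter(path)
    while any(used[row] < n for row in rows):
        if len(path) >= max_steps:
            raise RuntimeError("gave up: coverage not reached within max_steps")
        # Reshuffle, then take the first row which continues from the current state.
        rng.shuffle(rows)
        current_state = path[-1][2]
        match = next((row for row in rows if row[0] == current_state), None)
        if match is None:
            raise ValueError(f"dead end: no transition leaves {current_state!r}")
        path.append(match)
        used[match] += 1
    return path


# Hypothetical 3-state diagram, not the Notepad model from this article.
table = [
    ("A", "T1", "B"),
    ("B", "T2", "A"),
    ("B", "T3", "C"),
    ("C", "T4", "A"),
]
path = generate_path(table, start_state="A", n=2, rng=random.Random(7))
print(" -> ".join(name for _, name, _ in path))
```

Passing a seeded random.Random makes a generated diagram case reproducible, while different seeds give the different paths mentioned in the last point of the list.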

After running Diagram2SequenceGenerator we get a result XML file, which I call generated diagram cases. It is not executable yet, so we need to convert it into executable diagram cases:

We can do the conversion with an enhanced version of the test case generator used in my previous articles on combinatorial problems. You can see the source in:
src/test/java/com/passfailerror/testcases_generator/ class, which receives an extra parameter, TestcaseSourceType, which in turn allows it to generate test cases both from a TCases output file and from a diagram cases output file.

The executable test case is:

The sequence of WHENs and THENs is a diagram case, while a single WHEN-THEN pair would be a test case in my terminology. Just to repeat: it is valid to have more than one diagram case for a given N-tn coverage, as the generated sequence of transitions is different each time.
Now it is just a matter of running the output, as it is directly executable:

Approach 2.0 sum up

First we draw a diagram:



Then we translate it into XML (unfortunately manually):

diagram as XML

Then diagram cases are generated (automatically):

generated diagram cases

Finally executable test cases are generated (automatically):


generated DSL executable test cases

The DSL used here (an internal domain language implemented in Java) and the framework (Sikuli) don’t really matter. They are used only as an example. In practice it will most often be Selenium, or maybe something more unusual like Protractor, with Cucumber or another behaviour driven development library on top of it. The only important thing when using approach 2.0 is to use some kind of domain language, so that it can be used in the HAS elements of the model file in order to generate diagram cases automatically.


The state transition diagram test design technique finally starts to be useful. I have never seen anybody applying it in practice, which is weird as it applies to every application which has at least 2 transitions. Or maybe I haven’t seen much?
There are a few important points behind all this. The problem illustrated here was very simple, yet only a few states and a few transitions resulted in so many actions. It means the complexity hidden behind a simple application is very large, so when modelling more complex applications we have to settle for a small coverage or choose only a part of the application to be tested in this way. Also, I didn’t say anything about invalid paths through the diagram: we should also check whether invalid paths are really invalid and how the system behaves in such situations.

Anyway, I am sure this is very useful technique to deal with problems which are modelled by state transition diagrams.


Automatic test case generation for state transition diagrams (approach 1.0)

Approach 1.0

This article is left here for historical reasons. Please read the newest version of the idea, which is described HERE.

Increase automatic test case generation

I have written about 2 things in the past: state transition based testing and automatic testcase generation. These are actually 2 complementary test design techniques: state transition diagrams and decision tables respectively (I do not want to go into the details of these techniques now; this is a subject for a separate post I hope to write in the future). In the latter post I showed how to automate test case generation for decision tables. The goal for today is to show how to start automation when a diagram is the starting point.

The combinatorial nature of a problem can be expressed as a decision table and translated into XML for the TCases application, which processes it and produces output containing an optimal set of test cases (automatic testcase generation). However, the most general way to analyze an application under test is the state transition diagram. I already showed how to use this technique to achieve coverage, but I showed only the manual approach. Still, we need automatic test case generation!

When a diagram is in use, the trouble begins: how to process it automatically? How to generate a set of test cases from a diagram? It took quite a while until I came up with a reasonable solution.

I recently thought I could try TCases for this purpose. Although it is meant for identifying variables and their values, if the transitions of the diagram were treated as variables and their dependencies were described in the TCases XML input file, I could get a valid set of transitions, and each transition would be used at least once in the basic coverage setting.

Practical example

Create model

Let’s use the same problem as in state transition based testing. We want to test whether Notepad works when switching between tabs, changing the text direction inside each of them and writing text in each of them. This is a very simplified model but it is enough to illustrate the concept. The state transition diagram looks like this:


Now it needs to be translated into an XML representation which can be parsed by TCases (I was writing about TCases HERE). This is it:

INPUT is the state name, VAR is the transition.

COMMAND in HAS elements contains domain language sentences which are executable after simple processing by the domain language generator.

WHEN elements describe the dependencies needed to allow only valid combinations of transitions.

EXPECTED in HAS elements shows that we just assert that the Notepad GUI is visible after each set of transitions is run.

There is one problem with this file: in line 16 we need to give the whole sequence of transitions needed to reach SELECT2TAB, as the TAB_1_IS_SELECTED state has 2 outgoing transitions. This shows a disadvantage of modelling the diagram in this way when there are states with very many transitions.

Generate executable test case

After generator is run, the set of test cases is produced. Generator reference is

The link to the source code is shown at the end of this post if you are interested.

With basic coverage, which is 1-tuple coverage, each transition will be used at least once. Because each transition is marked as TRUE or FALSE (a transition can be TRUE only when its dependencies are met), the generated set will contain both TRUE and FALSE values: a generated test case can contain all the valid transitions, or only some of them. This is 1-tuple coverage:

With generated test cases (tc3 is missing as it consists of FALSE values only and generator wisely skips such test cases):

Now, when creation process of test cases is automated it is very easy to increase the coverage. This is 2-tuple coverage:

With generated test cases:

Running the testcases

It is time to run the test cases. The generated test cases are just pasted into JUnit class:

And the class is run as shown here:

If curious, you can view all the code HERE under automatic-tc-generation-from-diagram branch.

Sum up

Even if not perfect, this is a solution for automatically generating test cases from a state transition diagram. Together with automatic test case generation for combinatorial problems described by decision tables, it is a very solid approach to quickly achieving optimal coverage and thus assuring quality in the application under test.

Project knowledge maintenance

I can see a never-ending problem with knowledge in organisations. On one hand people move around, they come and go; on the other hand information grows exponentially, bringing new facts and invalidating older ones. The problem is that the main vehicle for information is people. There are always attempts to make the information independent of them by storing it in various ways, but at least in my experience this is always ineffective: e-mails, various web pages, documents here and there all make access hard and time consuming, and the information you eventually get is often out of date and incomplete.

OWL expert system

I was looking for possible solutions, and in my opinion a very promising one could be a system consisting of an OWL knowledge base with a reasoner and a user interface. Together they would constitute an expert system which stores the knowledge in one place, independently of employees, and provides access to everyone eligible at the same time. OWL is a language for representing knowledge about things and their relations. A knowledge base is just a collection of facts, all of which need to be typed in manually. Adding a reasoner, however, reveals so called “inferred facts”, which are usually created by our mind. Adding a user interface is self-explanatory.

Let me give you a very simple example. Imagine a knowledge base which contains facts about 3 Things and their relations. Let’s name it the “3 Things knowledge base”.

3 Things Knowledge base

There is a very nice tool for maintaining the knowledge base itself: Protege. Its clear interface allows us to build and maintain a knowledge base easily, both in Windows and via a web interface.

Let’s use it to create an ontology for this expert system (I use the terms knowledge base and ontology interchangeably).

Firstly, we need to create the minimum set of facts. To do this we need to create 3 classes:

  • BigThing
  • MediumThing
  • SmallThing


Secondly, we need to name the relation between the classes by adding object properties (and their hierarchy):

  • contains
  • containsDirectly
  • contains is a transitive parent of containsDirectly

The implementation of the containers’ relationships is not straightforward. It should be split into the transitive object property “contains” and its subproperty “containsDirectly”, as in the Protege screenshot:


The important thing is the transitivity of “contains” (which means that if A contains B and B contains C then A contains C).

More on the types of object properties can be read for example HERE

Thirdly, we need to use the object properties to store the information about the class relations and add class instances (individuals) at the same time:

  • BigThing containsDirectly MediumThing
  • instance is bigbox


  • MediumThing containsDirectly SmallThing
  • instance is mediumbox


  • SmallThing instance is smallbox


Notice, we do not need to put any facts related to SmallThing. You can see the text representation of the knowledge base HERE


It is required to apply some reasoner to the ontology. This is to check consistency as well as retrieve inferred facts. There are many more features and details but this is out of scope of this article…

I use HermiT reasoner in this example (HermiT.jar is required to run the program).

We need it, for example, to get the inferred fact that BigThing contains SmallThing: there is no such fact in the knowledge base!
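To show what “inferred” means here, the same deduction can be imitated in plain Python. This is neither OWL nor HermiT, and a real reasoner does far more; it is only a sketch of the two rules used in this ontology (every containsDirectly fact is also a contains fact, and contains is transitive). The function names infer_contains and who_contains are made up for the illustration.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str]  # (container, content)


def infer_contains(contains_directly: Set[Fact]) -> Set[Fact]:
    """Apply the subproperty rule and then transitivity until nothing new appears."""
    contains = set(contains_directly)  # containsDirectly implies contains
    changed = True
    while changed:
        changed = False
        for a, b in list(contains):
            for c, d in list(contains):
                # A contains B and B contains D, therefore A contains D.
                if b == c and (a, d) not in contains:
                    contains.add((a, d))
                    changed = True
    return contains


def who_contains(contains: Set[Fact], thing: str) -> List[str]:
    """Rough counterpart of asking which things contain the given thing."""
    return sorted(container for container, content in contains if content == thing)


# The only facts typed in by hand.
asserted = {("BigThing", "MediumThing"), ("MediumThing", "SmallThing")}

all_facts = infer_contains(asserted)
print("inferred only:", all_facts - asserted)
print("contains SmallThing:", who_contains(all_facts, "SmallThing"))
```

The single inferred fact is that BigThing contains SmallThing, which is exactly the one missing from the typed-in facts.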


Also, OWL-API is required for the reasoner to interact with the ontology. OWL-API.jar is needed in this example. When writing the OWL-API and reasoner code I relied heavily on THIS example (I am using version 3.1.0).


Now we need to use a DL query to get the information we are interested in. DL query is the syntax you use to get information from an ontology. As the result of a query you may get superclasses, classes and subclasses as well as individuals. In this example we are interested in the subclasses and the individuals in the response. The query looks like this:

contains some SmallThing

In response we are going to get the classes (as subclasses of this query) and the individuals of the ontology which meet this condition.

As the result of our work we have the following expert system at this point:


Which works like this:

You can get the full code of the application from HERE.


It may look simple to do, but unfortunately it is not easy to create a proper knowledge base, even for a simple case like this one.

There are good practices you have to know before you can start creating a knowledge base. For example, to implement a container hierarchy like the one in this example, one needs to read THIS.

Other than that, reasoners differ from each other and support various features, so it is possible that a given reasoner will not be able to operate on the ontology as in this example.

Last but not least, a DL query is not an intuitive way of asking questions in my opinion, and I think it would be problematic to create a good translator of English sentences into DL queries.


Application of ontologies in QA world

The problem of the knowledge base is very wide and interesting. When thinking of the QA area, I am thinking about an OWL knowledge base which contains information about the application under test. It could store information about every aspect of the project, at all its abstraction levels, from business information (usage, typical load etc.) down to classes and the design (detailed descriptions). It could be available both to project team members (developers to catch up with the code quickly, testers to understand how to use it etc.) and to other applications which could benefit from such knowledge, for example automatic testing tools (automatic exploration tests, automatic non-functional tests etc.).

This was just a touch on the problem of project knowledge maintenance, which is surely very complex to solve but, in my opinion, also very universal. That’s why it is really worth continuing to experiment and trying to find a solution.

I really mean it was just a touch: just take a look at THIS.

Short about mind maps

Human nature

I think mind maps are still not used to the extent they could be. The idea is very powerful: to use our brain more efficiently. Humans in general memorize visual things, maybe also meaningful sentences, sometimes melodies or sequences of body movements (dance), but definitely not numbers or random strings. In my opinion there is a funny situation where passwords are concerned: most systems require a so called strong password. This is a contemporary myth though: the stronger the password they require, the higher the probability that the user will not memorize it. In that case the user will write it down or try to use the same one for many systems. Does it increase security? No, it works the opposite way…

Anyway, I like the idea of mind maps as they are truly designed for humans which is rare in today’s systems.


Mind map applications

Mind maps are a visual representation of information. They appear as colourful nodes connected with colourful lines.

People in general use them for things like:
– brain storming,
– making fast notes,
– learning things (I tried this way and for me it really works)

I do not want to write about these points: they are more obvious and there is information on the internet covering them. I would like to show you how to use mind maps in other, less obvious ways.

There are a few applications available which implement mind maps (I know Freeplane and Freemind). I personally prefer Freeplane, and all the examples here were made with that application.

Knowledge base

When joining a project, very often the project know-how is dispersed. Actually, I have never come across a solid knowledge base, not to mention an expert system on top of it. Information1 is in an email, Information2 is on that web page and Information3 is known only to that guy over there. This is the reality in which we often have to start work.
It is handy to start using a mind map as a personal knowledge base, like this one for example:

knowledge base example

Every node in Freeplane can be marked with colours and shapes and most importantly can have a link to any resource located both locally and remotely.

OS extra layer

We can go further and use it as an extra layer put on top of the OS: a specialized layer which concerns only our project domain, perfectly customized… let’s call it the Project Map.
Let’s move our knowledge base aside: it will be one of the child nodes in our Project Map.
The Project Map will contain things like:
– knowledge base which we continuously expand,
– links to things like scripts, reports, files,
– links to external data

There is no point in losing time clicking through Windows menus and recalling the locations of various things. It is distracting and slows down the work. Everything is now 1 click away.

Take a look at the screenshot:


OS overlay example

Again, every node can be clicked and will trigger an action which can be anything from navigating to other mind map to launching an application.

A nice feature of Freeplane is the export to a Java applet. Please take a look at this link:

— project mind map —

Thus you can also share the mind map with the team or the world just as I did.


There are many examples on the web. I am sure they will inspire you, if this post has not already done so.


Reactive and proactive approach

Theory – friendly requirements

Speaking about quality assurance, there are well known patterns for how to approach a project. We start with requirements (functional and non-functional), we think how to describe them, we consider behaviour driven development, specification by example and other techniques to achieve clear communication with the customer. We create test cases using test design techniques to achieve the right coverage. We have a schedule to fit into and we can plan our actions.

requirements approach – proactive

This is all very useful at the start of a project (or of a new feature), and it is very nice to have all of it in the project. I call it the proactive approach: I can act before any defect is planted in the code. I can set up my defense before anything has even started. I have time.

Reality – angry user

How many times were you assigned to a project where things were set up in the right way? I think in most cases we are thrown into the middle of a sophisticated project which has many problems and where the end users or the client are complaining about many things. And they want you to act fast. How to act effectively in such a difficult environment?

I say: act reactively first. Group all the defects reported in production and think of the most efficient actions, so that when a defect is fixed, it is not only retested but the whole area around the bug is also secured from a quality point of view. Such an approach improves the project in the places which are most important to the end user. When this is done, the user calms down and you can shift the balance towards the proactive approach, which will improve the project in the long term.

field feedback – reactive


Let me give you some examples. They are real ones.


A user is complaining about not seeing data in some places in the application. It turns out there is a problem with the database connection, as the nice db guys forgot to tell your team they changed the connection string over the weekend. Well, it seems it is not a defect on our side, is it? Another user is complaining about not being able to send anything from the application to another one. It turns out there is a hardware malfunction in the remote application; again, it doesn’t seem to be our problem. Finally, a third one cannot save data to the database. This time it seems we have a typo in the connection string in production, so let’s fix it.

At this point it is important to notice these are all configuration related issues, and we can improve this area of the project by introducing configuration testing, which we have evidently been missing so far. It can easily be added as component (unit/small) testing. All the configurations we have in the code base should be tested during each build, both to catch typos in configuration strings and to be aware of the status of external systems, so that we know whether a remote system is having a problem or was reconfigured for some reason. On top of that we can build a configuration tool which reads the production configuration and does a quick configuration check whenever we are in doubt whether a production issue is related to connection problems or not.
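A small configuration test of this kind might look like the Python sketch below. It covers only the typo-catching half (it does not contact any external system), and it rests on assumptions: the connection strings are taken to be URL-style, the set of allowed schemes is invented, and the function name connection_string_problems and the sample host are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit

# Assumption: the project only uses these kinds of end points.
ALLOWED_SCHEMES = {"postgresql", "https"}


def connection_string_problems(name: str, value: str) -> List[str]:
    """Return human readable problems found in one URL-style connection string."""
    try:
        parts = urlsplit(value)
    except ValueError as error:
        return [f"{name}: cannot be parsed ({error})"]
    problems = []
    if parts.scheme not in ALLOWED_SCHEMES:
        problems.append(f"{name}: unexpected scheme {parts.scheme!r}")
    if not parts.hostname:
        problems.append(f"{name}: missing host")
    try:
        parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append(f"{name}: invalid port")
    return problems


# Made-up configuration entries; the second one contains a typo in the scheme.
config = {
    "orders_db": "postgresql://db.example.test:5432/orders",
    "audit_db": "postgresq://db.example.test:5432/audit",
}
for key, value in config.items():
    for problem in connection_string_problems(key, value):
        print(problem)
```

Run during each build against every configuration file in the code base, a check like this reports a mistyped connection string before it reaches production.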


A few users are complaining that the application works very slowly. All the data loads slowly and this causes business problems for them. It turns out there is a defect: the timeout setting, although set in the configuration file, is not applied at runtime, which degrades performance significantly.

This problem may be solved by introducing simple automatic comparison testing: in the test environment, the settings in the configuration files are compared with the runtime settings as displayed in the log files.
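Such a comparison could be sketched in Python as below. The log line format ("effective setting key=value") is purely an assumption for this example, since it depends on how the application under test logs its settings; the names LOG_PATTERN, runtime_settings and mismatches are hypothetical as well.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed log format: "... effective setting <key>=<value>"
LOG_PATTERN = re.compile(r"effective setting (\w+)=(\S+)")


def runtime_settings(log_text: str) -> Dict[str, str]:
    """Collect the settings the application reported in its log."""
    return {match.group(1): match.group(2) for match in LOG_PATTERN.finditer(log_text)}


def mismatches(configured: Dict[str, str],
               log_text: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing key to (configured value, runtime value or None if not logged)."""
    runtime = runtime_settings(log_text)
    return {key: (value, runtime.get(key))
            for key, value in configured.items()
            if runtime.get(key) != value}


# Made-up example: the timeout from the file is not applied at runtime.
configured = {"timeout": "5", "retries": "3"}
log_text = (
    "12:00:01 INFO effective setting timeout=30\n"
    "12:00:01 INFO effective setting retries=3\n"
)
print(mismatches(configured, log_text))
```

A setting which never shows up in the log is reported with None as its runtime value, which is also worth a look.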


There are a few defects with major priority reported, which are related to the core functionality of the application.

Well, this is an easy one, actually the basics. The goal is to create a pyramid-like set of test cases (many component tests, fewer integration tests and few system tests) which will cover this functionality as fast as possible, to prevent defects from plaguing the area which is most important for the end user.


Users again complain about performance; the description of the issue they give is very general and doesn’t help to narrow down the area where the defect is hidden.

Log analysis is needed to find out what the application’s characteristics in production look like. It may turn out that logging is not properly implemented and much information is missing or, the other way around, that logging gives too much information. In such a case the development team should start improving this to make log analysis easier in the future.


Developers spend much time supporting the application, as the support team asks them for help twice a day.

This is a bad situation which slows down the project very much. Developers get frustrated as they cannot concentrate on one thing during the work day. Project work is slowed down because, instead of improving things or implementing new features, devs talk for hours with support or end users.

The best reaction in my opinion is to push as many activities to support as possible, and also to start working on things which will allow more to be pushed away soon. Well, easy to say, harder to do… but possible. Again, we need to group the problems which are reported by support:

– what does this button do? how to do this or that in the application? – this is an application knowledge question and should not be asked of developers at all; a knowledge base should be started or, even better, an expert system should be built to allow support to get the needed answers on their own

– the app is not working, there is no data in that window – if it turns out it is an external system which doesn’t work, let the support guys use the configuration check tool (create it immediately if it is not available); they should be able to tell by themselves whether there is a problem within the app or it is just an external connection problem

– the app is not working, some issue appears – the support guys should be able to understand the sequence of events which preceded the issue; if they are not able to say anything, maybe they need clearer logging, some memory usage history or more information in the log files to be able to understand the state of the system quickly and also to immediately deliver to the developer the information needed to fix the defect.

 The proactive reality

I mentioned in the last section that in my opinion we should use the proactive approach anyway. However, it isn’t hard to think of a project which has problems with requirements, so we can run into trouble here as well. What can we do to stop relying on production feedback only and start pinpointing defects before they reach the end user? If we do not want to depend on field feedback, and at the same time requirements are missing or incomplete, a very good method of improving the proactive approach is comparison testing. I show it on the diagram, where 2 red arrows mark the missing or incomplete communication for requirements and feedback:


comparisons – improving proactive

Testing by comparison is a powerful technique.

We compare the previous and the present version of the system or of a single component. The previous version can be yesterday’s build or the last one which was released to production, depending on our needs.

What can we actually compare?

– performance – this would be so-called benchmarking, to see if the newest version is faster or slower

– functional behaviour – any discrepancies indicate an issue, which in turn might mean a defect

– logs – differences (more logging, less logging, more warnings, errors etc.) may indicate a defect

– memory consumption – differences between consecutive test runs may indicate a memory leak defect (notice that we can observe it in comparison testing, even if no out of memory exception is thrown)
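As an illustration of the log comparison, here is a small Python sketch that counts log levels in two test runs and reports the levels whose counts changed. The "date time LEVEL message" layout and the sample lines are assumptions for the example, not taken from a real application.

```python
from collections import Counter


def count_levels(log_lines, levels=("INFO", "WARN", "ERROR")):
    """Count how many lines were logged at each level."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        # With the layout "date time LEVEL message" the level is the third field.
        if len(fields) >= 3 and fields[2] in levels:
            counts[fields[2]] += 1
    return counts


def compare_runs(previous, current, tolerance=0):
    """Return {level: (previous_count, current_count)} for levels that changed."""
    prev, curr = count_levels(previous), count_levels(current)
    return {
        level: (prev[level], curr[level])
        for level in sorted(set(prev) | set(curr))
        if abs(curr[level] - prev[level]) > tolerance
    }


if __name__ == "__main__":
    yesterday = [
        "01-01-16 13:05:26,238 INFO start",
        "01-01-16 13:05:27,001 INFO done",
    ]
    today = [
        "02-01-16 13:05:26,100 INFO start",
        "02-01-16 13:05:26,900 WARN slow response",
        "02-01-16 13:05:27,400 INFO done",
    ]
    print(compare_runs(yesterday, today))  # prints {'WARN': (0, 1)}
```

The same shape of check works for timings or memory readings: collect a number per run, then compare it with the previous run and flag differences above some tolerance.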


In my opinion it is good if we can act proactively, but if the application causes problems for end users we should start with field defect analysis, the reactive approach. We need to use production feedback in a smart way to get a large effect from a small piece of information and a small effort as well. After this, we should move to the proactive part, which may often be not so simple to handle in reality, but comparison testing improves this approach in a significant way.

We stay proactive until production issues are reported again, which makes us react in a smart way again. In time, the application becomes more stable and reliable, and only then can we finally do exactly what standard QA theory says.


ELK == elasticsearch, kibana, logstash

Log analysis

Often we need to analyze logs to get some information about the application. ELK is one of the possible solutions for this problem. The basic source of information is HERE.

Here is a picture showing how it may look:


ELK architecture example


Filebeat – collects log data locally and sends it to logstash

Logstash – parses logs and loads them to elasticsearch

Elasticsearch – NoSQL database – stores the data in the structure of “indexes”, “document types” and “types”; mappings may exist to alter the way given types are stored

Kibana – visualizes the data

Desktop – the user may use a web browser to visualize data via Kibana, or write code to interact with elasticsearch via its REST interface.

Besides installing and configuring all the stuff, one also needs to remember about deleting old data in elasticsearch so as not to run out of disk space; logs can take a huge amount of space, as you know. I achieved that with a simple Perl script, started from crontab, which parses index names and deletes the ones which are older than a given number of days. Automatic maintenance is a must for sure.
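The Perl script itself is not shown here, but the core of such a cleanup can be sketched in Python. This is only an illustration under the assumption that index names end with a date, like applicationName-2016.05.26; the function names are made up. The sketch only selects the index names; the deletion itself would be an HTTP DELETE request for each selected index against the elasticsearch REST interface.

```python
from datetime import date, datetime, timedelta


def index_date(index_name):
    """Return the date encoded in a name like 'applicationName-2016.05.26', or None."""
    _, _, suffix = index_name.rpartition("-")
    try:
        return datetime.strptime(suffix, "%Y.%m.%d").date()
    except ValueError:
        return None  # not a dated index, leave it alone


def indexes_to_delete(index_names, today, keep_days):
    """Select dated indexes that are older than keep_days, counted from today."""
    cutoff = today - timedelta(days=keep_days)
    selected = []
    for name in index_names:
        created = index_date(name)
        if created is not None and created < cutoff:
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["applicationName-2016.05.26", "applicationName-2016.04.01", ".kibana"]
    print(indexes_to_delete(names, today=date(2016, 5, 30), keep_days=30))
    # prints ['applicationName-2016.04.01']
```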

There is much on the web about ELK, so I will not give any more details here. I would like to concentrate on only one thing, which was the hardest and took me the longest time to achieve: logstash configuration.


The documentation on the page seems to be rich and complete, however for some reason it was very unhandy for me. OK, I will say more: the authors managed to create a page which was almost useless for me. I do not know the exact reason; maybe it is because of too few examples, or maybe they assumed the knowledge of the average reader is much higher than mine. Anyway, I needed a lot of time to achieve the right configuration for all items, and especially for logstash.

Let’s get to the details now. Look at this configuration file:


  • 1-5 – these lines are self-explanatory: logstash receives input from filebeat
  • 7+ – start of filtering the data based on the type which is defined in the filebeat config (surefire will come from another machine, so there is no surefire document type here):

  • 12 – parsing of the line (seen by logstash under the message variable), an example of which is: “01-01-16 13:05:26,238 INFO some log information”

Now the mapping is created:

  • date = 01-01-16
  • time = 13:05:26,238
  • loglevel = INFO
  • detailed_action = “some log information”
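To make the mapping concrete, here is a Python analogue of that parsing step. It is not the logstash configuration itself, only a regular expression written for this example which splits the sample line into the same four fields.

```python
import re

# One named group per field of the mapping: date, time, loglevel, detailed_action.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<loglevel>[A-Z]+) "
    r"(?P<detailed_action>.*)$"
)


def parse_line(message):
    """Split one log line into its fields; return None if it does not match."""
    match = LINE_PATTERN.match(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_line("01-01-16 13:05:26,238 INFO some log information"))
```

Lines that do not follow the layout return None, which is the moment to decide whether they should be dropped or kept unparsed.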


  • 17 – parsing “message” again to get datetime = 01-01-16 13:05:26,238
  • 21-25 – date filter to explicitly set the timezone to UTC; it is important to have all the dates and times marked with the proper timezone so that they give proper results when analyzed afterwards
  • 29-48 – the same for the testlogfile type, with one difference:
  • 46 – this is how logstash can conditionally add a field; when at least 2 words are encountered in the detailed_action field, logstash creates a duplicate of detailed_action named detailed_action_notAnalyzed (this is required when creating a mapping in elasticsearch, which in turn allows searching for a group of words; see the end of the post)
  • 50-86 – the surefire type, which is interesting because it is xml data
  • 52-60 – does 2 things: firstly it cleans the input line of non-printable characters and extra white space, and secondly it adds the datetime field (the logstash internal %{[@timestamp]} field is used); unlike the regular log data, we don’t have datetime here, so we have to add it in logstash
  • 61-78 – xml filter which maps an xpath expression to a field name, for example <testsuite name="suite_name"> will turn into: test_suite = suite_name
  • 79-84 – works around a surefire plugin problem: no status is shown in the xml file when a test case passes. Do you maybe know why somebody invented it this way? This is really frustrating and also stupid in my opinion…
  • 87-96 – the architecture type is checked here to determine the filename (without full path); it comes from this place in the filebeat config:

  • 101-104 – a very important feature is here: a checksum based on 3 fields is generated and assigned to the metadata field “my_checksum”; it is used for generating the document_id when shipping to elasticsearch, which in turn prevents duplicates in elasticsearch (imagine that you need to reload the data from the same server the next day from rolling log files: you would store many duplicates, but having the checksum in place allows only new entries to show up in the database)
  • 110-141 – output section which has type based conditions
  • 116-118 – the logfile document type is sent to elasticsearch, to the index “applicationName-2016.05.26”; a “logfile” document_type is created with the generated checksum as document_id (to prevent duplicates)
  • 112,122,132 – commented lines which, when uncommented, serve debugging purposes: output is sent to elasticsearch and to the console at the same time
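The idea behind the checksum from lines 101-104 can be shown outside logstash with a short Python sketch. Which 3 fields go into the checksum and which hash function is used are assumptions here (I picked datetime, loglevel and detailed_action, and SHA-1); a plain dictionary stands in for the elasticsearch index.

```python
import hashlib


def document_id(fields, keys=("datetime", "loglevel", "detailed_action")):
    """Build a stable id from the chosen fields (the choice of keys is an assumption)."""
    joined = "|".join(str(fields.get(key, "")) for key in keys)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def index_document(store, fields):
    """Store the document under its checksum; the same entry overwrites itself."""
    store[document_id(fields)] = fields


if __name__ == "__main__":
    entry = {
        "datetime": "01-01-16 13:05:26,238",
        "loglevel": "INFO",
        "detailed_action": "some log information",
    }
    store = {}
    index_document(store, entry)
    index_document(store, dict(entry))  # reloading the same log line the next day
    print(len(store))  # prints 1
```

Because the id depends only on the content of the entry, reloading the same rolling log file updates existing documents instead of adding new ones.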

Other usage scenarios

After investing much effort in the solution I am also experimenting with other uses, not only for log data, to get more value in the project. After the regression test suite is run, I send the surefire reports and test execution logs (this is domain language level) and view them via a dashboard which also collects all the application log data (server and client) at the same time. This gives a consistent view of the test run.

Another interesting ability is the REST interface to elasticsearch. I use it for programmatically downloading the data, processing it and uploading the result back there. This serves performance analysis, where the information logged by the application requires processing before conclusions can be made.

This ability allows, for example, creating very complex automatic integration or even system integration tests, where each component would know the exact state of the others by reading their logs from elasticsearch. Of course we should avoid complex and heavy tests (how long does it take to find a problem when a system integration test fails…?), but if you need to create a few of them, at least it is possible from the technical point of view.


In general…

… it is a nice tool. I would just like to name ELK’s drawbacks here, as I think it is easier to find the advantages on the web:

  • it is hard to delete data from elasticsearch, so most often one needs to rely on index names
  • logstash with a complex configuration can take even 1 minute to start, so any logstash configuration tests require a separate logstash instance
  • it is hard to debug logstash
  • there is no good way of visualizing the data in Kibana when you are not interested in aggregates; if some event is logged, you can display information like how many times per day/hour/minute it occurs, but you cannot plot it like you would in gnuplot, for example
  • the search engine is illogical: to be able to find a string like “long log data” one needs to have the field stored as a “string not analyzed” field (the default behaviour for strings is “string analyzed”, in which case you can only search for single words); there is a trick to do the appropriate mapping in elasticsearch and store the string as “analyzed” and “not analyzed” at the same time (if it is, let’s say, a log_message “analyzed” string type, a log_message.raw “not analyzed” variant is created at the same time), but kibana cannot work with *.raw fields; the mapping I am talking about looks like this:

So, you need to split log_message in logstash to create two separate fields (look at line 46 of the logstash config discussed above), e.g. log_message and log_message_notAnalyzed. Otherwise, to search for the “long log data” string in kibana you have to write this thing:

This also finds things you do not want to find: “log data long”, “log stuff stuff long stuff data”, “stuff long log stuff stuff data” etc. This is really a problem, given that the need to find strings of a few words is very common.
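The difference between the two kinds of fields can be illustrated with a toy model in Python. This is a deliberately simplified picture and not how elasticsearch is implemented: the “analyzed” field is modelled as a set of single words, the “not analyzed” field as the whole string compared as one value.

```python
def matches_analyzed(stored, query):
    """Toy model of an 'analyzed' field: every query word must occur somewhere."""
    stored_words = set(stored.lower().split())
    return all(word in stored_words for word in query.lower().split())


def matches_not_analyzed(stored, query):
    """Toy model of a 'not analyzed' field: the whole value is compared as one string."""
    return stored == query


if __name__ == "__main__":
    query = "long log data"
    for stored in ("long log data", "log data long", "log stuff stuff long stuff data"):
        print(stored, "|", matches_analyzed(stored, query), matches_not_analyzed(stored, query))
```

In this model the word-based search accepts all three stored values, while the whole-string comparison accepts only the first one, which is the behaviour we want when looking for a phrase.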

That’s it for disadvantages. I think ELK does the job of log analysis very well anyway. It is certainly worth trying.

Get the right coverage

Test design techniques

Let’s think about quality control basics for a moment. In my opinion the most important thing is to be able to design the right test case. No matter whether it is automated or manual, we need to gain confidence that the software we are working on has a very low probability of still unrevealed functional defects in the area covered by our test cases. We do not have to reinvent the wheel, as test design techniques are already in place to help us achieve this goal. Because these are the basics of the basics for a QA engineer, you know all of them and apply them in practice, don’t you…? Based on my interview experience I can tell you that in reality the majority of QA engineers have heard about some of them, but the majority also do not apply them in practice (as described here). If you are by any chance in this notorious majority, I hope you read all this so as not to be part of this group of people anymore.

Pair wise testing

In this article I would like to concentrate on what is in my opinion the most advanced test design technique, or at least the most interesting one from my point of view: pair-wise testing. The purpose of this technique is to deal with situations where the number of combinations we have to test is too large. Because we have combinations almost everywhere, this is an extremely important thing to know. Examples of combinatorial testing problems are:

  • an application settings page with many switches (we need to know whether some combination of settings we choose influences any of them, e.g. Notepad++ preferences),
  • software designed for many platforms (the combinations here are the array of operating systems – UNIX, mobile, Windows – and external software combinations – database vendors, database providers),
  • an application which has a REST or SOAP web service interface (the number of available combinations of input data: the application accepts a POST message in XML format, some of the elements are mandatory, some of them are optional)

The idea behind the pair-wise technique is to focus on pairs instead of all combinations.
For example, let’s imagine we have 3 inputs, each of which accepts one letter at a time: input 1 accepts only the letters (A,B), input 2 (A,B) and input 3 (A,B,C). We can easily write down all combinations for such a model (2x2x3=12 combinations):

1 => (A,B)
2 => (A,B)
3 => (A,B,C)

full coverage – all combinations
no 1 2 3
1 A A A
2 A A B
3 A A C
4 A B A
5 A B B
6 A B C
7 B A A
8 B A B
9 B A C
10 B B A
11 B B B
12 B B C
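The same table can be generated with a few lines of Python, which is handy once the model grows. This is just a sketch using the standard itertools.product; the INPUTS list mirrors the three inputs defined above.

```python
from itertools import product

# Allowed values for inputs 1, 2 and 3.
INPUTS = [("A", "B"), ("A", "B"), ("A", "B", "C")]

# Cartesian product of all inputs: 2 x 2 x 3 = 12 combinations.
ALL_COMBINATIONS = ["".join(values) for values in product(*INPUTS)]

if __name__ == "__main__":
    for number, combination in enumerate(ALL_COMBINATIONS, start=1):
        print(number, " ".join(combination))
```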

Of course, in such a case we do not need any special approach; we can test all of them. But let’s think of a situation where each combination takes 1 week to execute, or where we have 3 inputs each accepting the range A-Z, or where each input accepts more than one letter.
We can decrease the coverage from 100% (all combinations) to all pairs. Please notice that 100% coverage here actually means all triplets, so we are moving from all triplets to all pairs.

Let’s enumerate all pair combinations as we are interested now only in pairs:

pairs listing
no 1 2 3
1 A A _
2 A B _
3 B A _
4 B B _
5 _ A A
6 _ A B
7 _ A C
8 _ B A
9 _ B B
10 _ B C
11 A _ A
12 A _ B
13 A _ C
14 B _ A
15 B _ B
16 B _ C

Let’s choose a subset of combinations which contains all the pairs listed above. Consider this:

reducing number of combinations
no Comb. 1 2 3 comment
1 AAA AA_ _AA A_A we don’t need this, we have these pairs in AAC, BAA and ABA
2 AAB AA_ _AB A_B we don’t need this, we have these pairs in AAC, BAB and ABB
9 BAC BA_ _AC B_C we don’t need this, we have these pairs in BAB, AAC and BBC
10 BBA BB_ _BA B_A we don’t need this, we have these pairs in BBC, ABA and BAA
11 BBB BB_ _BB B_B we don’t need this, we have these pairs in BBC, ABB and BAB
12 BBC BB_ _BC B_C

So, we can now use just 7 combinations out of the original 12:

all pairs coverage
no Comb. 1 2 3 comment

Can we reduce the number of combinations more? Yes, we can move from all pairs coverage to single value coverage, which means that we want to use every possible value for each input at least once and we do not care about any combinations:

single value coverage
no 1 2 3
1 A A A
2 B B B
3 A B C

In this set of 3 combinations input 1 uses A and B, input 2 uses A and B and input 3 uses A, B and C, which is all that we need.
As you can see we have a nice theory, but we are not going to compute things manually, are we?
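Before reaching for a dedicated tool, the reasoning above can be checked with a short Python sketch. The set of 7 combinations used below is my reading of the example (the 12 combinations minus rows 1, 2, 9, 10 and 11 of the reduction table); the function names are made up for this illustration.

```python
from itertools import combinations, product

# Allowed values for inputs 1, 2 and 3.
INPUTS = [("A", "B"), ("A", "B"), ("A", "B", "C")]


def required_pairs(inputs):
    """Every (input, value) pair combination that all pairs coverage asks for."""
    required = set()
    for i, j in combinations(range(len(inputs)), 2):
        for a, b in product(inputs[i], inputs[j]):
            required.add(((i, a), (j, b)))
    return required


def pairs_of(test_case):
    """The pairs exercised by one combination such as 'AAC'."""
    return {
        ((i, test_case[i]), (j, test_case[j]))
        for i, j in combinations(range(len(test_case)), 2)
    }


def missing_pairs(test_cases, inputs):
    """Pairs required by the model but not covered by the given combinations."""
    covered = set()
    for test_case in test_cases:
        covered |= pairs_of(test_case)
    return required_pairs(inputs) - covered


ALL_PAIRS_SET = ["AAC", "ABA", "ABB", "ABC", "BAA", "BAB", "BBC"]
SINGLE_VALUE_SET = ["AAA", "BBB", "ABC"]

if __name__ == "__main__":
    print(len(required_pairs(INPUTS)))                   # prints 16
    print(len(missing_pairs(ALL_PAIRS_SET, INPUTS)))     # prints 0
    print(len(missing_pairs(SINGLE_VALUE_SET, INPUTS)))  # prints 7
```

The 7 combinations cover all 16 pairs, while the single value set leaves 7 pairs untested. That gap is the price of going from 7 test cases down to 3.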


Let’s use software for the example from the previous section.
It is named TCases and it is located HERE. I will not explain its usage, as there is excellent help on the page there (I tell you, it is really excellent). It is enough to say that we need an input file which models the input and a generator file which allows us to set the actual coverage. The input file for the example shown above looks like this: