JSoup defence for Selenium

Selenium problems

Selenium can sometimes cause problems. There are at least two major ones:

  • Selenium test cases take a very long time

Every Selenium request is expensive, especially when tests run on a hub-node (Selenium Grid) infrastructure. The request is sent from the build machine to the Selenium hub, then to a Selenium node, and then, on that node, to the browser; the response travels all the way back. When a performance problem shows up during test execution, the cause is often that Selenium sends too many requests, and we are rarely aware of how many requests our code actually sends.
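
A back-of-the-envelope sketch of that cost, assuming each command's wire time is simply the sum of the one-way hop latencies, doubled for the response, and that processing time on the hub, node, and browser can be ignored. The class name GridCost and the latency values are hypothetical inputs chosen to make the arithmetic easy, not measurements.

```java
/**
 * Back-of-the-envelope model of Selenium commands sent through a grid.
 * Wire time only: processing on the hub, node and browser is ignored.
 */
final class GridCost {
    private GridCost() {}

    /** Round trip of one command: every one-way hop is travelled twice. */
    static long roundTripMillis(long... oneWayHopMillis) {
        long oneWay = 0;
        for (long hop : oneWayHopMillis) {
            oneWay += hop;
        }
        return 2 * oneWay;
    }

    /** Commands are sent one after another, so their round trips add up. */
    static long totalMillis(long commands, long... oneWayHopMillis) {
        return commands * roundTripMillis(oneWayHopMillis);
    }

    public static void main(String[] args) {
        // Hypothetical one-way latencies: build machine -> hub, hub -> node, node -> browser
        System.out.println(roundTripMillis(5, 2, 1));    // 16
        System.out.println(totalMillis(1_000, 5, 2, 1)); // 16000
    }
}
```

Because WebDriver commands are normally sent one after another, the total grows linearly with the number of commands, which is why cutting the command count pays off most on a grid.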

  • the page under automation is refreshed frequently

Sometimes we need to make assertions on a page that refreshes automatically at a fixed interval. This causes the notorious StaleElementReferenceException: if the refresh happens after Selenium has grabbed a WebElement but before it invokes a method on it, the element is no longer attached to the current DOM and the exception surfaces.
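
The race can be made concrete with a toy model, which is not Selenium code: Page, Handle, and StaleHandleException are hypothetical names, and the model simply assumes that every refresh rebuilds the DOM and invalidates all handles obtained before it.

```java
/**
 * Toy model of the race behind StaleElementReferenceException.
 * Not Selenium code: all names here are made up for illustration.
 */
final class StalenessModel {
    private StalenessModel() {}

    /** Stand-in for Selenium's StaleElementReferenceException. */
    static final class StaleHandleException extends RuntimeException {
        StaleHandleException(String message) {
            super(message);
        }
    }

    /** A page whose DOM is rebuilt on every refresh. */
    static final class Page {
        private int rendering = 0;

        /** Every refresh rebuilds the DOM, so handles from earlier renderings become invalid. */
        void refresh() {
            rendering++;
        }

        /** Like findElement(...): returns a handle tied to the current rendering. */
        Handle grab(String text) {
            return new Handle(this, rendering, text);
        }
    }

    static final class Handle {
        private final Page page;
        private final int rendering;
        private final String text;

        private Handle(Page page, int rendering, String text) {
            this.page = page;
            this.rendering = rendering;
            this.text = text;
        }

        /** Like getText(): fails if the page was refreshed after the handle was obtained. */
        String getText() {
            if (page.rendering != rendering) {
                throw new StaleHandleException("element is not attached to the current page");
            }
            return text;
        }
    }

    public static void main(String[] args) {
        Page page = new Page();
        Handle cell = page.grab("42");      // Selenium grabs the element
        System.out.println(cell.getText()); // prints 42: no refresh in between
        page.refresh();                     // the refresh lands between "grab" and "use"
        try {
            cell.getText();
        } catch (StaleHandleException e) {
            System.out.println("stale: " + e.getMessage());
        }
    }
}
```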

The real-life problem

Recently I was dealing with exactly these problems and trying to find a solution.

In my case, I was iterating through a table to assert a specific cell in each row.

The code looked more or less like this:


The performance was very poor, and the staleness problem appeared in almost every run.

Notice that Java sends a request to the web browser at every marked line. Surprisingly, this is the case on every iteration of the loop as well!
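
To see how quickly those requests add up, here is a toy model of a row-by-row loop that counts round trips. It is not the original listing: Browser, RowHandle, and readColumn are hypothetical names, and the model assumes, as noted in the comments, that locating a cell inside a row and reading its text cost one WebDriver command each, as findElement and getText do.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy remote browser that counts round trips for a cell-per-row loop.
 * Every counted call stands for one WebDriver command; names are hypothetical.
 */
final class NaiveLoop {
    private NaiveLoop() {}

    static final class Browser {
        int requests = 0;
        private final List<List<String>> table;

        Browser(List<List<String>> table) {
            this.table = table;
        }

        /** Like table.findElements(By.tagName("tr")): one request, returns handles, not data. */
        List<RowHandle> findRows() {
            requests++;
            List<RowHandle> rows = new ArrayList<>();
            for (int i = 0; i < table.size(); i++) {
                rows.add(new RowHandle(this, i));
            }
            return rows;
        }
    }

    static final class RowHandle {
        private final Browser browser;
        private final int row;

        RowHandle(Browser browser, int row) {
            this.browser = browser;
            this.row = row;
        }

        /** Like row.findElement(By.cssSelector("td:nth-child(2)")).getText(): two requests. */
        String cellText(int column) {
            browser.requests += 2;
            return browser.table.get(row).get(column);
        }
    }

    /** Reads one column row by row, the way a straightforward Selenium loop would. */
    static List<String> readColumn(Browser browser, int column) {
        List<String> values = new ArrayList<>();
        for (RowHandle row : browser.findRows()) {
            values.add(row.cellText(column));
        }
        return values;
    }

    public static void main(String[] args) {
        List<List<String>> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(List.of("row " + i, "OK"));
        }
        Browser browser = new Browser(data);
        readColumn(browser, 1);
        System.out.println(browser.requests); // 201 = 1 + 2 * 100
    }
}
```

With 100 rows the loop issues 201 commands, and on a grid each of them pays the full round trip.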

If the page refreshes during a chained driver call, you get a StaleElementReferenceException. The same happens if the page is refreshed at any point while the loop runs: the list of web elements used in the loop cannot be refreshed until the loop completes!

How can such a problem be solved? Either catch the exception and restart processing at the beginning of a refresh interval, or reduce the number of Selenium requests to a minimum and move as much processing as possible into memory. The first option turned out to be impossible, as the loop took three times longer than the page refresh interval, so a complete pass could never fit between two refreshes.

Here comes the cavalry

JSoup (https://jsoup.org) is the ultimate solution for problems like these. Not only is it a great library, extremely easy to use, with great documentation and intuitive methods, but it also makes refactoring extremely smooth thanks to a fantastic feature it supports: CSS selectors. Locators already written for Selenium's By.cssSelector can largely be reused as JSoup queries.

Just take a look:

The table is extracted using Selenium, and processing is then handed over entirely to JSoup for the duration of the loop:

  • JSoup creates a document from the HTML table; it is a kind of snapshot of the data present at the moment the document was created, which ensures data consistency
  • the document is then queried using CSS selectors, completely offline from Selenium's point of view and entirely in memory
  • the result is converted back to a Selenium WebElement so that Selenium methods can be called on it
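
The three steps above can be sketched in the same toy style, as an illustration rather than the author's code. One request copies the table into memory, the search runs against that copy, and a second request goes back to the live page for the chosen cell. In real code the copy would presumably be a JSoup Document built with Jsoup.parse from the table's HTML and queried with select, and the way back would be a CSS selector handed to Selenium's By.cssSelector; here a plain List copy and a row index stand in for those, and Browser, snapshot, liveCell, and findRow are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the snapshot approach: two round trips in total, and the
 * in-memory search is unaffected by refreshes. The List copy stands in for
 * a JSoup Document; all names are hypothetical.
 */
final class SnapshotLoop {
    private SnapshotLoop() {}

    static final class Browser {
        int requests = 0;
        private List<List<String>> table;

        Browser(List<List<String>> table) {
            this.table = table;
        }

        /** The page re-renders, possibly with different data. */
        void refresh(List<List<String>> newTable) {
            table = newTable;
        }

        /** One request: copies the current table, like reading its HTML once. */
        List<List<String>> snapshot() {
            requests++;
            List<List<String>> copy = new ArrayList<>();
            for (List<String> row : table) {
                copy.add(List.copyOf(row));
            }
            return List.copyOf(copy);
        }

        /** One request: looks up a cell on the live page, like findElement(By.cssSelector(...)). */
        String liveCell(int row, int column) {
            requests++;
            return table.get(row).get(column);
        }
    }

    /** Finds the first row whose cell matches, entirely in memory; -1 if none. */
    static int findRow(List<List<String>> snapshot, int column, String expected) {
        for (int i = 0; i < snapshot.size(); i++) {
            if (snapshot.get(i).get(column).equals(expected)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Browser browser = new Browser(List.of(
                List.of("job-1", "DONE"),
                List.of("job-2", "FAILED"),
                List.of("job-3", "DONE")));

        List<List<String>> snapshot = browser.snapshot(); // browser interaction no. 1
        int row = findRow(snapshot, 1, "FAILED");         // offline: no requests, cannot go stale
        String live = browser.liveCell(row, 0);           // browser interaction no. 2
        System.out.println(row + " " + live + " " + browser.requests); // 1 job-2 2
    }
}
```

However many rows there are, the browser is contacted twice, and a refresh during the in-memory search cannot make the copy go stale.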

Now the interaction with the web browser is reduced to just two places.

The solution is staleness-proof and significantly improves execution performance; one just needs to catch StaleElementReferenceException where Selenium is in play:
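
A minimal sketch of that catch, assuming a small generic retry helper; Retry and onException are hypothetical names, not part of Selenium or JSoup. With Selenium, the exception class passed in would be org.openqa.selenium.StaleElementReferenceException; the demo below uses IllegalStateException as a stand-in so that it runs without a browser, and the limit of three attempts is an arbitrary choice.

```java
import java.util.function.Supplier;

/** Re-runs an action when a specific runtime exception type is thrown. */
final class Retry {
    private Retry() {}

    static <T> T onException(Class<? extends RuntimeException> retryOn, int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // unrelated failures are not retried
                }
                last = e; // e.g. the page refreshed mid-action: start again
            }
        }
        throw last; // every attempt failed with the retried exception type
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = onException(IllegalStateException.class, 3, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("stale"); // stand-in for a stale element
            }
            return "snapshot taken";
        });
        System.out.println(result + " after " + calls[0] + " attempts"); // snapshot taken after 3 attempts
    }
}
```

Because the JSoup processing is cheap, it does no harm to re-run the whole snapshot, query, and resolve sequence on each attempt rather than trying to resume halfway.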

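The only thing to consider in this specific example is whether we can accept the page being refreshed after the table was grabbed but before the getLocation request was sent. In that window, the element Selenium locates comes from the refreshed page, which may no longer hold the data the snapshot was checked against. Note that it is perfectly safe if there is no page refresh at all.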

As for performance, even with a local web browser and a very small table the difference is noticeable (on Selenium Grid it is really huge, believe me!):
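
To measure such a difference in your own environment, a small timing harness can run each variant a few times and report the median, which is less sensitive than the mean to an occasional slow run. Timing and medianMillis are hypothetical names, and the two empty lambdas in main are placeholders for the Selenium-only loop and the JSoup-based version.

```java
import java.util.Arrays;

/** Minimal wall-clock harness for comparing two ways of doing the same check. */
final class Timing {
    private Timing() {}

    /** Runs the action the given number of times and returns the median duration in milliseconds. */
    static double medianMillis(Runnable action, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        long median = runs % 2 == 1
                ? nanos[runs / 2]
                : (nanos[runs / 2 - 1] + nanos[runs / 2]) / 2;
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholders: put the Selenium-only loop and the JSoup-based version here.
        double seleniumLoop = medianMillis(() -> { /* Selenium-only loop */ }, 5);
        double jsoupVersion = medianMillis(() -> { /* snapshot + JSoup */ }, 5);
        System.out.printf("Selenium loop: %.1f ms, JSoup version: %.1f ms%n", seleniumLoop, jsoupVersion);
    }
}
```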

Summing up

If numerous Selenium requests cause performance issues or make tests unreliable because of StaleElementReferenceException, switch to offline processing with JSoup. Just remember that you need to understand how many Selenium requests your code sends, the exact cause of the staleness, and the impact offline processing has on your test case's consistency.

