JSoup defence for Selenium

Selenium problems

Selenium can sometimes cause problems. There are at least two major ones:

Selenium test cases take very long

Selenium requests are really expensive, especially when the tests are executed on a hub-node (Selenium Grid) infrastructure. A request is sent from the build machine to the Selenium hub, then to the Selenium node, and then, on the node, to the browser; the response travels all the way back. When a performance problem shows up during test case execution, the cause is often that Selenium sends too many requests. We are often not aware of how many requests are actually sent.

The page to be automated is refreshed often

Sometimes we need to assert on a page that is refreshed automatically at a specific interval. This causes the notorious StaleElementReferenceException: when the refresh happens after Selenium has located a WebElement but before it invokes a method on it, the exception surfaces.

The real life problem

Recently I was dealing with exactly these problems and was trying to think of a solution.

In my case I was iterating through a table to assert a specific cell in each row.

The code looked more or less like this:

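import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.Logger; // assumed: log4j 1.x, which provides Logger.getLogger(Class)
import org.openqa.selenium.By;
import org.openqa.selenium.Point;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class main1 {

    private static final Logger logger = Logger.getLogger(main1.class);

    public main1() throws MalformedURLException {
        ChromeOptions capability = new ChromeOptions();
        WebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:4444/wd/hub"), capability);
        driver.get("https://www.w3schools.com/html/html_tables.asp"); // request
        logger.info("start");
        List<WebElement> firstCellList = driver.findElement(By.cssSelector("#customers")) // request
                .findElements(By.cssSelector("tbody > tr > td:nth-child(1)"));           // request
        List<WebElement> resultList = new ArrayList<>();
        logger.info("loop start");
        for (int i = 0; i < firstCellList.size(); i++) { // size() is local, no request
            logger.info("loop item");
            if (firstCellList.get(i).getText().contains("Island")) { // request in EVERY iteration
                resultList.add(firstCellList.get(i));
            }
        }
        logger.info("loop end");
        logger.info("result is: " + resultList.size());
        Point point = resultList.get(0).getLocation(); // request
        logger.info("point is: " + point.toString());
        logger.info("stop");
        driver.quit(); // request: ends the remote session (close() only closes the current window)
    }

    public static void main(String[] args) throws MalformedURLException {
        new main1();
    }

}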

The performance was very poor, and the staleness problem occurred in almost every run.

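Notice that Java sends a request to the web browser on every line that calls a method on the driver or on a WebElement (driver.get, findElement, findElements, getText, getLocation and so on). Surprisingly, this is also the case for every iteration of the loop, because each getText() call is a separate round trip!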
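The request count can be made tangible with a minimal sketch in plain Java, with no Selenium involved. `CountingRemoteTable`, `LoopResult` and `RequestCountDemo` are hypothetical names. The fake table counts every call that, in the real code, would be a round trip to the browser. The model assumes one request to locate the cells, one per `getText()`, and one for a bulk read of the whole table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fake of a remote table: every call that would be a browser
// round trip in real Selenium code increments the request counter.
class CountingRemoteTable {
    private final List<String> firstCells; // data that lives "in the browser"
    private int requests = 0;

    CountingRemoteTable(List<String> firstCells) {
        this.firstCells = new ArrayList<>(firstCells);
    }

    // Models findElements(...): one request, returns how many cells were found.
    int locateCells() {
        requests++;
        return firstCells.size();
    }

    // Models WebElement.getText() on a single cell: one request per call.
    String cellText(int index) {
        requests++;
        return firstCells.get(index);
    }

    // Models reading the whole table at once (e.g. its outerHTML): one request.
    List<String> readAll() {
        requests++;
        return new ArrayList<>(firstCells);
    }

    int requests() {
        return requests;
    }
}

// Matches found by a loop plus the number of requests the loop needed.
final class LoopResult {
    final List<String> matches;
    final int requests;

    LoopResult(List<String> matches, int requests) {
        this.matches = matches;
        this.requests = requests;
    }
}

class RequestCountDemo {
    // Selenium style: locate the cells, then call getText() on each of them.
    static LoopResult perElement(List<String> cells, String needle) {
        CountingRemoteTable table = new CountingRemoteTable(cells);
        List<String> matches = new ArrayList<>();
        int count = table.locateCells();
        for (int i = 0; i < count; i++) {
            String text = table.cellText(i); // a round trip in every iteration
            if (text.contains(needle)) {
                matches.add(text);
            }
        }
        return new LoopResult(matches, table.requests());
    }

    // Snapshot style: one bulk read, then the loop runs purely in memory.
    static LoopResult snapshot(List<String> cells, String needle) {
        CountingRemoteTable table = new CountingRemoteTable(cells);
        List<String> matches = new ArrayList<>();
        for (String text : table.readAll()) {
            if (text.contains(needle)) {
                matches.add(text);
            }
        }
        return new LoopResult(matches, table.requests());
    }
}
```

In this model, a table with n rows costs n + 1 requests in the per-element loop. The bulk read costs a single request no matter how large the table is.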

If the page refreshes in the middle of a chained driver call, you get a StaleElementReferenceException. The same happens if the page is refreshed at any point during the loop: the list of WebElements used by the loop was located before the refresh and cannot be refreshed until the loop completes!
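The mechanism behind the exception can be modelled in a few lines of plain Java. This is a hypothetical sketch, not Selenium. `LivePage`, `CellHandle` and `StaleHandleException` (standing in for `StaleElementReferenceException`) are made-up names. Each refresh bumps a generation counter, and a handle located before the refresh refuses to work afterwards. A copied snapshot is not affected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Selenium's StaleElementReferenceException.
class StaleHandleException extends RuntimeException {
    StaleHandleException(String message) {
        super(message);
    }
}

// Hypothetical live page: a refresh replaces the DOM and bumps the generation,
// so every handle located before the refresh becomes stale.
class LivePage {
    private List<String> cells;
    private int generation = 0;

    LivePage(List<String> cells) {
        this.cells = new ArrayList<>(cells);
    }

    void refresh(List<String> newCells) {
        cells = new ArrayList<>(newCells);
        generation++;
    }

    // Like findElements(...): handles bound to the current DOM generation.
    List<CellHandle> findCells() {
        List<CellHandle> handles = new ArrayList<>();
        for (int i = 0; i < cells.size(); i++) {
            handles.add(new CellHandle(this, generation, i));
        }
        return handles;
    }

    // Like reading outerHTML once: a copy that later refreshes cannot invalidate.
    List<String> snapshot() {
        return new ArrayList<>(cells);
    }

    String textOf(CellHandle handle) {
        if (handle.generation != generation) {
            throw new StaleHandleException("cell " + handle.index + " belongs to a refreshed page");
        }
        return cells.get(handle.index);
    }
}

final class CellHandle {
    final LivePage page;
    final int generation;
    final int index;

    CellHandle(LivePage page, int generation, int index) {
        this.page = page;
        this.generation = generation;
        this.index = index;
    }

    String getText() {
        return page.textOf(this);
    }
}

class StalenessDemo {
    // Loops over live handles; the auto-refresh fires before item refreshAt is read.
    static int countLive(LivePage page, int refreshAt, String needle) {
        List<CellHandle> handles = page.findCells();
        int matches = 0;
        for (int i = 0; i < handles.size(); i++) {
            if (i == refreshAt) page.refresh(page.snapshot()); // same data, new DOM
            if (handles.get(i).getText().contains(needle)) matches++;
        }
        return matches;
    }

    // Same refresh, but the loop only touches the snapshot taken beforehand.
    static int countSnapshot(LivePage page, int refreshAt, String needle) {
        List<String> cells = page.snapshot();
        int matches = 0;
        for (int i = 0; i < cells.size(); i++) {
            if (i == refreshAt) page.refresh(page.snapshot());
            if (cells.get(i).contains(needle)) matches++;
        }
        return matches;
    }
}
```

A refresh in the middle of the loop makes `countLive` fail, even though the refreshed page holds identical data. `countSnapshot` still finds the match, and that is the property the offline approach relies on.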

How can such a problem be solved? One option is to catch the exception and restart the processing at the beginning of the refresh interval. The other is to reduce the number of Selenium requests to a minimum and move as much processing as possible into memory. The first option turned out to be impossible, as the loop took three times longer than the page refresh interval…
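A small virtual-clock model shows why restarting cannot help in that situation. It is a hypothetical sketch with made-up parameters, and it rests on four assumptions:

- the page refreshes at every multiple of `refreshIntervalMs`;
- every row costs `perRowMs` of round-trip time;
- a read that completes in a later interval than the one in which the cells were located counts as stale;
- after a stale read, the loop restarts immediately.

```java
// Hypothetical virtual-clock model of "catch the exception and restart the loop".
class RestartOnStaleModel {
    // Returns the attempt number on which the loop completes, or -1 if it
    // still fails after maxAttempts. All times are virtual milliseconds.
    static int attemptsNeeded(int rows, long perRowMs, long refreshIntervalMs,
                              long startMs, int maxAttempts) {
        long now = startMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long generation = now / refreshIntervalMs; // DOM version when the cells are located
            boolean stale = false;
            for (int i = 0; i < rows && !stale; i++) {
                now += perRowMs;                               // one getText() round trip
                stale = now / refreshIntervalMs != generation; // a refresh happened meanwhile
            }
            if (!stale) {
                return attempt;
            }
            // exception caught: restart right away, close to the start of the new interval
        }
        return -1;
    }
}
```

If the whole loop (`rows * perRowMs`) takes at least one refresh interval, every attempt crosses a refresh and the model gives up. With a loop three times longer than the interval, restarting can never succeed.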

Here comes the cavalry

JSoup (https://jsoup.org) is the ultimate solution for all such problems. Not only is it a great library, extremely easy to use, with great documentation and intuitive methods, but it also makes refactoring extremely smooth thanks to a fantastic feature it supports: CSS selectors. The selector used with Selenium's By.cssSelector() can be passed unchanged to JSoup's select().

Just take a look:

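import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.Logger; // assumed: log4j 1.x, which provides Logger.getLogger(Class)
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.By;
import org.openqa.selenium.Point;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class main2 {

    private static final Logger logger = Logger.getLogger(main2.class);

    public main2() throws MalformedURLException {
        ChromeOptions capability = new ChromeOptions();
        WebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:4444/wd/hub"), capability);
        driver.get("https://www.w3schools.com/html/html_tables.asp");
        logger.info("start");

        // browser interaction no. 1: fetch the whole table once
        String tableAsString = driver.findElement(By.cssSelector("#customers")).getAttribute("outerHTML");
        // from here on JSoup works on an in-memory snapshot of the table
        Document jsoupDoc = Jsoup.parse(tableAsString);
        Elements jsoupElements = jsoupDoc.select("tbody > tr > td:nth-child(1)"); // same selector as before
        List<Element> resultList = new ArrayList<>();
        logger.info("loop start");
        for (int i = 0; i < jsoupElements.size(); i++) {
            logger.info("loop item");
            if (jsoupElements.get(i).text().contains("Island")) { // no browser request
                resultList.add(jsoupElements.get(i));
            }
        }
        logger.info("loop end");
        logger.info("result is: " + resultList.size());
        // JSoup builds a selector that uniquely identifies the element in its document;
        // since the table has an id, the selector is anchored at #customers and should match the live page too
        String cssSelector = resultList.get(0).cssSelector();
        // browser interaction no. 2: locate the matched element in the live page
        Point point = driver.findElement(By.cssSelector(cssSelector)).getLocation();
        logger.info("point is: " + point.toString());
        logger.info("stop");
        driver.quit(); // ends the remote session (close() only closes the current window)
    }

    public static void main(String[] args) throws MalformedURLException {
        new main2();
    }

}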

The table is extracted with Selenium, and the processing is then handed over entirely to JSoup for the duration of the loop:

JSoup creates a document from the HTML table; the document is a snapshot of the data present at the moment it was created, which guarantees data consistency
the document is then queried using CSS selectors, completely offline from Selenium's point of view and entirely in memory
the result is mapped back to a Selenium WebElement: JSoup's cssSelector() builds a selector for the matching element, and Selenium uses it to find that element in the live page
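The last step can be illustrated without JSoup. In the sketch below, `SnapshotSelector` is a hypothetical helper. The snapshot is assumed to be a list of rows in document order, each row given as a list of cell texts, covering every `tr` of the table body including the header row. The helper filters the snapshot in memory and builds a positional selector of the form `#customers > tbody > tr:nth-child(n) > td:nth-child(1)` for each match. In the real code, JSoup's `cssSelector()` does this job, and the selector it produces may be shaped differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: filter an in-memory snapshot of a table and build, for
// every matching row, a CSS selector Selenium could use to find the live cell.
final class SnapshotSelector {
    private SnapshotSelector() {}

    // rows: every tbody row in document order (header row included), given as
    // the texts of its cells. Assumes tableId needs no CSS escaping and that
    // matching rows start with a td cell (not a th).
    static List<String> selectorsForMatches(String tableId, List<List<String>> rows, String needle) {
        List<String> selectors = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<String> row = rows.get(i);
            if (!row.isEmpty() && row.get(0).contains(needle)) {
                // nth-child is 1-based, so row index i maps to tr:nth-child(i + 1)
                selectors.add("#" + tableId + " > tbody > tr:nth-child(" + (i + 1) + ") > td:nth-child(1)");
            }
        }
        return selectors;
    }
}
```

A returned selector would then go to `driver.findElement(By.cssSelector(...))`, which is the only browser interaction after the snapshot was taken.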

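Now the interaction with the web browser is reduced to only two places: reading the table's outerHTML and locating the matched element to call getLocation().

The solution is staleness-proof and significantly improves execution performance. One just needs to catch StaleElementReferenceException where Selenium is still in play and retry once: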

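import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.Logger; // assumed: log4j 1.x, which provides Logger.getLogger(Class)
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.By;
import org.openqa.selenium.Point;
import org.openqa.selenium.StaleElementReferenceException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class main2 {

    private static final Logger logger = Logger.getLogger(main2.class);

    public main2() throws MalformedURLException {
        ChromeOptions capability = new ChromeOptions();
        WebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:4444/wd/hub"), capability);
        driver.get("https://www.w3schools.com/html/html_tables.asp");
        logger.info("start");
        String tableAsString;
        try {
            tableAsString = driver.findElement(By.cssSelector("#customers")).getAttribute("outerHTML");
        } catch (StaleElementReferenceException e) {
            // the page refreshed between findElement and getAttribute: look the table up again
            try {
                tableAsString = driver.findElement(By.cssSelector("#customers")).getAttribute("outerHTML");
            } catch (StaleElementReferenceException e2) {
                throw new RuntimeException("2. staleness - this was unexpected", e2);
            }
        }
        // in-memory processing: refreshes can no longer affect the loop
        Document jsoupDoc = Jsoup.parse(tableAsString);
        Elements jsoupElements = jsoupDoc.select("tbody > tr > td:nth-child(1)");
        List<Element> resultList = new ArrayList<>();
        logger.info("loop start");
        for (int i = 0; i < jsoupElements.size(); i++) {
            logger.info("loop item");
            if (jsoupElements.get(i).text().contains("Island")) {
                resultList.add(jsoupElements.get(i));
            }
        }
        logger.info("loop end");
        logger.info("result is: " + resultList.size());
        String cssSelector = resultList.get(0).cssSelector();
        Point point;
        try {
            point = driver.findElement(By.cssSelector(cssSelector)).getLocation();
        } catch (StaleElementReferenceException e) {
            // same pattern: locate the element again after a refresh
            try {
                point = driver.findElement(By.cssSelector(cssSelector)).getLocation();
            } catch (StaleElementReferenceException e2) {
                throw new RuntimeException("2. staleness - this was unexpected", e2);
            }
        }
        logger.info("point is: " + point.toString());
        logger.info("stop");
        driver.quit(); // ends the remote session (close() only closes the current window)
    }

    public static void main(String[] args) throws MalformedURLException {
        new main2();
    }

}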
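The nested try/catch blocks repeat the same pattern twice. One possible refactoring is sketched below as a hypothetical `Retry.once` helper, which is not part of Selenium. It runs an action and repeats it exactly once if it throws a given exception type. The supplier must look the element up again on every call, because retrying on an already obtained WebElement would only hit the same stale reference again.

```java
import java.util.function.Supplier;

// Hypothetical helper: run an action and, if it fails with the given exception
// type, run it exactly once more. Intended usage in the test above:
//   String tableAsString = Retry.once(
//       () -> driver.findElement(By.cssSelector("#customers")).getAttribute("outerHTML"),
//       StaleElementReferenceException.class);
final class Retry {
    private Retry() {}

    static <T> T once(Supplier<T> action, Class<? extends RuntimeException> retryOn) {
        try {
            return action.get();
        } catch (RuntimeException first) {
            if (!retryOn.isInstance(first)) {
                throw first; // unrelated failure: do not retry
            }
            try {
                return action.get(); // second attempt, e.g. after the page refresh
            } catch (RuntimeException second) {
                if (!retryOn.isInstance(second)) {
                    throw second;
                }
                throw new RuntimeException("2. staleness - this was unexpected", second);
            }
        }
    }
}
```

Only the listed exception type triggers the retry; any other failure propagates unchanged, so real bugs are not masked.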

The only thing to consider in this specific example is whether we can accept that the page may be refreshed after the table was grabbed but before the getLocation request is sent. Note that it is perfectly safe if there is no page refresh at all.

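As for performance, even with a local web browser and a very small table the difference is noticeable (on Selenium Grid it is really huge, believe me!). The first log below comes from the Selenium-only version (main1), the second from the JSoup version (main2):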

22:54:30 (0) start
22:54:30 (153) loop start
22:54:30 (153) loop item
22:54:30 (185) loop item
22:54:30 (230) loop item
22:54:30 (262) loop item
22:54:30 (296) loop item
22:54:30 (327) loop item
22:54:31 (355) loop end
22:54:31 (356) result is: 1
22:54:31 (370) point is: (268, 583)
22:54:31 (370) stop

23:04:30 (0) start
23:04:30 (184) loop start
23:04:30 (184) loop item
23:04:30 (185) loop item
23:04:30 (185) loop item
23:04:30 (186) loop item
23:04:30 (186) loop item
23:04:30 (186) loop item
23:04:30 (186) loop end
23:04:30 (186) result is: 1
23:04:30 (239) point is: (268, 583)
23:04:30 (239) stop

Sum up

If numerous Selenium requests cause performance issues or make your tests unreliable because of StaleElementReferenceException, switch to offline processing with JSoup. Just remember that you need to understand how many Selenium requests your code sends, the exact cause of the staleness, and the impact offline processing has on the consistency of your test case.

Pass Fail Error

QA applied
