This is part 3 of my series of posts on Python, Selenium, and Fargate.

  • Part 1 — Run a Python Selenium web scraper on AWS Fargate
  • Part 2 — Adding Browsermob Proxy to sniff traffic and have more confidence in whether the website you’re trying to scrape has loaded
  • Part 3 — exception handling strategies for when something inevitably crashes (this post)

The scrape job that powers Torres App takes about 4 hours to run. Early on, it would often fall victim to random Selenium exceptions such as:

  • NoSuchElementException
  • TimeoutException
  • WebDriverException

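To make the failure modes concrete, here's roughly how they surface; the URL, the element id, and the driver setup below are placeholders, not from the real scraper:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.set_page_load_timeout(30)   # a page that stalls past this raises TimeoutException
driver.get("https://example.com")
try:
    results = driver.find_element_by_id("results")
except NoSuchElementException:
    # the page loaded, but the element never appeared; WebDriverException
    # is what you see when the browser process itself dies mid-call
    pass
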
Browsers crash and the internet sometimes doesn’t work, so I needed to make my scraper resilient to these failures. Here’s what I did, building on the previous post in the series.

The code for this example is here.

selenium_exceptions/main.py

import contextlib
from browsermobproxy import Server
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from selenium.common.exceptions import NoSuchElementException, TimeoutException, WebDriverException
import argparse
import traceback
import time

from selenium_exceptions import settings

config = settings.Config

First, I updated the import list to include the Selenium exceptions I wanted to catch.

def retry(func, *args):
    retries = 10
    while True:
        try:
            return func(*args)
        except (NoSuchElementException, TimeoutException, WebDriverException):
            retries -= 1
            if retries <= 0:
                # out of retries: re-raise so the failure isn't swallowed
                raise
            print("Retries left {}, continuing on {}".format(retries, traceback.format_exc()))
            time.sleep(5)

I added a retry function that catches these exceptions, logs the traceback, waits five seconds, and calls the scraping function again, re-raising once the retries are exhausted.

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('config', type=str, nargs='?', help='the config class')
    args = parser.parse_args()
    # only override the default settings.Config assigned above when a
    # config class name is actually passed on the command line
    if args.config:
        config = getattr(settings, args.config)
    # demo is the scraping function from the earlier posts in this
    # series; its definition is omitted from this excerpt
    retry(demo)

Finally, I changed the main call to go through the retry function.
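
With that in place, the job runs the same way as before; assuming the package layout from the earlier posts, something like:

python -m selenium_exceptions.main Config

Any of the three exceptions now costs at most a retry and a five-second pause instead of the whole four-hour run.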

Obviously, this example is trivial, but the same pattern can be combined with an object that maintains some scrape state, so the browser can be restarted part way through the job after a crash and pick up where it left off.
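
Here’s a minimal sketch of that idea, reusing the imports from main.py above; the ScrapeJob class and the URL list are hypothetical, and the real job would also wire the Browsermob proxy from part 2 into start_browser. The key detail is that position lives outside the browser, so when retry() calls run() again after a crash, the job resumes where it stopped:

class ScrapeJob:
    """Holds scrape progress so the browser can be restarted mid-job."""

    def __init__(self, urls):
        self.urls = urls
        self.position = 0   # index of the next URL to scrape
        self.driver = None

    def start_browser(self):
        # throw away any dead browser and start a fresh one
        if self.driver is not None:
            with contextlib.suppress(WebDriverException):
                self.driver.quit()
        self.driver = webdriver.Chrome()

    def run(self):
        self.start_browser()
        while self.position < len(self.urls):
            self.driver.get(self.urls[self.position])
            # ... extract data from the page here ...
            self.position += 1  # only advance after a successful page

job = ScrapeJob(["https://example.com/1", "https://example.com/2"])
retry(job.run)  # after a crash, retry() reruns run(), which rebuilds
                # the browser and resumes from self.position

Because run() only increments position after a page succeeds, a crash three hours in costs one page load on the next attempt, not the whole run.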

Last modified: August 10, 2019
