Categories
Python Code Scrapy Selenium

Combine Scrapy with Selenium

A major disadvantage of Scrapy is that it can not handle dynamic websites (eg. ones that use JavaScript).

If you need to get past a login that is proving impossible to get past, usually if the form data keeps changing, then you can use Selenium to get past the login screen and then pass the response back into Scrapy.

It may sound like a workaround, and it is, but it’s a good way to get logged in so you can get the content much quicker than if you try and use Selenium to do it all.

Selenium is for testing, but sometimes you can combine Selenium and Scrapy to get the job done!

Watch the YouTube video on How to login past javascript & send response back to Scrapy

Combining Scrapy with Selenium
Combining Scrapy with Selenium Solution
Combining Scrapy with Selenium Prerequisites
Combining Scrapy with Selenium - integrating into Scrapy

Below:

The ‘LoginEM’ and ‘LoginPW’ represent the ‘name’ of the input field (find these from viewing the source in your browser).

self.pw and self.em are the variables which equal your stored email and passwords – I’ve stored mine here as environment variables in .bash_profile on the host computer.

Combining Scrapy with Selenium - Lgging in
Combining Scrapy with Selenium - response.replace
Combining Scrapy with Selenium start_urls