October 2020

October 17, 2020 / Last updated : October 17, 2020 admin Python Code

How To Web Scrape Amazon (successfully)

You may want to scrape Amazon for information about books on web scraping. We shorten what would have been a very long selector by using "contains" in our XPath: response.xpath('//*[contains(@class,"sg-col-20-of-24 s-result-item s-asin")]'). The most important thing when starting to scrape is to establish what you want in your final output. Here are the […]

October 7, 2020 / Last updated : October 7, 2020 admin Python Code

Combine Scrapy with Selenium

A major disadvantage of Scrapy is that it cannot handle dynamic websites (e.g. ones that rely on JavaScript) on its own. If a login is proving impossible to get past, usually because the form data keeps changing, you can use Selenium to get through the login screen and then pass the […]

October 5, 2020 / Last updated : October 5, 2020 admin Python Code

Xpath for hidden values

This article describes how to form a Scrapy XPath selector to pick out the hidden value that you may need to POST along with a username and password when scraping a site with a login. These hidden values are created dynamically, so you must read them from the page and send them with your form data in your POST request. […]

October 4, 2020 / Last updated : October 4, 2020 admin Python Code

Scrapy Form Login

This article shows you how to use Scrapy to log in to sites that have username and password authentication. The important thing to remember is that the login page may need additional data beyond just the username and password… […]