Web Scraping Articles

  • regex examples - Sooner or later you will need to resort to regular expressions if you are scraping text. This article shows some useful Python 3 examples which you can use over and over again. We'll be using re.search and group. Example 1 – parse a paragraph of text for the number of floors, […] (a short re.search sketch appears after this list)
  • Back Up MySQL - Back up your MySQL database (on a Raspberry Pi). Once in production you will need to ensure you have a copy of your data, for reasons we can all identify with: hardware failure, data corruption, human error. So, before getting too far into a web scraping project using Scrapy with MySQL, let's spare […]
  • Get started with Pandas and MySQL - How to make a dataframe from your SQL database and query the data. This assumes you already have a sample database set up on your MySQL server and you have the username and password. In the example shown we are logging on to a Raspberry Pi running MariaDB and we are executing a query to […] (see the pandas sketch after this list)
  • Send email with Python - (and attachments) If you have seen any tutorials online, beware of the many that use port 465; port 587 is the correct port. Here we show simple Python code you can use to send an email and an attachment. It's simple, as you don't need to know MIME or "octet-stream" – you just […] (a port-587 sketch follows this list)
  • Web Scraping with bs4 - BeautifulSoup: conditional logic with soup.select. Get all dropdown values from HTML (using a page saved locally, for testing): soup.select('option[value]'). Note how I use soup.find to narrow down the search; otherwise I would have found other dropdowns as well (sketched after this list)
  • Price Tracking Amazon - A common task is to track competitors' prices and use that information as a guide to the prices you can charge, or, if you are buying, to spot when a product is at a new lowest price. The purpose of this article is to describe how to web scrape Amazon. Using Python, Scrapy, MySQL, […]
  • How To Web Scrape Amazon (successfully) - You may want to scrape Amazon for information about books about web scraping! We shorten what would have been a very long selector by using "contains" in our xpath: response.xpath('//*[contains(@class,"sg-col-20-of-24 s-result-item s-asin")]'). The most important thing when starting to scrape is to establish what you want in your final output. Here are the […] (see the contains() sketch after this list)
  • Combine Scrapy with Selenium - A major disadvantage of Scrapy is that it cannot handle dynamic websites (e.g. ones that use JavaScript). If you need to get past a login that is proving impossible, usually because the form data keeps changing, you can use Selenium to handle the login screen and then pass the […] (a combined sketch appears after this list)
  • Xpath for hidden values - This article describes how to form a Scrapy xpath selector to pick out the hidden value that you may need to POST along with a username and password when scraping a site with a login. These hidden values are dynamically created, so you must send them with your form data in your POST request. […] (see the hidden-value sketch after this list)
  • Scrapy Form Login - This article shows you how to use Scrapy to log in to sites that have username and password authentication. The important thing to remember is that there may be additional data that needs to be sent to the login page, beyond just the username and password… […] (a FormRequest sketch closes this page)
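
The sketches below illustrate the techniques mentioned in the excerpts above. They are minimal examples under assumed inputs; sample text, table names, URLs and credentials are placeholders, not the articles' exact code.

For the regex article, a minimal re.search / group sketch; the sample sentence and the "floors" pattern are assumptions:

```python
import re

# Hypothetical sample paragraph; the article's own text will differ.
text = "The new apartment block has 12 floors and 48 units."

# re.search finds the first match anywhere in the string;
# group(1) returns the captured digits preceding the word "floors".
match = re.search(r"(\d+)\s+floors", text)
if match:
    floors = int(match.group(1))
    print(f"Number of floors: {floors}")  # -> Number of floors: 12
```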
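For the Pandas and MySQL article, a sketch of loading a query result into a DataFrame; the connection string (here via SQLAlchemy and pymysql) and the prices table are assumptions:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials and host; the article logs on to a
# Raspberry Pi running MariaDB.
engine = create_engine("mysql+pymysql://user:password@192.168.1.10/scraping_db")

# Execute a query and load the result straight into a DataFrame.
df = pd.read_sql("SELECT * FROM prices LIMIT 10", engine)
print(df.head())
```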
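For the email article, a standard-library sketch using port 587 with STARTTLS; the server, addresses, password and attachment path are placeholders, and EmailMessage hides the MIME details for you:

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Scrape results"
msg["From"] = "me@example.com"
msg["To"] = "you@example.com"
msg.set_content("Results attached.")

# Attach a file without building MIME parts by hand.
with open("results.csv", "rb") as f:
    msg.add_attachment(f.read(), maintype="application",
                       subtype="octet-stream", filename="results.csv")

# Port 587 with STARTTLS, as the article recommends.
with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()
    server.login("me@example.com", "app-password")
    server.send_message(msg)
```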
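For the bs4 article, a sketch of narrowing the search with soup.find before collecting option values with soup.select; the inline HTML stands in for the locally saved test page:

```python
from bs4 import BeautifulSoup

html = """
<form>
  <select id="floors">
    <option value="1">One</option>
    <option value="2">Two</option>
  </select>
  <select id="colours">
    <option value="red">Red</option>
  </select>
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# find() narrows the scope to one dropdown so select() below
# does not pick up options from the other dropdowns.
floors = soup.find("select", id="floors")
values = [opt["value"] for opt in floors.select("option[value]")]
print(values)  # -> ['1', '2']
```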
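For the Amazon scraping article, a sketch of the contains() xpath shown in the excerpt, run against a simplified stand-in for a results page:

```python
from scrapy import Selector

html = """
<div class="sg-col-20-of-24 s-result-item s-asin extra-class">Book A</div>
<div class="sg-col-20-of-24 s-result-item s-asin">Book B</div>
<div class="s-banner">Not a result</div>
"""

sel = Selector(text=html)

# contains() keeps the selector short even though the full class
# attribute on the live page is much longer.
items = sel.xpath('//*[contains(@class,"sg-col-20-of-24 s-result-item s-asin")]')
for item in items:
    print(item.xpath("text()").get())  # -> Book A, Book B
```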
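For the Scrapy with Selenium article, one way to combine the two: Selenium handles the JavaScript-heavy login, then its cookies are handed to Scrapy for the crawl. The URLs, form field names and credentials are placeholders, and passing cookies is only one of several hand-over approaches:

```python
import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By


class LoginThenCrawlSpider(scrapy.Spider):
    name = "login_then_crawl"

    def start_requests(self):
        # Let Selenium drive the browser through the login form.
        driver = webdriver.Firefox()
        driver.get("https://example.com/login")
        driver.find_element(By.NAME, "username").send_keys("user")
        driver.find_element(By.NAME, "password").send_keys("pass")
        driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

        # Re-use the logged-in session in Scrapy via its cookies.
        cookies = {c["name"]: c["value"] for c in driver.get_cookies()}
        driver.quit()

        yield scrapy.Request("https://example.com/members",
                             cookies=cookies, callback=self.parse)

    def parse(self, response):
        yield {"title": response.xpath("//title/text()").get()}
```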
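For the hidden values article, a sketch of picking out a hidden form value with xpath; the csrf_token name is an assumption and will differ per site:

```python
from scrapy import Selector

html = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="a1b2c3d4">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

sel = Selector(text=html)

# Extract the hidden value so it can be POSTed with the form data.
token = sel.xpath('//input[@type="hidden"]/@value').get()
print(token)  # -> a1b2c3d4
```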
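For the Scrapy form login article, a minimal FormRequest.from_response sketch; the login URL, field names and the "Logout" success check are assumptions. from_response is convenient here because it copies any hidden fields already present in the form before merging in your credentials:

```python
import scrapy
from scrapy.http import FormRequest


class FormLoginSpider(scrapy.Spider):
    name = "form_login"
    start_urls = ["https://example.com/login"]

    def parse(self, response):
        # Hidden inputs in the form are carried over automatically;
        # the username and password are merged into the POST data.
        yield FormRequest.from_response(
            response,
            formdata={"username": "user", "password": "pass"},
            callback=self.after_login,
        )

    def after_login(self, response):
        if "Logout" in response.text:
            self.logger.info("Login succeeded")
```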