July 2020

July 27, 2020 / Last updated : February 15, 2023 admin Python Code

Scraping a page via CSS style data

The challenge was to scrape a site where the class names on each element were randomised and out of order. The only way to get the data in the correct sequence was to sort the CSS styles by their left and top values and match them to the class names in the divs… The names were meaningless […]
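As a rough illustration of the idea in this excerpt, here is a minimal sketch using only the standard library. The stylesheet, the markup, the class names and the helper names (`px`, `positions`, `ordered_text`) are all made up for the example, and sorting by top first and then left (reading order) is an assumption; the original post may well do it differently.

```python
import re

# Hypothetical stylesheet: the class names carry no meaning, the positions do.
CSS = """
.xk2 { left: 120px; top: 10px; }
.a9q { left: 10px; top: 10px; }
.zz1 { left: 10px; top: 40px; }
"""

# Hypothetical markup: the divs arrive in scrambled order.
HTML = '<div class="xk2">World</div><div class="zz1">Again</div><div class="a9q">Hello</div>'

RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
DIV_RE = re.compile(r'<div class="([\w-]+)">([^<]*)</div>')


def px(body, prop):
    """Return the pixel value of `prop` inside one CSS rule body, or 0.0."""
    # The lookbehind stops "top" from matching inside e.g. "margin-top".
    m = re.search(r"(?<![\w-])" + prop + r"\s*:\s*(-?\d+(?:\.\d+)?)px", body)
    return float(m.group(1)) if m else 0.0


def positions(css):
    """Map each class name to a (top, left) tuple."""
    return {name: (px(body, "top"), px(body, "left"))
            for name, body in RULE_RE.findall(css)}


def ordered_text(html, css):
    """Return the div texts sorted by on-screen position (assumed reading order)."""
    pos = positions(css)
    divs = DIV_RE.findall(html)
    # Classes without a known position sort last.
    divs.sort(key=lambda d: pos.get(d[0], (float("inf"), float("inf"))))
    return [text for _, text in divs]


print(ordered_text(HTML, CSS))
```

On a real page you would more likely pull the style block and the divs out with your scraper's own selectors; the regular expressions here only keep the sketch self-contained.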

July 13, 2020 / Last updated : February 23, 2023 admin Python Code

Configure a Raspberry Pi for web scraping

Introduction: The task was to scrape over 50,000 records from a website and be gentle on the site being scraped. A Raspberry Pi Zero was chosen to do this as speed was not a significant issue, and in fact, being slower makes it ideal for web scraping when you want to be kind to the […]

July 10, 2020 / Last updated : July 10, 2020 admin Python Code

Scraping “LOAD MORE”

Do you need to scrape a page that loads content dynamically as “infinite scroll”? Using self.nxp += 1, the value passed to “pn=” in the URL gets incremented. “pn=” is the query parameter here; in your spider it may be different, and you can always use urllib.parse to split the URL into its parts. Test […]
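To make the urllib.parse suggestion concrete, here is a small sketch that bumps a page parameter in a URL. The function name `next_page_url` and the example URL are hypothetical; only the “pn” parameter name comes from the post, and treating a missing parameter as page 0 is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url, param="pn"):
    """Return `url` with the integer query parameter `param` incremented by one."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    # Assumption: a missing parameter counts as page 0.
    current = int(query.get(param, ["0"])[0])
    query[param] = [str(current + 1)]
    # doseq=True writes list values back as repeated key=value pairs.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


print(next_page_url("https://example.com/search?q=shoes&pn=2"))
```

In a spider, the returned URL would typically be passed to the next request until a page comes back empty.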

July 2, 2020 / Last updated : July 2, 2020 admin Python Code

Extracting JSON from JavaScript in a web page

Why would you want to do that? Well, if you are web scraping using Python, with Scrapy for instance, you may need to extract reviews or comments that are loaded from JavaScript. This means you cannot use your CSS or XPath selectors as you can with regular HTML. Parse: Instead, in your browser, […]
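One possible way to do this with only the standard library is sketched below. The page snippet, the variable name `reviewData` and the helper `extract_json_var` are invented for the example, and it assumes the script assigns strictly valid JSON to a variable; a JavaScript object literal with unquoted keys would raise `json.JSONDecodeError`.

```python
import json
import re

# Hypothetical page source with review data embedded in a script tag.
HTML = """
<html><body>
<script>
  var reviewData = {"reviews": [{"author": "Ann", "rating": 5}, {"author": "Bob", "rating": 3}]};
  initReviews(reviewData);
</script>
</body></html>
"""


def extract_json_var(html, var_name):
    """Return the JSON value assigned to `var_name` in `html`, or None if absent."""
    m = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if m is None:
        return None
    # raw_decode parses one JSON document starting at the given index and
    # ignores whatever follows it (here the trailing ";" and more script).
    obj, _end = json.JSONDecoder().raw_decode(html, m.end())
    return obj


data = extract_json_var(HTML, "reviewData")
print([r["author"] for r in data["reviews"]])
```

Using `raw_decode` avoids having to write a regular expression that matches balanced braces, which tends to break on nested objects.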