
Configure a Raspberry Pi for web scraping

Run Scrapy Spider

If you don’t already have a way to transfer the spider onto your Raspberry Pi, you can simply use:

git clone https://github.com/RGGH/scrapy6.git

Once all this is in place, start the VPN and then run the spider, each with “nohup”:

sudo nohup privatevpn

sudo nohup python3 foodcom.py

Note: The VPN needs “nohup” too, otherwise it will lose the connection when you exit from PuTTY or your terminal session.
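If you would rather start the spider from a small Python launcher than type “nohup” each time, something like the following may work. It is only a sketch: the helper name launch_detached and the log file name are my own assumptions, not part of the scrapy6 repository. It starts the command in its own session, which has a similar effect to “nohup” on Linux, and appends the output to a log file.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in its own session so closing the terminal does not stop it.

    launch_detached is a hypothetical helper, not part of the spider's repo.
    """
    # Append output to a log file, much as nohup writes to nohup.out.
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal input once detached
            stdout=log,
            stderr=subprocess.STDOUT,   # keep errors in the same log
            start_new_session=True,     # POSIX only: detach from the terminal
        )
    finally:
        # The child holds its own copy of the file descriptor.
        log.close()
    return proc
```

On the Pi you could then call launch_detached(["python3", "foodcom.py"], "spider.log") and log out; the returned object's pid attribute is what you would look for in htop.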

From the CLI, type “htop”, and don’t worry if your spider sometimes maxes out the CPU. See “python3 foodcom.py” in the screenshot above.

Check output

From your remote PC (Windows, Mac, or Ubuntu, for example), you can then copy the output file from the Pi and open it there, as you should have much more memory and, of course, a GUI!

To copy results.csv from the Pi to my Ubuntu PC I used “rsync”, but you could also use “scp”.

Using rsync to transfer output to another host, to read the file
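The transfer can also be scripted from the PC side. The sketch below simply wraps the rsync command line in Python. The function names are my own, and any host name or path you pass in (for example pi@raspberrypi.local and /home/pi/scrapy6/results.csv) is a placeholder to replace with your own Pi's details. It assumes rsync and SSH access to the Pi are already set up.

```python
import subprocess


def build_rsync_command(remote, remote_path, local_dir="."):
    """Return the rsync argument list for pulling one file from the Pi.

    remote and remote_path are supplied by the caller; nothing here knows
    where the spider actually wrote its output.
    """
    # -a keeps timestamps and permissions, -v lists the files transferred,
    # -z compresses data in transit.
    return ["rsync", "-avz", f"{remote}:{remote_path}", local_dir]


def fetch_results(remote, remote_path, local_dir="."):
    """Run rsync and raise CalledProcessError if the transfer fails."""
    subprocess.run(build_rsync_command(remote, remote_path, local_dir), check=True)
```

Keeping the command builder separate from the call that runs it makes it easy to print the command and check it before anything is copied.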

The CSV produced on the Raspberry Pi by Scrapy and my spider.
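Before opening the file in a spreadsheet, a quick sanity check from the command line can tell you whether the spider wrote anything useful. The helper below is a minimal sketch. It assumes the CSV starts with a header row (Scrapy's CSV feed export writes one by default), and summarize_csv is a name I made up for illustration.

```python
import csv


def summarize_csv(path):
    """Return (header, row_count) for a CSV file that starts with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])           # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count
```

Calling summarize_csv("results.csv") returns the column names and the number of scraped rows, which is usually enough to spot an empty or truncated run.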
