Configure a Raspberry Pi for web scraping

Run Scrapy Spider

If you don’t already have a way to transfer the spider onto your Raspberry Pi, you can simply use:

git clone

Once all this is in place, you can run the spider with “nohup”, which keeps a process running after you log out:

sudo nohup privatevpn

sudo python3

Note: You need to run the VPN with “nohup” as well; otherwise it will lose the connection when you exit PuTTY or your terminal session.
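
If you would rather start a long-running job from a script, the same idea can be sketched in Python. This is only a minimal illustration: the helper name launch_detached, the log file name and the example script name are assumptions made for this sketch, not the commands used above. It prefixes a command with nohup and sends its output to a log file instead of the terminal.

```python
import subprocess


def launch_detached(command, log_path):
    """Start `command` under nohup, appending its output to `log_path`.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            ["nohup", *command],        # nohup makes the command ignore hangups
            stdin=subprocess.DEVNULL,   # nothing is read from the terminal
            stdout=log_file,            # output goes to the log file
            stderr=subprocess.STDOUT,   # errors go to the same log file
        )
    # The child holds its own copy of the log file handle, so closing ours is safe.
    return process


# Example usage (placeholder script name, not the author's spider):
# launch_detached(["python3", "my_spider.py"], "spider.log")
```

The function returns straight away, so the launching script can exit while the job carries on; check the log file to see how the run is going.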

[Screenshot: htop on the Raspberry Pi during web scraping]
From the command line, type “htop” to watch resource usage, and don’t worry if your spider sometimes maxes out the CPU. See “python3” in the screenshot above.

Check output

From your remote PC (e.g. a Windows PC, a Mac, or an Ubuntu machine), you can then copy the output file from the Pi and open it there, since that machine should have much more memory and, of course, a GUI!

To copy results.csv from the Pi to my Ubuntu PC I used “rsync”, but you could also use “scp”.

Using rsync to transfer output to another host, to read the file
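
The transfer can also be wrapped in a few lines of Python, run on the PC that receives the file. This is a rough sketch, not the exact command from the screenshot: the function names, the login pi@raspberrypi.local and the path /home/pi/results.csv are all hypothetical, so substitute your own.

```python
import subprocess


def build_rsync_command(remote_host, remote_path, local_dir):
    """Return the rsync argument list for pulling one file from the Pi."""
    # -a preserves file attributes, -v lists what is copied, -z compresses in transit.
    return ["rsync", "-avz", f"{remote_host}:{remote_path}", local_dir]


def fetch_results(remote_host, remote_path, local_dir="."):
    """Run rsync and raise CalledProcessError if the transfer fails."""
    subprocess.run(build_rsync_command(remote_host, remote_path, local_dir), check=True)


# Example usage (hypothetical login and path):
# fetch_results("pi@raspberrypi.local", "/home/pi/results.csv")
```

Keeping the command builder separate from the call that runs it makes it easy to print the command and check it before anything is copied.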

The CSV produced on the Raspberry Pi by Scrapy and my spider.
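
Before opening the file in a spreadsheet, you can do a quick sanity check from Python. The sketch below assumes the file is UTF-8 encoded and has a header row; the function name summarize_csv is made up for this example, and the column names depend entirely on your own spider.

```python
import csv


def summarize_csv(path, preview_rows=5):
    """Return (column names, first few rows, total row count) for a CSV file."""
    # Assumption: the file is UTF-8 and its first line holds the column names.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        preview = []
        total = 0
        for row in reader:
            if total < preview_rows:
                preview.append(row)
            total += 1
        return reader.fieldnames, preview, total


# Example usage:
# columns, preview, total = summarize_csv("results.csv")
# print(columns, total)
```

A row count far lower than expected is usually the first sign that the spider stopped early or the VPN connection dropped.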