Categories
Python Code

Read Scrapy ‘start_urls’ from csv file

How can the start_urls for scrapy be imported from csv?

Using a list comprehension and a csv file you can make Scrapy get specific URLs from a predefined list

use the .strip() method to remove newline characters

Here you can see the line.strip() is performing the removal:

[line.strip() for line in file]

Demonstration of how to read a list of URLs from a CSV (and use in Scrapy)

with open('data.csv') as file:
    start_urls = [line.strip() for line in file]

use start_urls as the url for each request made by start_request method

def start_request(self):
    request = Request(url = self.start_urls, callback=self.parse)
    yield request
Full example code – add your own selectors in the parse method!

Get the code on the Red and Green GitHub page https://github.com/RGGH/Scrapy18/blob/main/stackospider.py

This is also an answer to a question on stackoverflow:

https://stackoverflow.com/questions/67166565/how-can-the-start-urls-for-scrapy-be-imported-from-csv/67175549#67175549