Read Scrapy ‘start_urls’ from a CSV file
How can the start_urls for Scrapy be imported from a CSV file?
Using a list comprehension and a CSV file, you can make Scrapy fetch specific URLs from a predefined list.
Use the .strip() method to remove the newline character at the end of each line.
Here, line.strip() performs the removal:
[line.strip() for line in file]
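To see what .strip() actually removes without needing a real file, the sketch below uses io.StringIO as a stand-in for data.csv. The placeholder URLs are assumptions for illustration. It also adds an optional `if` filter that skips blank lines, which would otherwise end up as empty strings in start_urls.

```python
import io

# io.StringIO stands in for open('data.csv'); the URLs are placeholders
sample = io.StringIO("https://example.com/a\nhttps://example.com/b\n\n")

# Without .strip(), each entry keeps its trailing newline
raw = [line for line in sample]

sample.seek(0)  # rewind so the "file" can be read a second time
# .strip() removes the newline; the `if` clause skips blank lines
start_urls = [line.strip() for line in sample if line.strip()]

print(raw)         # ['https://example.com/a\n', 'https://example.com/b\n', '\n']
print(start_urls)  # ['https://example.com/a', 'https://example.com/b']
```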
Demonstration of how to read a list of URLs from a CSV file (and use them in Scrapy):
with open('data.csv') as file:
    start_urls = [line.strip() for line in file]
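Reading the file line by line treats each whole line as a URL. That only works when data.csv holds exactly one URL per line. If the file has several columns, a header row, or quoted fields, Python's standard csv module can extract a single column instead. This is a minimal sketch that assumes the URLs are in the first column by default. The helper urls_from_csv and its column and has_header parameters are hypothetical names, not part of Scrapy or the original code.

```python
import csv
from typing import Iterable, List


def urls_from_csv(lines: Iterable[str], column: int = 0, has_header: bool = False) -> List[str]:
    """Collect the non-empty values of one CSV column (hypothetical helper)."""
    reader = csv.reader(lines)  # accepts a file object or any iterable of strings
    if has_header:
        next(reader, None)  # drop the header row if the file has one
    urls = []
    for row in reader:
        # skip blank rows and rows too short to contain the requested column
        if len(row) > column and row[column].strip():
            urls.append(row[column].strip())
    return urls


# Inside the spider class this could replace the list comprehension
# (the csv docs recommend newline='' when opening files for csv.reader):
# with open('data.csv', newline='') as file:
#     start_urls = urls_from_csv(file)
```

Using csv.reader rather than splitting on commas means a quoted URL that contains a comma stays intact.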
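Use each entry in start_urls as the URL for a request made by the start_requests method. Scrapy only calls a method named start_requests (plural), and Request takes a single URL string rather than a list, so loop over the list and yield one request per URL:
# requires: from scrapy import Request
def start_requests(self):
    # one Request per URL read from data.csv
    for url in self.start_urls:
        yield Request(url=url, callback=self.parse)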
Get the code on the Red and Green GitHub page https://github.com/RGGH/Scrapy18/blob/main/stackospider.py
This is also an answer to a question on Stack Overflow: