Capture your start URLs in your output with Scrapy response.meta
Every web scraping project has aspects that are different or interesting and worth remembering for future use.
This is a look at a recent real-world project, focusing on saving more than one start URL in the output.
This assumes basic knowledge of web scraping and identifying selectors. See my other videos if you would like to learn more about selectors (XPath & CSS).
We want to fill all of the columns in our client’s master Excel sheet.
We can then provide them with a CSV which they can import and use as they wish.
We want 1500+ properties, so we will be using Scrapy and Python.
One of the required fields requires us to pass the particular start URL all the way through to the CSV (use response.meta).
Some of the required values are embedded in text and will require parsing with re (use regular expressions).
We don’t care about being fast – edit “settings.py” with conservative values for concurrent connections and download delay.
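As a rough sketch of the parsing step: the sample listing text below is invented for illustration (the real site’s wording will differ), but it shows how `re.search` can pull numeric values out of free text:

```python
import re

# Hypothetical German listing text -- values like room count and area
# are embedded in free text, so we extract them with regular expressions.
text = "Schöne Wohnung, 3 Zimmer, 85 m², Kaltmiete 950 €"

# Digits followed by the keyword "Zimmer" (rooms)
rooms_match = re.search(r"(\d+)\s*Zimmer", text)
# Digits (optionally with a decimal part) followed by "m²" (area)
area_match = re.search(r"(\d+(?:[.,]\d+)?)\s*m²", text)

rooms = int(rooms_match.group(1)) if rooms_match else None
area = area_match.group(1) if area_match else None

print(rooms)  # 3
print(area)   # 85
```

Guarding each `group(1)` call behind an `if ... else None` keeps the spider from crashing on listings where a value happens to be missing.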
This is a German website, so I will use the Google Chrome browser and translate to English.
We will use Scrapy’s Request.meta attribute to achieve the following:
Capture whichever of the multiple start_urls is used – pass it all the way through to the output CSV.
Create a “meta” dictionary in the initial Request in start_requests
“surl” represents each of our start URLs
(we have two: one for the ‘rent’ URL and one for the ‘buy’ URL; we could have many more if required)