Yes, you can use “yield” more than once inside a method. Here we look at how this was useful when scraping the real estate / property section of Craigslist.
Put simply, “yield” lets a method hand something back to Scrapy (typically a request for another function to handle, or a finished item) and then resume from where it “yielded”.
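As a rough illustration of the "yield, then resume" idea, here is a minimal sketch in plain Python with no Scrapy involved. The names `parse` and `run` are stand-ins invented for this example: `parse` plays the role of a spider method that yields twice per advert, and `run` plays the role of whatever consumes the yielded values.

```python
def parse(ads):
    """Stand-in for a spider method: yields twice for every advert."""
    for ad in ads:
        # First yield: hand a "request" back to the caller.
        yield ("request", f"/details/{ad}")
        # Execution resumes here the next time the caller asks for a value.
        yield ("item", {"ad": ad})


def run(generator):
    """Stand-in for the consumer: pulls yielded values one at a time."""
    kinds = []
    for kind, _value in generator:
        kinds.append(kind)
    return kinds


events = run(parse(["a1", "a2"]))
print(events)  # ['request', 'item', 'request', 'item']
```

In Scrapy itself the consumer is the engine: a yielded request is queued, and its callback runs once the response has been downloaded.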
To demonstrate this, it is best to show it with a working example, and then you’ll see the reason for using it.
Source code for this Scrapy Project
The difference with this project was that most of the usual ‘details’ were actually on the ‘thumbnails’ / ‘listing’ page, with the exception of the geo data (longitude and latitude).
So you could say this was a back-to-front website. Typically all the details would be extracted from the details page, accessed via a listings page.
Because we want to pass data (“lon” and “lat”) between two functions (methods), we had to initialise these variables:
def __init__(self):
    self.lat = ""
    self.lon = ""
Next comes the typical ‘parse’ code, which identifies all of the ads (adverts) on the page by their class name, “result-info”.
You could use either:
all_ads = response.xpath('//p[@class="result-info"]')
all_ads = response.css("p.result-info")
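(XPath or CSS: both return the same result here, to use as the Selector. Note that the XPath version only matches when the class attribute is exactly “result-info”, whereas the CSS version also matches elements that carry additional classes.)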
We coded this explicitly, but it would run even if we hadn’t: it’s the default Scrapy method (‘start_requests’) that requests the first URL and passes the output, “response”, to the next method, ‘parse’.
This is the method that finds all of the adverts on page 1, then goes off to each details page and extracts the geo data.
Next, it fills the Scrapy fields in items.py with the data from the thumbnail listing for the property on the listings page, plus the geo data.
So the reason we described this as a back-to-front website is that the majority of the details come from the thumbnails/listing, and only 2 bits of data (“lon” and “lat”) come from the ‘details’ page.