Scrapy: Yield
So here we see the code for “parse”, with “yield” being used 3 times:
- Go and fetch the geo data
- Go and fill the container fields in items.py
- Now go and find the next page of listings (120)
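To make the order of the three yields concrete, here is a plain-Python model of that “parse” method. It is a sketch, not Scrapy code: “Request” is a hypothetical stand-in for scrapy.Request, the “page” dict stands in for the response, and the field names and URLs are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for scrapy.Request: a URL plus the callback to run."""
    url: str
    callback: Callable


class ListingSpider:
    """Hypothetical spider; `page` is a dict standing in for a Scrapy response."""

    def parse(self, page):
        for ad in page["ads"]:
            # 1st yield: go and fetch the geo data from the ad's detail page
            yield Request(ad["url"], self.parse_detail)
            # 2nd yield: no URL, just the gathered fields handed over as an item
            yield {"title": ad["title"], "price": ad["price"]}
        # 3rd yield: only reached after the for loop ends, to follow pagination
        if page.get("next"):
            yield Request(page["next"], self.parse)

    def parse_detail(self, page):
        """Would extract latitude/longitude; left empty in this model."""


page = {
    "ads": [
        {"url": "/ad/1", "title": "Flat", "price": 900},
        {"url": "/ad/2", "title": "House", "price": 1500},
    ],
    "next": "/page-2",
}
for result in ListingSpider().parse(page):
    print(result)
```

In the real spider the URLs and fields would come from selectors on the response, and Scrapy would schedule the Requests rather than print them.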
The 2nd ‘yield’ has no URL to go to. Once every one of the “ads in all_ads” has had its values gathered and sent to items, the for loop ends, the pagination code checks for a next page, and the code goes on to get the next batch of listings to process.
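The hand-off from one page of listings to the next can be modelled with a small queue. In this sketch (plain Python, not Scrapy) the hypothetical “crawl” function plays the part of Scrapy's scheduler, “SITE” is made-up data, and “parse” receives a URL rather than a response:

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for scrapy.Request."""
    url: str
    callback: Callable


# Hypothetical site: each search page lists some ads and maybe a next page
SITE = {
    "/page-1": {"ads": ["Flat", "House"], "next": "/page-2"},
    "/page-2": {"ads": ["Studio"], "next": None},
}


def parse(url):
    page = SITE[url]
    for title in page["ads"]:
        yield {"title": title, "from": url}
    # The for loop has ended, so now check for a next page
    if page["next"]:
        yield Request(page["next"], parse)


def crawl(start_url):
    """Toy scheduler: queue requests, collect items, stop when nothing is queued."""
    items, queue = [], deque([Request(start_url, parse)])
    while queue:
        request = queue.popleft()
        for result in request.callback(request.url):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items


for item in crawl("/page-1"):
    print(item)
```

The crawl stops on its own when the last page yields no next-page request, because nothing new is added to the queue.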
parse_detail
We’ve already covered what this does: it gets the ‘longitude’ and ‘latitude’ from the ‘detail’ page for the property. As we can see below, no ‘yield’ is required, and self.lon and self.lat are set once for each ad in ‘for ads in all_ads:’.
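One thing worth checking with this approach: Scrapy schedules a yielded Request and runs its callback later, when the response arrives, so values stored on self inside parse_detail are not necessarily the ones for the ad handled in the same loop iteration. Scrapy's documentation describes passing data to a callback with cb_kwargs for this case. The model below is plain Python, not Scrapy: “Request”, “DETAIL_PAGES” and the “run” engine are hypothetical, and callbacks receive a URL instead of a response. In this simplified model every detail callback runs after the loop, so the shared-state version sees None; in a real crawl the values could instead belong to whichever detail page was parsed most recently.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for scrapy.Request, including its cb_kwargs."""
    url: str
    callback: Callable
    cb_kwargs: dict = field(default_factory=dict)


# Hypothetical detail pages: URL -> (latitude, longitude)
DETAIL_PAGES = {"/ad/1": (51.50, -0.12), "/ad/2": (48.85, 2.35)}


class SharedStateSpider:
    """Stores the coordinates on self, then reads them back in parse."""
    lat = lon = None

    def parse(self, ads):
        for url in ads:
            yield Request(url, self.parse_detail)
            # In this model parse_detail has not run yet, so these are not this ad's values
            yield {"url": url, "lat": self.lat, "lon": self.lon}

    def parse_detail(self, url):
        self.lat, self.lon = DETAIL_PAGES[url]  # no yield: returns None


class PassAlongSpider:
    """Passes a partial item to the callback and yields it from there."""

    def parse(self, ads):
        for url in ads:
            yield Request(url, self.parse_detail, cb_kwargs={"item": {"url": url}})

    def parse_detail(self, url, item):
        item["lat"], item["lon"] = DETAIL_PAGES[url]
        yield item


def run(spider, ads):
    """Toy engine: collect items now, queue requests and run their callbacks later."""
    items, queue = [], deque()

    def consume(output):
        for obj in output or ():  # a callback that returns None produces nothing
            if isinstance(obj, Request):
                queue.append(obj)
            else:
                items.append(obj)

    consume(spider.parse(ads))
    while queue:
        request = queue.popleft()
        consume(request.callback(request.url, **request.cb_kwargs))
    return items


print(run(SharedStateSpider(), ["/ad/1", "/ad/2"]))
print(run(PassAlongSpider(), ["/ad/1", "/ad/2"]))
```

In Scrapy itself the second pattern would look like scrapy.Request(url, callback=self.parse_detail, cb_kwargs={"item": item}), with parse_detail taking the response plus an “item” argument and yielding the finished item.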
main
Above we can see the class ‘RealestateSpider’ being instantiated and then FEEDS being assigned a path and format.
crawl specifies the class to use, and start does what it says: it starts the crawl and runs until it has finished!
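A minimal sketch of such a main block, assuming a Scrapy version that supports the FEEDS setting and a spider class like the post's ‘RealestateSpider’. The helper names “feed_settings” and “main”, the output file name and the defaults are hypothetical:

```python
def feed_settings(path="listings.csv", fmt="csv"):
    """Build a settings dict whose FEEDS entry maps one output path to a format."""
    return {"FEEDS": {path: {"format": fmt}}}


def main(spider_cls, path="listings.csv", fmt="csv"):
    """Run spider_cls in this process and export its items to `path`."""
    # Imported here so feed_settings() can be used without Scrapy installed
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=feed_settings(path, fmt))
    process.crawl(spider_cls)  # crawl: which spider class to run
    process.start()            # start: blocks until the crawl has finished


print(feed_settings())
```

With the spider defined in the same file, calling main(RealestateSpider) would run the crawl and write listings.csv, with Scrapy choosing the CSV exporter from the "format" key.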
Conclusion
We hope this has been a useful explanation and example of using ‘yield’ more than once in a method/function.
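You may see “return” used in some examples, but from experience “yield” is more robust where you have a choice: a “return” ends the method, so everything has to be collected and handed back at once, whereas “yield” can hand each request or item to Scrapy as it is produced and then carry on.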
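The difference can be seen in plain Python. Scrapy accepts either style, since a callback may return any iterable of requests and items; the two hypothetical functions below produce the same results, but only the generator hands them over one at a time:

```python
def parse_with_return(ads):
    """Builds every result first, then hands them all back in one list."""
    results = []
    for ad in ads:
        results.append({"title": ad})
    return results  # the method ends here; nothing after a return runs


def parse_with_yield(ads):
    """Hands each result back as soon as it is ready, then carries on."""
    for ad in ads:
        yield {"title": ad}


ads = ["Flat", "House"]
print(parse_with_return(ads))   # [{'title': 'Flat'}, {'title': 'House'}]
gen = parse_with_yield(ads)
print(next(gen))                # {'title': 'Flat'}; "House" not processed yet
```

With “return”, mixing requests and items means building one combined list; with “yield” each can simply be yielded where it is produced, as in the three-yield “parse” above.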
‘FEEDS’ was a new way of saving the output for us; previously we’ve used “FEED_FORMAT” and “FEED_URI”. Both ways work, but FEEDS is the newer setting, and FEED_FORMAT and FEED_URI are deprecated in favour of it in recent Scrapy versions.
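For comparison, here is the same CSV output configured both ways in a project's settings.py. The file names are hypothetical, and the older settings are left commented out:

```python
# settings.py (fragment); file names are hypothetical

# Older style: a single output, described by two separate settings
# FEED_FORMAT = "csv"
# FEED_URI = "listings.csv"

# Newer style: one FEEDS dict keyed by output URI, with options per feed,
# which also allows writing more than one output from the same crawl
FEEDS = {
    "listings.csv": {"format": "csv"},
    "listings.json": {"format": "json", "encoding": "utf8"},
}
```

The same dict can also be passed in a spider's custom_settings or to CrawlerProcess instead of living in settings.py.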
The YouTube video will show this in action and will appear here soon!
Thanks for reading! ✅