Scrapy : Yield

[Image: code for “parse” with “yield” being used 3 times]

So here we see the code for “parse” with “yield” being used 3 times.

  • Go and fetch the geo data
  • Go and fill the container fields in items.py
  • Now go and find the next page of (120) listings

The 2nd ‘yield’ has no URL to go to. Once every one of the “ads in all_ads” has had its values gathered and sent to items, the for loop ends, the pagination code checks for a next page, and the code goes on to get the next batch of listings to process.
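The order in which the three yields fire can be sketched without Scrapy at all. This is only a toy stand-in, not the spider's real code: the page is a plain dict, the tuples take the place of scrapy.Request objects and loaded items, and every name and URL here is made up for illustration.

```python
def parse(page):
    """Toy stand-in for a spider's parse method; `page` is a plain dict."""
    for ad in page["ads"]:
        # 1st yield: ask for the detail page that holds the geo data
        yield ("request", ad["detail_url"])
        # 2nd yield: hand over the populated item (no URL involved)
        yield ("item", {"title": ad["title"]})
    # 3rd yield: only reached once the for loop has finished
    if page.get("next_page"):
        yield ("request", page["next_page"])


# Hypothetical listings page with two ads and a "next page" link
page = {
    "ads": [
        {"title": "Flat A", "detail_url": "/ad/1"},
        {"title": "House B", "detail_url": "/ad/2"},
    ],
    "next_page": "/listings?page=2",
}

results = list(parse(page))
for kind, value in results:
    print(kind, value)
```

Each ad produces a request followed by an item, and the pagination request comes out last. In a real spider it is the Scrapy engine, not a list, that consumes whatever parse yields.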

parse_detail

We’ve already covered what this does: it gets the ‘longitude’ and ‘latitude’ from the ‘detail’ page for the property. As we can see below, there is no ‘yield’ required, and self.lon and self.lat get their values on each and every iteration of ‘for ads in all_ads:’.

[Image: scrapy-parse_detail]

Once the 2 variables have been assigned values the method ends and ‘parse’ resumes down to the ItemLoader.
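As a rough sketch of that idea, a method can simply store values on the instance and finish without yielding anything. The class name and the dict standing in for the detail page are assumptions for illustration only; the real method reads the values from a Scrapy response.

```python
class GeoHolder:
    """Hypothetical stand-in for the spider; only the geo attributes are shown."""

    def __init__(self):
        self.lon = None
        self.lat = None

    def parse_detail(self, detail):
        # No yield here: the method just stores the two values and ends
        self.lon = detail["longitude"]
        self.lat = detail["latitude"]


holder = GeoHolder()
holder.parse_detail({"longitude": -0.1276, "latitude": 51.5072})
print(holder.lon, holder.lat)
```

Because the values live on the instance, whatever code runs next on the same object can read self.lon and self.lat directly.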

main

[Image: Scrapy-FEEDS]

Above we can see the class ‘RealestateSpider’ being instantiated and then FEEDS being assigned a path and format.

crawl specifies the class to use, and start does what it says!
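For readers without the screenshot, a settings dict of this kind might look roughly like the sketch below. The output path and the spider name are placeholders, not the ones from our project, and the Scrapy calls are left as comments so the snippet runs on its own.

```python
# Settings in the shape Scrapy's FEEDS setting expects:
# each key is an output path, each value holds options such as "format".
settings = {
    "FEEDS": {
        "output/listings.csv": {"format": "csv"},  # hypothetical path
    },
}

# With Scrapy installed, the dict would typically be used like this:
# from scrapy.crawler import CrawlerProcess
# process = CrawlerProcess(settings=settings)
# process.crawl(MySpider)  # hypothetical spider class, passed as a class
# process.start()          # blocks until the crawl has finished

print(settings["FEEDS"])
```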

Conclusion

We hope this has been a useful explanation and example of using ‘Yield’ more than once in a method/function.

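You may see “return” used in some examples, but from experience “yield” is more robust (where you have a choice): “return” ends the method after a single result, whereas “yield” lets the same method hand back many requests and items.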
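The difference is easy to see in plain Python. The two functions below are made-up examples, not code from the spider:

```python
def with_return(ads):
    for ad in ads:
        return ad  # the function exits on the first ad


def with_yield(ads):
    for ad in ads:
        yield ad  # every ad is handed back, one at a time


ads = ["ad-1", "ad-2", "ad-3"]
first_only = with_return(ads)
all_ads = list(with_yield(ads))
print(first_only)
print(all_ads)
```

With “return” only the first ad ever comes back, while the generator version works through the whole list.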

‘FEEDS’ was a new way of saving the output for us; previously we’ve used “FEED_FORMAT” and “FEED_URI”. Both ways work, but FEEDS is the newer way: it was added in Scrapy 2.1, which deprecated FEED_FORMAT and FEED_URI in its favour.
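To show how the two styles relate, here is a small sketch that turns an old-style pair of values into the newer dict. The helper name to_feeds and the file name are our own inventions for illustration; Scrapy does not ship such a function.

```python
def to_feeds(feed_uri, feed_format):
    """Hypothetical helper: build a FEEDS-style dict from the two older values."""
    return {feed_uri: {"format": feed_format}}


# Old style: two separate settings
old_settings = {"FEED_URI": "listings.json", "FEED_FORMAT": "json"}

# New style: one FEEDS dict keyed by the output path
new_settings = {
    "FEEDS": to_feeds(old_settings["FEED_URI"], old_settings["FEED_FORMAT"]),
}
print(new_settings)
```

Since FEEDS is keyed by path, it can also hold more than one output at once, which the two older settings could not express.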

The YouTube video will show this in action and will appear here soon!

Thanks for reading! ✅
