Categories
Python Code Scrapy

Scraping “LOAD MORE”

Do you need to scrape a page that is dynamically loading content as “infinite scroll” ?

Scrapy Load More - infinite scroll - more results
If you need to scrape a site like this then you can increment the URL within your Scrapy code

Using self.nxp +=1 the value passed to “pn=” in the URL gets incremented

“pn=” is the query – in your spider it may be different, you can always use urllib.parse to split up the URL into it’s parts.

Test in scrapy shell if you are checking the URL for next page – see if you get response 200 and then check the response.text

What if you don’t know how many pages there are?

One way would be to use try/except – but a more elegant solution would be to check the source for “next” or “has_next” and keep going to next page until “next” is not true.

https://github.com/RGGH/Scrapy6/blob/master/AJAX%20example/foodcom.py

If you look at line 51 – you can see how we did that.

if response.xpath("//link/@rel='next\'").get() == "1":

See our video where we did just this : https://youtu.be/07FYDHTV73Y

Conclusion

We’ve shown how to deal with “infinite scroll” without resorting to selenium, splash, or any javascript rendering. Also, check in developer tools, “network” and “XHR” if you can find any mention of API in the URL – this may be useful also.