
July 10, 2020 / Last updated : July 10, 2020 admin Python Code

Scraping “LOAD MORE”

Do you need to scrape a page that loads its content dynamically as “infinite scroll”?

If you need to scrape a site like this, you can increment the page number in the URL from within your Scrapy code.

Using self.nxp += 1, the value passed to “pn=” in the URL is incremented for each new request.
Here “pn” is the query parameter that holds the page number. In your spider it may have a different name, and you can always use urllib.parse to split the URL into its parts.
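
As a rough sketch of the URL-splitting idea, the helper below uses urllib.parse to read the current value of a page parameter and build the URL for the next page. The function name next_page_url, the example.com URL and the default parameter name "pn" are assumptions for illustration only; your site's query may look different.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url, param="pn"):
    """Return `url` with the integer query parameter `param` increased by 1."""
    parts = urlsplit(url)
    # parse_qs maps each key to a list of values, e.g. {"pn": ["2"]}
    query = parse_qs(parts.query, keep_blank_values=True)
    # Assumption: a URL without the parameter is treated as page 1
    current = int(query.get(param, ["1"])[0])
    query[param] = [str(current + 1)]
    # doseq=True writes the list values back as key=value pairs
    new_query = urlencode(query, doseq=True)
    return urlunsplit(parts._replace(query=new_query))


print(next_page_url("https://example.com/search?q=soup&pn=2"))
```

Inside a spider, the returned string is what you would hand to the next request in place of a hand-built URL.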

Test the next-page URL in scrapy shell before you rely on it: check that you get a 200 response, then inspect response.text to confirm it holds the content you expect.

What if you don’t know how many pages there are?

One way would be to use try/except, but a more elegant solution is to check the page source for “next” or “has_next” and keep requesting the next page until “next” is no longer true.
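
The "keep going until next is not true" idea can be sketched without any network code. In the sketch below, fetch_page is a hypothetical stand-in for whatever downloads and parses one page in your spider, and the "items" and "has_next" keys are assumed names; a real site may expose the flag differently.

```python
def collect_all_pages(fetch_page, start=1, max_pages=1000):
    """Call fetch_page(n) for n = start, start + 1, ... until has_next is falsy.

    fetch_page is a hypothetical callable returning a dict such as
    {"items": [...], "has_next": True}. max_pages is a safety cap so a
    site that always reports has_next=True cannot loop forever.
    """
    items = []
    page = start
    for _ in range(max_pages):
        data = fetch_page(page)
        items.extend(data.get("items", []))
        if not data.get("has_next"):
            break  # the site reports nothing further to load
        page += 1
    return items


# Fake three-page site, used only to demonstrate the loop.
FAKE_SITE = {
    1: {"items": ["a", "b"], "has_next": True},
    2: {"items": ["c"], "has_next": True},
    3: {"items": ["d"], "has_next": False},
}

print(collect_all_pages(FAKE_SITE.__getitem__))
```

The cap on the number of pages is a design choice of this sketch, not something the original spider is known to use.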

https://github.com/RGGH/Scrapy6/blob/master/AJAX%20example/foodcom.py

If you look at line 51 of that file, you can see how we did that.

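if response.xpath("//link/@rel='next'").get() == "1":

The XPath expression here is a boolean test (is there a link element whose rel attribute equals 'next'?), and Scrapy returns a boolean XPath result as the string "1" or "0", which is why the comparison is against "1".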
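
To see what that test is looking for, here is a minimal stand-alone sketch that finds a link tag with rel="next" using only Python's standard library. It is not the code from the linked spider, which uses Scrapy's response.xpath; html.parser is swapped in here so the example runs without Scrapy, and the class name NextLinkFinder and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Record the href of the first <link rel="next"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_next = tag == "link" and attributes.get("rel") == "next"
        if is_next and self.next_href is None:
            self.next_href = attributes.get("href")


def find_next_href(html):
    """Return the href of the rel="next" link, or None on the last page."""
    finder = NextLinkFinder()
    finder.feed(html)
    return finder.next_href


# Made-up sample pages: one with a next link, one without.
page_with_next = '<html><head><link rel="next" href="/recipes?pn=3"></head></html>'
last_page = '<html><head><link rel="prev" href="/recipes?pn=9"></head></html>'

print(find_next_href(page_with_next))
print(find_next_href(last_page))
```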

See our video where we did just this: https://youtu.be/07FYDHTV73Y

Conclusion

We’ve shown how to deal with “infinite scroll” without resorting to Selenium, Splash, or any other JavaScript rendering. Also check the “Network” tab in your browser’s developer tools, filtered to “XHR”, for any request with “API” in its URL. If the site loads its results from such an endpoint, it may be useful to request it directly.

