Extracting JSON from JavaScript in a web page

July 2, 2020 / Last updated : July 2, 2020 admin Python Code

Why would you want to do that?

If you are web scraping with Python (using Scrapy, for instance), you may need to extract reviews or comments that are loaded by JavaScript. In that case you cannot use your CSS or XPath selectors the way you can with regular HTML.

Parse

Instead, view the page source in your browser, press Ctrl+F, and search for “json”. Often you can track down JSON embedded in the JavaScript, which becomes a Python dictionary once it is parsed. You ‘just’ need to isolate it.

Figure: using view-source to find occurrences of “JSON” in your page
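The same search can be done from the Scrapy shell or a plain Python session. The sketch below is only an illustration: `page_text` is a made-up stand-in for `response.text`, and `find_marker_snippets` is a hypothetical helper, not part of Scrapy. It lists the chunk that follows each `JSON.parse` marker, so you can see which index to pass to split.

```python
def find_marker_snippets(text, marker="JSON.parse", width=40):
    """Return (index, snippet) pairs for the chunks that follow each marker.

    The index is the position to use in text.split(marker)[index].
    """
    chunks = text.split(marker)
    # chunks[0] is everything before the first marker, so start at 1.
    return [(i, chunk[:width]) for i, chunk in enumerate(chunks) if i > 0]


# Hypothetical page source standing in for response.text.
page_text = (
    '<script>var a = JSON.parse("{\\u0022id\\u0022: 1}");</script>'
    '<script>var b = JSON.parse("{\\u0022name\\u0022: \\u0022x\\u0022}");</script>'
)

for index, snippet in find_marker_snippets(page_text):
    print(index, repr(snippet))
```

Once you recognise the snippet that holds the data you want, its index is the one to use when splitting the response text.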

The response is not pretty, but you can gradually shrink it down in the Scrapy shell or a Python shell…

Figure 1 – The response

Split, strip, replace

From within Scrapy, or your own Python code, you can split, strip, and replace with Python's built-in string methods until you are left with just a JSON string that json.loads can turn into a dictionary.

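# Assumes the page source contains literal \u0022 escapes; in Python "\u0022" is already a double quote, so the pattern is written "\\u0022"
x = response.text.split('JSON.parse')[3].replace("\\u0022","\"").replace("\u2019m","'").lstrip("(").split(" ")[0].strip().replace("\"","",1).replace("\");","")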

Master replace, strip, and split and you won’t need regular expressions!
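To make the steps easier to follow, here is a minimal, self-contained sketch of the same idea on a made-up page. Everything in it is an assumption for illustration: `page_text` stands in for `response.text`, the values are invented, and the exact chain of calls will differ from page to page.

```python
import json

# Hypothetical page source standing in for response.text; the field names
# mirror the ones used in this post, but the values are made up.
page_text = (
    '<script>window.data = JSON.parse('
    '"{\\u0022doctor\\u0022: {\\u0022sample_rating_comment\\u0022: '
    '\\u0022Very helpful\\u0022}}");</script>'
)

raw = page_text.split("JSON.parse")[1]     # text after the first marker
raw = raw.split('");')[0]                  # drop everything from the closing ");
raw = raw.lstrip("(").replace('"', "", 1)  # remove the opening ( and "
cleaned = raw.replace("\\u0022", '"')      # literal \u0022 escapes -> real quotes

data = json.loads(cleaned)
print(data["doctor"]["sample_rating_comment"])
```

Working one step per line like this in the shell lets you print the intermediate string after each call and see what still needs trimming.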

With the cleaned string now valid JSON, you can do this:

import json
q = json.loads(x)
comment = q['doctor']['sample_rating_comment']
# str.replace returns a new string, so assign the result
comment = comment.replace("\u2019", "'")
print(comment)
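The sketch below illustrates two details of this step on made-up data (`json_text` is hypothetical, not taken from the page in the figures): json.loads decodes a `\u2019` escape into the typographic apostrophe, and str.replace leaves the original string untouched and returns a new one.

```python
import json

# Made-up JSON text; the \u2019 escape is a typographic apostrophe.
json_text = r'{"doctor": {"sample_rating_comment": "I\u2019m very happy"}}'

q = json.loads(json_text)
comment = q["doctor"]["sample_rating_comment"]

# str.replace returns a new string, so the result has to be assigned.
plain = comment.replace("\u2019", "'")

print(comment)
print(plain)
```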

The key things to remember when parsing the response text are to use the index to pick out the section you want, and to use the backslash “\” to escape characters when you are working with quotes and with actual backslashes in the text you’re parsing.
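The backslash point is easy to get wrong, so here is a small sketch of the difference between an escape that Python interprets and one that is matched literally. The `fragment` string is a hypothetical piece of raw page text.

```python
# In Python source, "\u0022" is a single double-quote character, while
# "\\u0022" (or r"\u0022") is the six-character sequence seen in raw page text.
as_character = "\u0022"
as_literal = "\\u0022"

print(len(as_character), len(as_literal))

# Hypothetical fragment of raw page text containing literal escapes.
fragment = r"{\u0022key\u0022: 1}"

unchanged = fragment.replace("\u0022", '"')  # pattern is a plain quote: nothing matches
fixed = fragment.replace("\\u0022", '"')     # pattern is the literal escape

print(unchanged)
print(fixed)
```

If a replace seems to do nothing, checking the length of the pattern as above is a quick way to see whether Python has already interpreted the escape.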

Figure 2 – The parsed response

Conclusion

Rendering to HTML with Splash or Selenium, or reaching for regular expressions, is not always essential. Hope this helps illustrate how you can extract values from a Python dictionary, from JSON, from JavaScript!

You may see a mass of text on your screen to begin with, but persevere and you can arrive at the dictionary contained within…

Demo of getting a Python Dictionary from JSON from JavaScript
