Web Scraping Articles

  • map & lambda - Introduction Using lambda can save you having to write a named function. If you’ve not used ‘map’ then we’ll show you how it can perform the same task as a lambda in an example: import pandas as pd; pd.set_option('display.max_rows', 10); import numpy as np; reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0). Next we’ll drop any rows full of NaNs with reviews.dropna(), and now we have good data… […]
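The idea in this teaser can be sketched with a small made-up DataFrame standing in for the winemag CSV (the column names and values here are assumptions, not the article's data):

```python
import pandas as pd

# Hypothetical stand-in for the winemag reviews data
reviews = pd.DataFrame({"points": [87, 92, 95],
                        "country": ["Italy", "France", None]})

# Drop any rows containing NaNs, as in the article
clean = reviews.dropna()

# An inline lambda re-centres the scores around the mean...
mean_points = clean["points"].mean()
centred_lambda = clean["points"].map(lambda p: p - mean_points)

# ...and map with a named function performs exactly the same task
def centre(p):
    return p - mean_points

centred_named = clean["points"].map(centre)

print(centred_lambda.tolist())  # same result either way
```

Whether you pass a lambda or a named function, map applies it element-by-element; the lambda just saves you the separate def.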
  • Data Analysis With Pandas - If you want to learn about Data Analysis with Pandas and Python and you’re not familiar with Kaggle, check it out! Time to read article: 5 mins. TLDR; We show how to use idxmax and apply with Pandas. Introduction Here we will look at some functions in Pandas which will help with ‘EDA’ (exploratory data analysis) – […]
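A minimal sketch of the two functions the TLDR names, idxmax and apply, on invented wine-style data (the column names are assumptions):

```python
import pandas as pd

# Hypothetical wine-review style data for the demonstration
reviews = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [87, 95, 92],
    "price": [20.0, 60.0, 30.0],
})

# idxmax returns the index label of the maximum value in a column,
# which .loc can then use to pull out the whole row or another field
best = reviews.loc[reviews["points"].idxmax(), "title"]

# apply with axis=1 runs a function across every row; here a
# made-up points-per-price "value" score
reviews["value"] = reviews.apply(lambda row: row["points"] / row["price"], axis=1)

print(best)  # Wine B
```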
  • EBAY API – Python Code - If you have been looking for EBAY API – Python code then this article can help. Rather than use web scraping techniques you can access live EBAY listings via the API with an SDK for Python. Time to read this article about EBAY API – Python Code: 5 mins. TLDR; Watch the ebay api […]
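One way this can look is with the ebaysdk package (pip install ebaysdk); this is a hedged sketch, not the article's code, and APP_ID is a placeholder for your own eBay developer App ID:

```python
APP_ID = "YOUR-EBAY-APP-ID"  # hypothetical placeholder credential

def build_payload(keywords, max_entries=5):
    # The Finding API accepts a plain dict describing the search
    return {
        "keywords": keywords,
        "paginationInput": {"entriesPerPage": max_entries},
    }

def search_listings(keywords):
    # Imported lazily so build_payload works even without the SDK installed;
    # the live call needs network access and a valid App ID
    from ebaysdk.finding import Connection as Finding
    api = Finding(appid=APP_ID, config_file=None)
    response = api.execute("findItemsByKeywords", build_payload(keywords))
    return response.reply
```

With a real App ID, search_listings("raspberry pi") returns live listing data without any web scraping.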
  • PostgreSQL - PostgreSQL is a free, powerful SQL database which is frequently used with Python. How to connect to postgres: sudo -i -u postgres, then run psql: postgres@rag-laptop:~$ psql psql (12.6 (Ubuntu 12.6-0ubuntu0.20.04.1)) Type "help" for help. List the databases with \l: postgres=# \l List of databases (Name | Owner | Encoding | Collate | Ctype | Access privileges), starting with gis […]
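From Python, the same database listing that \l gives can be queried via a driver such as psycopg2; a sketch, assuming psycopg2 is installed and the connection parameters are replaced with your own:

```python
def fetch_database_names(conn):
    # pg_database is the system catalogue that psql's \l reads from
    with conn.cursor() as cur:
        cur.execute("SELECT datname FROM pg_database WHERE datistemplate = false;")
        return [row[0] for row in cur.fetchall()]

def connect():
    # Imported lazily so the helper above is importable without psycopg2;
    # host/user/password here are placeholders for your own server
    import psycopg2
    return psycopg2.connect(host="localhost", dbname="postgres",
                            user="postgres", password="secret")
```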
  • Extract links with Scrapy - Using Scrapy’s LinkExtractor method you can get the links from every page that you desire. https://www.programcreek.com/python/example/106165/scrapy.linkextractors.LinkExtractor https://github.com/scrapy/scrapy/blob/2.5/docs/topics/link-extractors.rst https://github.com/scrapy/scrapy/blob/master/scrapy/linkextractors/lxmlhtml.py What are Link Extractors? Link Extractors are the objects used for extracting links from web pages using scrapy.http.Response objects: “A link extractor is an object that extracts links from responses.” Though Scrapy has a built-in extractor, scrapy.linkextractors.LinkExtractor, you can customize your own link extractor based on your needs by implementing a simple interface. The Scrapy link extractor makes use of w3lib.url; have a look at the source code for w3lib.url: https://w3lib.readthedocs.io/en/latest/_modules/w3lib/url.html The article’s spider begins: # -*- coding: utf-8 -*- import scrapy; from scrapy import Spider; from scrapy import Request; from scrapy.crawler import CrawlerProcess; from scrapy.linkextractors import LinkExtractor; import os; class Ebayspider(Spider): name […]
  • Read Scrapy ‘start_urls’ from csv file - How can the start_urls for Scrapy be imported from csv? Using a list comprehension and a csv file you can make Scrapy get specific URLs from a predefined list; use the .strip() method to remove newline characters. Here you can see line.strip() performing the removal: [line.strip() for line in file]. Demonstration of how […]
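The list comprehension from the teaser can be demonstrated with an in-memory file standing in for the csv (the filename urls.csv used in the comment is a hypothetical example):

```python
import io

def load_start_urls(file):
    # .strip() removes the trailing newline (and stray whitespace) from each line;
    # the trailing "if" also skips any blank lines
    return [line.strip() for line in file if line.strip()]

# Demonstration with an in-memory file in place of open("urls.csv")
sample = io.StringIO("https://example.com/a\nhttps://example.com/b\n")
urls = load_start_urls(sample)
print(urls)
```

In a spider you would then set start_urls = load_start_urls(open("urls.csv")) at class-definition time.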
  • How to web scrape iframes with scrapy - Web scraping pages with iframes can be done with Scrapy if you use a separate URL to access the data inside the iframe. You need to identify the name of the page the iframe loads and then append that to your base URL to provide a 2nd URL for the Scrapy spider to visit. […]
  • How to scrape iframes - If you are scraping a website with pop-up messages asking you to agree to accept cookies, these can prevent your scraper from continuing to the pages you want to scrape. How do you get past them? Using Selenium, you need to switch to the iframe (which you can identify using browser tools / inspect […]
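With Selenium 4's switch_to API, the switch can be sketched as below; the frame name "consent-frame" and the button selector are hypothetical — find the real ones with your browser's inspect tools:

```python
def accept_cookies(driver):
    # Switch the driver's context into the consent iframe
    # ("consent-frame" is a made-up name; identify yours via inspect)
    driver.switch_to.frame("consent-frame")
    # "css selector" is the locator-strategy string that By.CSS_SELECTOR maps to
    driver.find_element("css selector", "button.accept").click()
    # Switch back to the main document so scraping can continue
    driver.switch_to.default_content()
```

You would call this with a real webdriver instance, e.g. accept_cookies(webdriver.Firefox()), before navigating on to the pages you actually want.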
  • Comparing values in SQL against previously scraped data - If you have scraped data on more than one occasion and want to check whether a value in a column has changed since the previous scrape, you could use this: We now know that since the last time we scraped the site, only one of our claims has been updated by “them”. This has been […]
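The article's SQL is truncated, but the general technique can be sketched with sqlite3: join the old and new scrapes on a key and keep rows whose value differs. The table and column names (scrape_old, scrape_new, claim, status) are invented for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE scrape_old (claim TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE scrape_new (claim TEXT PRIMARY KEY, status TEXT);
    INSERT INTO scrape_old VALUES ('claim-1', 'open'), ('claim-2', 'open');
    INSERT INTO scrape_new VALUES ('claim-1', 'open'), ('claim-2', 'settled');
""")

# Join the two scrapes on the key and keep only rows whose status changed
cur.execute("""
    SELECT n.claim, o.status, n.status
    FROM scrape_new n JOIN scrape_old o ON n.claim = o.claim
    WHERE n.status <> o.status
""")
changed = cur.fetchall()
print(changed)  # only claim-2 was updated since the last scrape
```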
  • Parsing Scraped Data with Pandas - Once you have some data you’ll need to find the relevant parts and use them for further analysis. Rather than use the proverbial Excel sheet, you can write code to automate the task. Consider the following: The following code will match where the row contains “Dijbouti” and return the value that is in the […]
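Since the article's code is truncated, here is one way such a match can look in pandas; the data is made up, "Dijbouti" is spelled as in the article, and the column names are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["France", "Dijbouti", "Italy"],
    "value": [10, 42, 7],
})

# Boolean mask: True for rows whose country column contains the target string
mask = df["country"].str.contains("Dijbouti")

# .loc with the mask selects the matching row(s); pull out the value column
matched_value = df.loc[mask, "value"].iloc[0]
print(matched_value)  # 42
```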