Red And Green

Scrapy

April 26, 2021 / Last updated : February 27, 2023 admin Python Code

Extract links with Scrapy

Using Scrapy’s LinkExtractor class you can get the links from every page you want. What are link extractors? “A link extractor is an object that extracts links from responses.” Summary: the code in the full post gets all of the hrefs very quickly and gives you the flexibility to omit or include very specific attributes. Watch the video: Extract Links | how to scrape website urls | Python + Scrapy […]

February 15, 2021 / Last updated : February 15, 2021 admin Python Code

Scrapy response.meta

Capture your start URLs in your output with Scrapy response.meta. Every web scraping project has aspects that are different or interesting and worth remembering for future use. This post looks at a recent real-world project and at saving more than one start URL in the output. This assumes basic knowledge of web scraping, […]

November 11, 2020 / Last updated : November 11, 2020 admin Python Code

Price Tracking Amazon

A common task is to track competitors’ prices and use that information as a guide to the prices you can charge or, if you are buying, to spot when a product is at a new lowest price. The purpose of this article is to describe how to web scrape Amazon. Using Python, Scrapy, MySQL, […]

October 17, 2020 / Last updated : October 17, 2020 admin Python Code

How To Web Scrape Amazon (successfully)

You may want to scrape Amazon for information about books about web scraping! We shorten what would have been a very long selector by using “contains” in our XPath: response.xpath('//*[contains(@class,"sg-col-20-of-24 s-result-item s-asin")]') The most important thing when starting to scrape is to establish what you want in your final output. Here are the […]

October 7, 2020 / Last updated : October 7, 2020 admin Python Code

Combine Scrapy with Selenium

A major disadvantage of Scrapy is that it cannot handle dynamic websites (e.g. ones that use JavaScript) on its own. If a login is proving impossible to get past, usually because the form data keeps changing, you can use Selenium to get past the login screen and then pass the […]
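
The excerpt is cut off before it says what Selenium hands over. One common pattern, assumed here and not confirmed by the post, is to reuse the browser's session cookies in Scrapy requests. The helper name `cookies_for_scrapy` and the cookie values are hypothetical; the sketch only shows the conversion step, so it runs without a browser.

```python
def cookies_for_scrapy(selenium_cookies):
    """Convert Selenium's list of cookie dicts into a name -> value mapping."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Shape of what Selenium's driver.get_cookies() returns after a login
# (values invented for illustration).
example_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "csrftoken", "value": "xyz789", "domain": "example.com", "path": "/"},
]

cookies = cookies_for_scrapy(example_cookies)
print(cookies)

# In a spider, the mapping could then be attached to a request, e.g.:
#   yield scrapy.Request(url, cookies=cookies, callback=self.parse)
```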

July 13, 2020 / Last updated : February 23, 2023 admin Python Code

Configure a Raspberry Pi for web scraping

Introduction: The task was to scrape over 50,000 records from a website while being gentle on the site being scraped. A Raspberry Pi Zero was chosen because speed was not a significant issue; in fact, being slower makes it ideal for web scraping when you want to be kind to the […]
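
Hardware aside, Scrapy has settings that slow a crawl down deliberately. The values below are illustrative assumptions, not the ones used in the post; the setting names are standard Scrapy settings. The small helper gives a rough lower bound on run time, assuming one request per record.

```python
# Illustrative "polite" settings; could be used as a spider's custom_settings
# or copied into settings.py. The numbers are assumptions, tune them per site.
GENTLE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,               # respect robots.txt
    "DOWNLOAD_DELAY": 5,                  # seconds between requests to a site
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per domain
    "AUTOTHROTTLE_ENABLED": True,         # back off when the server slows down
    "AUTOTHROTTLE_START_DELAY": 5,
    "AUTOTHROTTLE_MAX_DELAY": 60,
}


def estimated_hours(records, delay_seconds):
    """Rough lower bound on crawl time, assuming one request per record."""
    return records * delay_seconds / 3600


hours = estimated_hours(50_000, GENTLE_SETTINGS["DOWNLOAD_DELAY"])
print(f"about {hours:.1f} hours")
```

With a crawl that takes days by design, a low-power board that can be left running is a reasonable fit.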

July 10, 2020 / Last updated : July 10, 2020 admin Python Code

Scraping “LOAD MORE”

Do you need to scrape a page that dynamically loads content as “infinite scroll”? Using self.nxp += 1, the value passed to “pn=” in the URL gets incremented. “pn=” is the query parameter; in your spider it may be different, and you can always use urllib.parse to split the URL into its parts. Test […]
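
A minimal sketch of the `urllib.parse` approach mentioned above, using only the standard library. The function name `next_page_url` and the example URL are hypothetical; `pn` is the parameter named in the excerpt.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url, param="pn"):
    """Return `url` with the integer query parameter `param` incremented by one."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)                # {"cat": ["books"], "pn": ["2"]}
    current = int(query.get(param, ["0"])[0])    # a missing parameter counts as 0
    query[param] = [str(current + 1)]
    new_query = urlencode(query, doseq=True)     # doseq flattens the value lists
    return urlunsplit(parts._replace(query=new_query))


print(next_page_url("https://example.com/list?cat=books&pn=2"))
print(next_page_url("https://example.com/list"))
```

In a spider, a counter attribute such as `self.nxp` does the same job when the URL template is fixed; parsing the URL is handy when you only have the previous page's address.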

June 22, 2020 / Last updated : June 22, 2020 admin Python Code

Scrapy tips

Passing variables between functions using meta and cb_kwargs. This covers how to use a callback with “meta” and the newer “cb_kwargs”. The highlighted sections show how “logo_url” goes from parse to fetch_detail, where “yield” then sends it to the feed export (output CSV file). When using ‘meta’ you need to use ‘meta.get’ on the ‘response’ […]

June 3, 2020 / Last updated : June 3, 2020 admin Python Code

Scrapy : Yield

Yes, you can use “yield” more than once inside a method. We look at how this was useful when scraping a real estate / property section of Craigslist. Put simply, “yield” lets you run another function with Scrapy and then resume from where you “yielded”. To demonstrate this it is best to show it with […]

Copyright © Red And Green All Rights Reserved.
