Query Rightmove with LangChain | OpenAI | Python | bs4
Welcome to this tutorial on using LangChain tools and OpenAI with bs4 to query Rightmove data! In this video, we explore how to combine LangChain tools with OpenAI and BeautifulSoup4 (bs4) to extract and analyze data from Rightmove: web scraping with a twist. Stay tuned to the end for a Streamlit implementation as well.
Rightmove is one of the largest property websites in the United Kingdom, providing valuable real estate data. By combining LangChain tools, OpenAI, and bs4, we can automate the collection and processing of this data, saving time and effort. Throughout this tutorial, we’ll cover the following topics:
1. Introduction to LangChain tools: We’ll provide an overview of LangChain tools, which let a language-model agent call custom Python functions, in this case the functions that scrape Rightmove.
As of March 2023, LangChain included integrations with systems including Amazon, Google, and Microsoft Azure cloud storage; API wrappers for news, movie information, and weather; Bash for summarization, syntax and semantics checking, and execution of shell scripts; multiple web scraping subsystems and templates; few-shot learning prompt generation support; finding and summarizing "todo" tasks in code; and more (source: Wikipedia).
2. Understanding OpenAI: We’ll explore how OpenAI’s language model can be leveraged to generate relevant insights and analyze the collected data.
3. Overview of BeautifulSoup4 (bs4): We’ll introduce bs4, a Python library that makes it easy to scrape information from web pages. We’ll explain its key features and functionalities.
4. Collecting data from Rightmove: We’ll dive into the process of scraping property data from Rightmove using bs4 and LangChain tools. We’ll demonstrate how to extract information such as property prices.
https://youtu.be/4iiSPOhCBmw
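As a rough illustration of the scraping step in item 4, the sketch below runs bs4 against a small inline HTML snippet instead of a live Rightmove page. The markup is made up for the example and only borrows the class names used later in this tutorial; the real page structure may differ and can change at any time. It also shows a bs4 detail worth knowing: a multi-word string passed to class_ matches the exact class attribute value, whereas a CSS selector matches the classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for a search results page
SAMPLE_HTML = """
<div class="l-searchResult is-list">
  <div class="propertyCard-priceValue">£250,000 </div>
</div>
<div class="is-list l-searchResult">
  <div class="propertyCard-priceValue">£310,500 </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string matches only the exact attribute value
exact = soup.find_all("div", class_="l-searchResult is-list")

# A CSS selector matches both cards, whatever the order of the classes
cards = soup.select("div.l-searchResult.is-list")

# Pull the price text out of each card, trimming surrounding whitespace
prices = [
    card.find("div", class_="propertyCard-priceValue").get_text(strip=True)
    for card in cards
]
print(len(exact), len(cards), prices)
```

If the exact-string search ever returns fewer cards than expected, switching to the selector form is one thing to try.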
import os
import statistics

import requests
from bs4 import BeautifulSoup
from langchain.agents import initialize_agent, tool
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# ChatOpenAI reads the key from the OPENAI_API_KEY environment variable
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
Choose a model to use
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
Create the custom tools in Python
@tool
def get_median_price(url: str) -> str:
    """Gets the median house price from the supplied Rightmove URL"""
    # Get the page (the agent passes the URL as the tool's single string input)
    headers = { 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0' }
    response = requests.get(url, headers=headers)
    # Scrape the content
    soup = BeautifulSoup(response.text, "html.parser")
    props = soup.find_all("div", class_="l-searchResult is-list")
    all_prices = []
    for prop in props:
        # Clean the price text, e.g. " £250,000 " -> "250000"
        price = (
            prop.find("div", class_="propertyCard-priceValue")
            .get_text()
            .replace(",", "")
            .strip()
            .strip("£")
        )
        all_prices.append(int(price))
    # Median of all prices found on the page
    res = round(statistics.median(all_prices), 2)
    return str(res)
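The cleaning chain above assumes every card holds a plain figure such as "£250,000". Listings may carry other text (for example a "POA" label), in which case int() raises a ValueError. A minimal sketch of a more tolerant parser follows; parse_price is a hypothetical helper, not part of the original notebook, and the sample strings are invented for illustration.

```python
import re
import statistics
from typing import Optional


def parse_price(text: str) -> Optional[int]:
    """Return the first pound amount in text as an int, or None if there is none."""
    match = re.search(r"£\s*(\d[\d,]*)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Invented sample strings, not real listings
raw = ["£250,000", " £310,500 ", "POA", "Offers over £199,950"]

# Keep only the entries that contain a parseable price
prices = [p for p in (parse_price(t) for t in raw) if p is not None]
median_price = statistics.median(prices)
print(prices, median_price)
```

Skipping unparseable cards keeps the tool from crashing, at the cost of silently ignoring those listings, so it may be worth logging how many were dropped.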
@tool
def get_mean_price(url: str) -> str:
    """Gets the mean house price from the supplied Rightmove URL"""
    # Get the page (the agent passes the URL as the tool's single string input)
    headers = { 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0' }
    response = requests.get(url, headers=headers)
    # Scrape the content
    soup = BeautifulSoup(response.text, "html.parser")
    apartments = soup.find_all("div", class_="l-searchResult is-list")
    all_prices = []
    for apartment in apartments:
        # Clean the price text, e.g. " £250,000 " -> "250000"
        price = (
            apartment.find("div", class_="propertyCard-priceValue")
            .get_text()
            .replace(",", "")
            .strip()
            .strip("£")
        )
        all_prices.append(int(price))
    # Mean of all prices found on the page
    res = round(statistics.mean(all_prices), 2)
    return str(res)
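Why have both a mean and a median tool? The mean is pulled towards unusually expensive or cheap listings, while the median is the middle value and is less affected by them. A small worked example with invented prices shows the difference:

```python
import statistics

# Invented prices: four typical listings and one very expensive outlier
prices = [200_000, 220_000, 250_000, 270_000, 2_000_000]

mean_price = round(statistics.mean(prices), 2)
median_price = round(statistics.median(prices), 2)

print(mean_price)    # dragged upwards by the outlier
print(median_price)  # the middle listing
```

Here the single outlier more than doubles the mean relative to the median, which is why the median is often the more representative figure for house prices.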
Use Agent with Prompt
# Create the agent to use the tool
# List the tools that AI can use
tools = [get_median_price, get_mean_price]
# Set up the Agent, give it the tools
agent = initialize_agent(
    tools,
    llm,
    agent="chat-zero-shot-react-description",
    verbose=True,
)
# Rightmove search results page to analyse (placeholder: paste your own search URL)
url = "<your Rightmove search results URL>"
# Create the prompt from a prompt template, with 2 input variables
prompt = PromptTemplate(
    input_variables=['calctype', 'url'],
    template='''
You have been given access to search data from Rightmove at {url}.
Please calculate the {calctype} based on the response using the {calctype} tool.
''',
)
# Run the agent with the rendered prompt string
result = agent.run(prompt.format(calctype="mean", url=url))
print(result)
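To see what text the agent actually receives, it can help to render the template by hand. The sketch below uses plain Python str.format as a stand-in for PromptTemplate, so it runs without LangChain or an API key. It mirrors the placeholder substitution only, on a simplified copy of the template, and is not a description of how LangChain works internally.

```python
# Simplified copy of the tutorial's template, kept as a plain string
TEMPLATE = (
    "You have been given access to search data from Rightmove.\n"
    "Please calculate the {calctype} based on the response "
    "using the {calctype} tool."
)


def render(calctype: str) -> str:
    """Fill the {calctype} placeholder with the requested calculation."""
    return TEMPLATE.format(calctype=calctype)


# Show the rendered prompt for each calculation type
for calctype in ("mean", "median"):
    print(render(calctype))
    print("---")
```

Because the tools are named get_mean_price and get_median_price, passing "mean" or "median" gives the model a strong hint about which tool to pick.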
https://github.com/RGGH/LangChain_Agent_bs4/blob/main/LangChain_Tools_Agents.ipynb