LangChain can simplify and enhance working with a Large Language Model. I used OpenAI ChatGPT for the video as it’s currently the most widely used and easy to use when you only have a regular laptop computer. This video is split into the same chapters as the official LangChain documentation.
It demonstrates the Python code to use LangChain Models, Prompts, Chains, Memory, Indexes, Agents and Tools. The video also demonstrates using Qdrant as a vector database to enable retrieval of embedded vectors along with tips on how to debug LangChain and how to set up a project from scratch.
As well as the regular examples you may find in the LangChain documentation, I also show how to create a video suggestion chatbot, and in the ‘Indexes’ chapter I show you how to create a full project to query your own documents: ‘upserting’ data from a text document and then querying it.
Features of LangChain:
- Models
- Prompts
- Chains
- Memory
- Indexes
- Agents & Tools
1 – Try the OpenAI API without LangChain
- Go to http://platform.openai.com
- Sign in, and start testing prompts!
- Save your API key in .env file or ~/.zshrc !
- Add .env to gitignore
- I use a global gitignore file
pip install openai
ChatGPT API Transition Guide:
‘role’ can take one of three values:
Sign up to OpenAI if you do want to use its models in your projects – requires a credit card*
Tips to save $
Use caching + FakeLLM
Set a billing limit
View token usage in your code:
You can try free LLMs but they often need RAM == $$$
from langchain.callbacks import get_openai_callback
2 – Install LangChain
pip install langchain
LangChain automates LLM calls, choose whichever LLM you prefer
3 – Models
There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc.) – the LLM class is designed to provide a standard interface for all of them.
You can use LLMs for chatbots as well, but chat models have a more conversational tone and natively support a message interface.
from langchain.llms import OpenAI
from langchain.llms import Cohere
from langchain.llms import GooseAI
Once imported, you can create an instance of the model:
llm = OpenAI()
As of August 2023 – text-davinci-003 is the default model for the OpenAI class if you don’t specify anything inside the brackets; the ChatOpenAI chat model class defaults to gpt-3.5-turbo.
What is the difference between LLM and chat model in LangChain?
- LLMs: Models that take a text string as input and return a text string
- Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat Message
Chat Models: Unlike LLMs, chat models take chat messages as inputs and return chat messages as outputs.
gpt-3.5-turbo returns outputs with lower latency and costs much less per token than the older text-davinci-003
The .predict (on models) and .run (on chains) methods usually behave the same way – pass in a string, get a string back!
4 – Prompts
TLDR; “Prompts are the text that you send to the LLM”
A prompt for a language model is a set of instructions or input provided by a user to guide the model’s response, helping it understand the context and generate relevant and coherent language-based output, such as answering questions, completing sentences, or engaging in a conversation.
It’s like a Python f-string…Use a prompt template and you can pass a dynamically formed question.
from langchain import PromptTemplate
Single-shot vs. few-shot prompting
Prompts, Chains and Parser Basics
5 – Chains
SimpleSequentialChain – produces only 1 output
SequentialChain – output of 1st chain goes into next chain – *chain takes a dict!
from langchain.chains import ...
6 – Router Chains
The RouterChain itself (responsible for selecting the next chain to call)
Use MultiPromptChain to create a question-answering chain that selects the prompt most relevant to a given question, e.g. physics_template and maths_template
7 – Memory
By default, LLMs are stateless: each incoming query is processed independently, with no memory of past interactions.
Memory – Buffer vs. Summary
from langchain.memory import ChatMessageHistory
# Retrieve chat messages with ConversationBufferMemory (as a variable)
from langchain.memory import ConversationBufferMemory
ConversationBufferMemory stores everything, but uses lots of tokens and response is slower.
from langchain.memory import ConversationBufferWindowMemory
# useful for keeping a sliding window of the most recent interactions,
# so the buffer does not get too large
memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("whats up?")
ConversationSummaryMemory keeps a summarized form of the conversation.
“Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.”
from langchain.chains import ConversationChain

conversation_sum = ConversationChain(
    llm=llm,
    memory=ConversationSummaryMemory(llm=llm)
)
# extracts information on entities (using an LLM) and
# builds up its knowledge about that entity over time (also using an LLM)
from langchain.memory import ConversationEntityMemory
More memory options:
from langchain.chains.conversation.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
    ConversationKGMemory,
)
from pprint import pprint
pprint(conversation.memory.entity_store.store)
8 – Indexes
- Document loaders
- Text splitters
- Vector stores – embed text as vectors (Similarity Search)
pip install langchain qdrant-client numpy sentence-transformers tqdm
import langchain
langchain.debug = True
python3.10 -i q1.py
9 – Agents – LLMs plus Tools
Language models can run Python code! – tool = PythonREPL()
How ChatGPT Plugins work
Evaluation – Use LLMs to evaluate LLMs