LangChain with Python | Full Tutorial & Video

LangChain can simplify and enhance working with a Large Language Model. I used OpenAI ChatGPT for the video as it’s currently the most widely used and the easiest to use when all you have is a regular laptop. This video is split into the same chapters as the official LangChain documentation.

It demonstrates the Python code to use LangChain Models, Prompts, Chains, Memory, Indexes, Agents and Tools. The video also demonstrates using Qdrant as a vector database to enable retrieval of embedded vectors along with tips on how to debug LangChain and how to set up a project from scratch.

As well as the regular examples you may find in the LangChain documentation, I also show how to create a video-suggestion chatbot, and in the ‘Indexes’ chapter I show you how to create a full project to query your own documents: ‘upserting’ data from a text document and then querying it.

LangChain is available for Python and JavaScript.

If you want to create a GUI then try “Streamlit” in Python.

Features of LangChain:

  • Models
  • Prompts
  • Chains
  • Memory
  • Indexes
  • Agents & Tools

Use Cases:

1 – Try the OpenAI API without LangChain

export OPENAI_API_KEY=sk-xsssdfsddf67sadfasdfOXT3BlbkFJo12Ssdafdfasfadsfsafas
  • Add .env to gitignore
  • I use a global gitignore file
pip install openai
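Before reaching for LangChain, it helps to see the raw HTTP shape of a chat completion request. A minimal sketch using only the Python standard library (the model name and prompt are illustrative; the request is only sent if OPENAI_API_KEY is actually set):

```python
import json
import os
import urllib.request

# the request body the Chat Completions endpoint expects
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one word."},
    ],
}

api_key = os.environ.get("OPENAI_API_KEY")
if api_key:  # only call out to the API when a key is configured
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
        print(reply)
```

The `pip install openai` package wraps exactly this endpoint; the point here is just to see what’s on the wire.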

ChatGPT API Transition Guide:

‘role’ can take one of three values:

  • system – sets the assistant’s behaviour and context
  • user – messages from the end user
  • assistant – the model’s own previous replies

Sign up to OpenAI if you do want to use their models in your projects – requires a credit card*

Tips to save $

Use caching + FakeLLM

Set a billing limit

View token usage in your code:

You can try free LLMs, but they often need lots of RAM == $$$

from langchain.callbacks import get_openai_callback

2 – Install LangChain

Install LangChain:

pip install langchain

LangChain automates LLM calls; choose whichever LLM you prefer.

3 – Models

There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc) – the LLM class is designed to provide a standard interface for all of them.

You can use LLMs for chatbots as well, but chat models have a more conversational tone and natively support a message interface.

from langchain.llms import OpenAI
from langchain.llms import Cohere
from langchain.llms import GooseAI

Once you have imported you can create an instance of it

llm = OpenAI()

As of August 2023 – text-davinci-003 is the default model for the OpenAI class if you don’t specify anything inside the brackets (the chat-oriented ChatOpenAI class defaults to gpt-3.5-turbo).

What is the difference between LLM and chat model in LangChain?

  • LLMs: Models that take a text string as input and return a text string
  • Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat Message

Chat Models: Unlike LLMs, chat models take chat messages as inputs and return them as outputs.

gpt-3.5-turbo returns outputs with lower latency and costs much less per token

.predict and .run methods are usually the same!

4 – Prompts

TLDR; “Prompts are the text that you send to the LLM”

A prompt for a language model is a set of instructions or input provided by a user to guide the model’s response, helping it understand the context and generate relevant and coherent language-based output, such as answering questions, completing sentences, or engaging in a conversation.

It’s like a Python f-string: use a prompt template and you can pass in a dynamically formed question.

from langchain import PromptTemplate



Single Shot vs. Few Shot

Prompts, Chains and Parser Basics

5 – Chains

SimpleSequentialChain – each step has a single input and output; produces only 1 final output

SequentialChain – the output of the 1st chain goes into the next chain – *the chain takes a dict!

from langchain.chains import ...

6 – Router Chains

The RouterChain itself (responsible for selecting the next chain to call)

Use MultiPromptChain to create a question-answering chain that selects the prompt which is most relevant for a given question. e.g. physics_template and maths_template

7 – Memory

By default, LLMs are stateless: each incoming query is processed independently, without considering past interactions.

Memory – Buffer vs. Summary

from langchain.memory import ChatMessageHistory
# retrieve chat messages as a variable with ConversationBufferMemory
from langchain.memory import ConversationBufferMemory

ConversationBufferMemory stores everything, but uses lots of tokens and response is slower.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("whats up?")

# ConversationBufferWindowMemory instead keeps a sliding window of the most
# recent interactions, so the buffer does not get too large

ConversationSummaryMemory keeps a summarized form of the conversation.

“Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.”

from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

# the summary memory needs an LLM of its own to write the running summary
conversation_sum = ConversationChain(
    llm=llm,
    memory=ConversationSummaryMemory(llm=llm),
)

# ConversationEntityMemory extracts information on entities (using an LLM) and
# builds up its knowledge about those entities over time (also using an LLM)
from langchain.memory import ConversationEntityMemory

More memory options:

from langchain.chains.conversation.memory import (ConversationBufferMemory,
    ConversationBufferWindowMemory, ConversationSummaryMemory,
    ConversationEntityMemory)

View the Memory Store

from pprint import pprint


8 – Indexes

  • Document loaders
  • Text splitters
  • Retrievers
  • Vector stores – embed text as vectors (Similarity Search)

pip install langchain qdrant-client numpy sentence-transformers tqdm

(The langchain.vectorstores, langchain.embeddings and langchain.text_splitter modules ship inside the langchain package itself.)
# turn on verbose LangChain debug output
import langchain
langchain.debug = True

# run your script interactively so you can inspect objects afterwards
python3.10 -i

Agents – LLMs plus Tools

Language models can run Python code! – tool = PythonREPL()

How ChatGPT Plugins work

Evaluation – Use LLMs to evaluate LLMs