Querying a database directly with natural language

No SQL, no hunting through Stack Overflow for the syntax you forgot—just type in what you want, and let the LLM handle your query.

We’ll use LangChain to hook up to a MySQL database with ChatGPT doing the heavy lifting.

Here’s the setup: I’ve got a MySQL database full of employee data—names, hire dates, that kind of stuff. Instead of writing out queries manually, I’m using LangChain to generate them automatically. The idea is simple: you ask a question, like “Who’s the second longest-serving employee?” LangChain takes that, feeds it to ChatGPT, and out comes an SQL query ready to run against the database. You can throw in questions like “How many employees were hired after 2015?” or “What’s the average salary by department?” and it just works despite it not having your actual data.

However, the best part is the flexibility. LangChain isn’t tied to any one database—so whether you’re using MySQL, Postgres, or something else, it’ll adapt. In my case, I’ve wrapped everything in Python, connecting securely to the database using SQLAlchemy. No exposed creds, no shortcuts—just a solid, production-ready workflow.

When you run a query, the results come back raw, since that’s how SQL works. But here’s where it gets clever: LangChain can rephrase the results into proper sentences. For instance, instead of returning “John Doe, 2012,” it says, “The second longest-serving employee is John Doe, hired in 2012.” It’s a small touch, but it makes a massive difference when you’re building interfaces for non-technical users—or even if you just want to speed up your debugging sessions.

employees database

Although this is impressive you will need to be mindful of how much schema you will need to send to the LLM, in particular OpenAI, cos tokens!

Example Workflow

If the user asks:

What is the name of the 2nd longest serving employee?

The output in the logs will include:

  1. Generated SQL query.
  2. Retrieved database rows.
  3. A rephrased response such as:The 2nd longest serving employee is John Doe.
import os
import logging
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from langchain_community.utilities.sql_database import SQLDatabase
from langchain.chains import create_sql_query_chain
from langchain_openai import ChatOpenAI  # type: ignore
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv
# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# OpenAI API key
openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    logger.error("OpenAI API key is not set in environment variables.")
    exit(1)

Note, you don’t strictly need the logging setup (logging.basicConfig and logger) to display the output. Logging is there to help you debug and monitor your application, but it’s not required for the actual functionality of displaying results. You can leave it out if you like! It’s useful during development however.

The key part of all of the completed code is this , note the key : value pair?

generate query

The question is the first thing you add to main :

question

Here are all the key stages :

  • Load environment variables
  • Logging setup
  • OpenAI API key
  • Database connection details
  • Create SQLAlchemy engine and session
  • Initialize LangChain database
  • Initialize LLM and query chain
  • Generate SQL query from natural language
  • Execute SQL query
  • Prepare result in a string format for rephrasing
  • Prompt template for rephrasing
  • Main workflow

If you ask for the 3rd longest serving employee :

SELECT e.first_name, e.last_name
FROM employees e
ORDER BY e.hire_date ASC
LIMIT 2, 1;

Explanation:

  • ORDER BY e.hire_date ASC: Orders the employees by their hiring date, from the earliest (longest-serving) to the latest.
  • LIMIT 2, 1: Skips the first two rows and retrieves the next one (offset 2, fetch 1).

Why This Fetches the 3rd Longest-Serving Employee:

  1. The ORDER BY ASC ensures the list is sorted by hire date in ascending order.
  2. The LIMIT 2, 1 starts after the second row, effectively fetching the third.
result

Full code on GitHub