Introduction
As AI-powered conversational agents evolve, their integration with structured databases has become a crucial innovation. The ability to query databases using natural language simplifies data retrieval for non-technical users while enhancing productivity. This article explores a Python script that employs LangChain, OpenAI’s GPT models, Ollama, and SQLAlchemy to create a chatbot that interacts with an SQL database.
This guide breaks down the script, explaining its key components and showcasing its practical applications.
Overview of the Script
The script creates an interactive SQL agent that allows users to query a database of quality control results using natural language. It provides support for different LLM connectors (openai and ollama) and offers flexibility through command-line parameters.
Key features include:
- Dynamic database setup using SQLite in-memory storage.
- Integration with OpenAI’s ChatGPT or Ollama for conversational data retrieval.
- SQL query generation through LangChain’s create_sql_agent.
- Interactive chat interface to facilitate real-time querying.
- Verbose logging and error handling for debugging and enhanced user experience.
Breaking Down the Code
1. Importing Required Libraries
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents.agent_types import AgentType
from langchain.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama
from langchain.callbacks.tracers import ConsoleCallbackHandler
from sqlalchemy import MetaData, Column, Integer, String, Table, Date
from sqlalchemy import create_engine, insert
import pandas as pd
import argparse
from datetime import datetime
This script makes use of:
- OpenAI’s GPT models and Ollama to process natural language inputs.
- LangChain’s SQL integration for creating query-driven agents.
- SQLAlchemy for database schema creation and manipulation.
- Argparse to handle user-defined execution parameters.
- Pandas for loading structured data from a CSV file.
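All third-party dependencies can be installed with pip (argparse and datetime ship with the standard library). Assuming a recent LangChain release, a setup along these lines should work:
pip install langchain langchain-community langchain-openai sqlalchemy pandas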
2. ANSI Color Coding for Terminal Output
class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
This class defines colored outputs for terminal messages, improving readability.
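For instance, an error message can be printed in red and the terminal color reset afterwards (an illustrative snippet, not part of the script):
# Illustrative usage: print in red, then reset the terminal color
print(f"{bcolors.FAIL}Something went wrong{bcolors.ENDC}")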
3. Creating and Populating the SQL Database
def prepare_database(engine):
    metadata_obj = MetaData()
    table_qc = Table(
        "QualityChecks",
        metadata_obj,
        Column("ID", Integer, primary_key=True, autoincrement=True),
        Column("Article_Code", String(20), nullable=False),
        Column("Description", String(100), nullable=False),
        Column("Category", String(50), nullable=False),
        Column("Date_of_Check", Date, nullable=False),
        Column("Quantity_in_First_Class", Integer, nullable=False),
        Column("Quantity_in_Second_Class", Integer, nullable=False),
        Column("Quantity_Rejected", Integer, nullable=False),
    )
    metadata_obj.create_all(engine)
This function creates a table QualityChecks in an SQLite in-memory database, with fields for article codes, descriptions, categories, and inspection results.
    df = pd.read_csv("../datasets/quality_assurance.csv")
    for record in df.itertuples():
        date_of_check = datetime.strptime(record[4], '%Y-%m-%d')
        stmt = insert(table_qc).values(
            Article_Code=record[1],
            Description=record[2],
            Category=record[3],
            Date_of_Check=date_of_check,
            Quantity_in_First_Class=record[5],
            Quantity_in_Second_Class=record[6],
            Quantity_Rejected=record[7]
        )
        with engine.begin() as conn:
            conn.execute(stmt)
The script then loads quality assurance data from a CSV file and populates the database.
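To verify that the import worked, one could count the rows just inserted. This sanity check is not part of the original script, just a sketch using SQLAlchemy’s select and func:
from sqlalchemy import select, func

# Sanity check (not in the original script): count the imported rows
with engine.connect() as conn:
    row_count = conn.execute(select(func.count()).select_from(table_qc)).scalar()
    print(f"Loaded {row_count} quality check records")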
4. Command-Line Configuration
parser = argparse.ArgumentParser(description='Chat with an SQL agent')
parser.add_argument('-v', '--verbose', action='store_true', help='Enable verbose output')
parser.add_argument('-c', '--connector', type=str, required=False, default='openai', help='Connector to use: openai|ollama')
parser.add_argument('-m', '--model', type=str, required=False, default='', help='Model name to use')
parser.add_argument('-ou', '--ollamaurl', type=str, required=False, default='http://localhost:11434', help='Ollama url')
args = parser.parse_args()
These arguments allow users to specify:
- -v enables verbose mode.
- -c switches between OpenAI (openai) and locally hosted models (ollama).
- -m specifies the model to use.
- -ou specifies the Ollama server URL.
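The agent setup below expects a model object, whose construction is not shown in this excerpt. Based on the imports and the command-line options, the selection logic presumably looks something like the following sketch (the fallback model names are assumptions, echoing the usage examples later in this article):
# Sketch of the connector selection; the default model names are assumptions
if args.connector == 'openai':
    model = ChatOpenAI(model=args.model or 'gpt-4o-mini', temperature=0)
elif args.connector == 'ollama':
    model = Ollama(model=args.model or 'llama3.2', base_url=args.ollamaurl)
else:
    raise ValueError(f"Unsupported connector: {args.connector}")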
5. Setting Up the Agent
engine = create_engine("sqlite:///:memory:")
prepare_database(engine)
db = SQLDatabase(engine)
The SQLite in-memory database is initialized and populated.
agent = create_sql_agent(
    llm=model,
    toolkit=SQLDatabaseToolkit(db=db, llm=model),
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    max_iterations=50,
    handle_parsing_errors=True,
    return_intermediate_steps=True
)
An SQL agent is created using create_sql_agent, allowing natural language queries over the database.
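Similarly, the system_msg prompt and the config passed to agent.invoke in the loop below are defined elsewhere in the full script. Given the ConsoleCallbackHandler import, a plausible sketch is to attach the console tracer only when verbose mode is requested:
# Sketch (assumed): attach the console tracer in verbose mode
config = {"callbacks": [ConsoleCallbackHandler()]} if args.verbose else {}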
6. Interactive Querying Loop
print("Chat with me (ctrl+D to quit)!")
while True:
try:
question = input("human: ")
answer = agent.invoke(
"system: " + system_msg + '\n' + "user:" + question,
config=config
)
print("agent: ", answer['output'])
except EOFError:
print("\nGoodbye!")
break
except Exception as e:
print(f"{bcolors.FAIL}{type(e)}")
print(f"{bcolors.FAIL}{e.args}")
print(f"{bcolors.FAIL}{e}")
print(f"{bcolors.ENDC}")This loop continuously processes user queries and returns database-driven responses.
Examples of usage
python qa_sql_agent.py
python qa_sql_agent.py -h
python qa_sql_agent.py -v
python qa_sql_agent.py -c openai
python qa_sql_agent.py -c openai -m gpt-4o-mini
python qa_sql_agent.py -c ollama
python qa_sql_agent.py -c ollama -m llama3.2 -v
Examples of questions
After launching the script, when the prompt “human:” appears you can query the SQL database in natural language. Here are some example queries:
- What is the percentage of rejected articles?
- What is the percentage of rejected articles in the Educational category?
- What are the distinct categories?
- What is the worst category?
- What is the range of dates of checks?
- Are there articles specialized for kids? If so, what are they?
- What is the total quantity for articles of the Education category?
- What is the ratio between first class and second class for the Education category?
- What is the most recent check for the Educational category?
- How many articles of category Vehicles have defects?
Conclusion
This script showcases how SQL databases can be queried using natural language via an agentic application. By leveraging LangChain, OpenAI, and SQLAlchemy, the application bridges the gap between structured data and conversational AI, making data retrieval more accessible and efficient. Whether used for business analytics, customer support, or data-driven decision-making, this tool provides an intuitive way to interact with SQL databases.
Download of the complete code
The complete code is available on GitHub.
These materials are distributed under MIT license; feel free to use, share, fork and adapt these materials as you see fit.
Please also feel free to submit pull requests and bug reports to this GitHub repository, or contact me on my social media channels, available on the contact page.
