
Chat with your data #1: Agentic Application for querying pandas DataFrames



Introduction

With the rise of large language models (LLMs) and their integration into various domains, one of the most exciting use cases is enabling natural language interactions with structured data. This article explores a Python script that leverages LangChain together with OpenAI’s GPT models or Ollama to create a chatbot capable of querying a pandas DataFrame interactively.

In this breakdown, we’ll explore the core functionalities of the script, how it works, and its potential applications.


Overview of the Script

The provided Python script creates an interactive agent that allows users to chat with a pandas DataFrame containing quality assurance data. It provides flexibility by supporting different LLM connectors (openai or ollama) via command-line arguments.

Key components of the script include:

  • Loading and handling structured data using pandas.
  • Integration with OpenAI’s GPT models or locally hosted Ollama models for natural language querying.
  • Interactive console-based chat interface to allow dynamic data interaction.
  • Verbose logging and error handling to enhance debugging and user experience.


Breaking Down the Code


1. Importing Necessary Libraries

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama
from langchain.callbacks.tracers import ConsoleCallbackHandler
import pandas as pd
import argparse

The script starts by importing key dependencies, which come from the langchain-experimental, langchain-openai, and langchain-community packages, plus pandas. Notably, it includes:

  • LangChain’s create_pandas_dataframe_agent, which allows the agent to process pandas DataFrames efficiently.
  • ChatOpenAI and Ollama, which facilitate connections to OpenAI models and local LLMs, respectively.
  • Pandas, which is used for handling structured data.
  • Argparse, which is used to allow user customization via command-line arguments.


2. Defining ANSI Color Codes for Output Styling

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'

These ANSI escape sequences enable color-coded output in the terminal, making logs and error messages more readable.
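
For example (a hypothetical message, not taken from the script itself), a success notice can be printed in green and the color reset afterwards:

print(f"{bcolors.OKGREEN}Dataset loaded successfully{bcolors.ENDC}")  # green text, then reset color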


3. Loading the Dataset

The script loads a CSV file containing quality assurance data. This file serves as the structured dataset that users will query via natural language.

df = pd.read_csv("../datasets/quality_assurance.csv")

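Before starting the chat, a quick sanity check (a minimal sketch, not part of the original script) confirms that the CSV loaded as expected, without assuming anything about its schema:

# Inspect the loaded DataFrame: dimensions, column names, first rows
print(df.shape)
print(df.columns.tolist())
print(df.head())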


4. Configuring Command-Line Arguments

parser = argparse.ArgumentParser(description='Chat with a pandas agent')
parser.add_argument('-v', '--verbose', action='store_true', help='Enable verbose output')
parser.add_argument('-c', '--connector', type=str, required=False, default='openai', help='Connector to use: openai|ollama')
parser.add_argument('-m', '--model', type=str, required=False, default='', help='Model name to use')
parser.add_argument('-ou', '--ollamaurl', type=str, required=False, default='http://localhost:11434', help='Ollama url')
args = parser.parse_args()

The script allows customization via command-line arguments:

  • -v enables verbose mode.
  • -c allows switching between OpenAI (openai) and locally hosted models (ollama).
  • -m specifies the model to use (see the note after this list).
  • -ou specifies the Ollama server URL.
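
Note that -m defaults to an empty string. A small guard (a sketch, not part of the original script) could fall back to a sensible per-connector default, mirroring the models used in the usage examples below:

# Hypothetical fallback when no model name is supplied via -m
if not args.model:
    args.model = 'gpt-4o-mini' if args.connector == 'openai' else 'llama3.2'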


5. Initializing the LLM Connector

if args.connector == 'openai':
    model = ChatOpenAI(model=args.model, temperature=0.)
elif args.connector == 'ollama':
    # use the URL supplied via -ou/--ollamaurl rather than a hard-coded value
    ollama_url = args.ollamaurl
    model = Ollama(base_url=ollama_url, model=args.model, temperature=0.)
else:
    raise ValueError(f"Unsupported connector: {args.connector}. Use 'openai' or 'ollama'")

Depending on the user’s choice, the script initializes either an OpenAI or an Ollama model instance for query processing, with the temperature set to 0 for more deterministic answers; for Ollama, the server URL passed via -ou is used.
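
ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable, so a fail-fast check can be added before creating the model (a sketch, not part of the original script):

import os

# Fail fast if the OpenAI key is missing when the openai connector is chosen
if args.connector == 'openai' and not os.environ.get('OPENAI_API_KEY'):
    raise SystemExit("Please set the OPENAI_API_KEY environment variable.")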


6. Creating the DataFrame Agent

agent = create_pandas_dataframe_agent(
    model, 
    df, 
    verbose=args.verbose, 
    agent_executor_kwargs={"handle_parsing_errors": True}, 
    allow_dangerous_code=True,
    max_iterations=50
)

The create_pandas_dataframe_agent function creates an interactive agent that processes the loaded DataFrame using the specified LLM. Notable parameters:

  • handle_parsing_errors=True lets the agent recover gracefully when the model’s response cannot be parsed, instead of aborting the run.
  • allow_dangerous_code=True permits execution of Python code in user queries (potential security risk if used improperly).
  • max_iterations=50 limits the number of iterative refinements the agent makes to answer complex queries.
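
Under the hood, the agent translates each question into pandas code and executes it through a Python tool, which is why allow_dangerous_code=True is required. As a minimal sketch (reusing the agent created above), a single one-off query outside the chat loop looks like this:

# One-off query (sketch): invoke the agent once and read the 'output' field
result = agent.invoke("How many rows does the DataFrame contain?")
print(result['output'])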


7. Setting Up Configuration for Verbose Logging

config = {}
if args.verbose:
    config = config | {'callbacks': [ConsoleCallbackHandler()]}

When verbose mode is enabled, the agent logs its internal processes to the console.


8. Interactive Chat Loop

print("Chat with me (ctrl+D to quit)!\n")

while True:
    try:
        question = input("human: ")
        answer = agent.invoke(
            question,
            config=config
        )
        print("agent: ", answer['output'])
    except EOFError:
        print("\nGoodbye!")
        break
    except Exception as e:
        print(f"{bcolors.FAIL}{type(e)}")
        print(f"{bcolors.FAIL}{e.args}")
        print(f"{bcolors.FAIL}{e}")
        print(f"{bcolors.ENDC}")

This loop continuously takes user queries, processes them using the agent, and returns responses until the user exits (Ctrl+D). Errors are caught and displayed in red for visibility.
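
As a small variation (a sketch, not part of the original script), the loop can also accept a typed exit command such as quit or exit in addition to Ctrl+D:

# Variation (sketch): allow typed exit commands as well as Ctrl+D
while True:
    try:
        question = input("human: ")
        if question.strip().lower() in ("quit", "exit"):
            print("Goodbye!")
            break
        answer = agent.invoke(question, config=config)
        print("agent: ", answer['output'])
    except EOFError:
        print("\nGoodbye!")
        break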


Examples of usage

  • python qa_pandas_agent.py
  • python qa_pandas_agent.py -h
  • python qa_pandas_agent.py -v
  • python qa_pandas_agent.py -c openai
  • python qa_pandas_agent.py -c openai -m gpt-4o-mini
  • python qa_pandas_agent.py -c ollama
  • python qa_pandas_agent.py -c ollama -m llama3.2 -v


Examples of questions

After launching the script, you can type a natural language query against the pandas DataFrame whenever the “human:” prompt appears. Here are some example queries:

  • What is the percentage of rejected articles?
  • What is the percentage of rejected articles of Educational category?
  • What are the distinct categories?
  • What is the worst category?
  • What is the range of dates of checks?
  • Are there articles specialized for kids? If so, what are they?
  • What is the total quantity for articles of the Education category?
  • What is the ratio between first class and second class for the Education category?
  • What is the most recent check for Educational category?
  • How many articles of category Vehicles have defects?
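
These questions can also be run non-interactively. A minimal batch sketch (not part of the original script) feeds a couple of them to the same agent:

# Batch sketch: run a few example questions through the agent
questions = [
    "What is the percentage of rejected articles?",
    "What are the distinct categories?",
]
for q in questions:
    result = agent.invoke(q, config=config)
    print(f"Q: {q}\nA: {result['output']}\n")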


Conclusion

This script exemplifies how LLMs can enable natural language interactions with structured data, making data analysis more intuitive and accessible. Whether for business intelligence, data science, or education, this agentic application streamlines querying pandas DataFrames without needing traditional coding skills.


Download the complete code

The complete code is available at GitHub.
These materials are distributed under the MIT license; feel free to use, share, fork, and adapt them as you see fit.
Please also feel free to submit pull requests and bug reports to the GitHub repository, or contact me on my social media channels, available on the contact page.
