B2B Payments AI Assistant

A portfolio-ready, locally-runnable Natural-Language-to-SQL analytics agent built for a B2B payments fintech context (LatAm focus).

Ask a question in plain English or Spanish. The system generates SQL, executes it against a real SQLite database, and returns a business-friendly answer — all through a clean REST API that plugs directly into n8n or Make.


Project structure

.
├── generate_payments_data.py   # Step 1 – DB schema + 12 months of synthetic data
├── payments_nl_sql_agent.py    # Step 2 – NL-to-SQL pipeline (LLM stubs included)
├── payments_agent_api.py       # Step 3 – FastAPI webhook interface
├── requirements.txt            # Python dependencies
└── README.md                   # This file

Quick start

1 — Set up the environment

# (Optional but recommended) create a virtual environment
python3 -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2 — Generate the database

python3 generate_payments_data.py

Expected output:

Creating payments_demo.db …

[1/3] Creating schema …
  ✓ Tables created (clients, payments, payment_errors)

[2/3] Generating clients …
  ✓ Inserted 45 clients  (SMB=20, Mid-Market=15, Enterprise=10)

[3/3] Generating payments and errors …
  ✓ Inserted 12,000 payments  (~10,200 completed, ~1,200 pending, ~600 failed)
  ✓ Inserted 250 payment_errors

════════════════════════════════════════════════════
  Database generation complete!
  Total clients        :       45
  Total payments       :   12,000
  Total payment_errors :      250
  DB file              : payments_demo.db
════════════════════════════════════════════════════

3 — Run example questions (CLI)

python3 payments_nl_sql_agent.py

This runs three demo questions and prints SQL + rows + summary for each. The LLM calls are stubbed — you get real SQL and real data, but the summary is a placeholder until you wire in the Claude API (see below).

4 — Start the API server

python3 -m uvicorn payments_agent_api:app --reload

Then open http://localhost:8000/docs for the interactive Swagger UI.

5 — Call the API

curl -X POST http://localhost:8000/ask \
     -H "Content-Type: application/json" \
     -d '{"question": "Which clients have the highest number of failed payments?"}'

Response:

{
  "question": "Which clients have the highest number of failed payments?",
  "sql": "SELECT c.client_name, c.segment, ...",
  "rows": [["Grupo Andina", "Enterprise", "CO", 12, 980432.5], ...],
  "summary": "The top client by failed payments is Grupo Andina (Enterprise, Colombia)..."
}
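
The same call from Python, using only the standard library (handy in a quick script or an n8n Code node). A minimal sketch; the endpoint and JSON fields are exactly those shown above:

import json
from urllib import request

payload = json.dumps(
    {"question": "Which clients have the highest number of failed payments?"}
).encode()

req = request.Request(
    "http://localhost:8000/ask",
    data=payload,                                   # data= makes this a POST
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    answer = json.load(resp)

print(answer["sql"])      # the generated query, kept for auditability
print(answer["summary"])  # the business-friendly answer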

Plugging in Claude (real LLM calls)

  1. Install the Anthropic SDK:

    pip install anthropic
  2. Set your API key:

    export ANTHROPIC_API_KEY="sk-ant-..."
  3. In payments_nl_sql_agent.py, replace the bodies of two functions:

    call_llm_for_sql — search for # TODO: Replace and swap in:

    import anthropic
    _claude = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from env
    
    response = _claude.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip()

    call_llm_for_summary — same pattern, max_tokens=256.

That's it. No other changes needed.
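
Optionally, since the two replacement bodies differ only in max_tokens, you can route both through one shared helper. A sketch (the helper name is ours, not something that exists in the repo):

import anthropic

_claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def _claude_complete(prompt: str, max_tokens: int) -> str:
    """Send a single-turn prompt to Claude and return the plain-text reply."""
    response = _claude.messages.create(
        model="claude-opus-4-6",
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip()

# call_llm_for_sql body:      return _claude_complete(prompt, 512)
# call_llm_for_summary body:  return _claude_complete(prompt, 256)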


Suggested requirements.txt

fastapi>=0.111.0
uvicorn[standard]>=0.29.0
pydantic>=2.7.0
# anthropic>=0.28.0      # uncomment when you have an API key

Architecture & design decisions

  • Schema → synthetic data → NL-to-SQL → query execution → LLM summary. Each stage is a pure function with a clear input/output contract, making individual components easy to test, swap, or scale independently.

  • SQLite for local portability. The entire stack runs with zero infrastructure. Swapping to PostgreSQL or BigQuery requires only changing the connection string inside run_sql_query (first sketch after this list). The schema and agent logic are database-agnostic.

  • Guardrail layer between the LLM and the database. All generated SQL is validated before execution: it must start with SELECT and must not contain any DDL or DML mutation keyword. This prevents prompt-injection attacks from corrupting or exfiltrating data — a critical concern in a financial context (second sketch after this list).

  • Schema description as the LLM's "map". Rather than giving the model raw PRAGMA output, get_schema_description() builds a curated, human-readable schema string with domain notes (segment values, currency codes, date format). Better context → better SQL (third sketch after this list).

  • Relevance to B2B payments fintechs (Cobre, dLocal, Kushki, etc.). Non-technical stakeholders (sales, finance, ops) need to query payment volumes, reconcile failed transactions, and track client health — without writing SQL. This assistant gives them self-serve analytics with full auditability (the SQL is always returned alongside the answer).

  • Auditability by design. Every API response includes the raw SQL and the raw rows alongside the summary. Compliance and audit teams can verify the answer was derived correctly — essential in regulated financial environments.

  • n8n / Make integration. The /ask endpoint is a standard HTTP POST that any no-code tool can call. Practical automations include:

    • Scheduled reports: an n8n Schedule Trigger calls /ask every Monday with "What was last week's total volume by country?" and emails the summary.
    • Slack alerts: a Make scenario calls /ask, checks if failed-payment count exceeds a threshold, and posts an alert to a Slack channel.
    • Dashboard webhooks: front-end apps call /ask directly for on-demand NL queries embedded in a BI dashboard.
  • Balancing rules vs. AI. SQL generation is delegated to the LLM (flexible, multilingual, handles nuanced phrasing); security and correctness enforcement is done with deterministic regex guardrails (fast, auditable, never hallucinated). Neither layer does the other's job.
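
To make the list above concrete, here are three minimal sketches. First, run_sql_query: only the connect() line changes when moving engines. The body is illustrative, grounded only in the file and function names used in this README:

import sqlite3

def run_sql_query(sql: str) -> list[tuple]:
    """Execute an already-validated, read-only query and return all rows."""
    # Swap this single line for psycopg2.connect(...) or a BigQuery client
    # to target another engine; nothing else in the pipeline changes.
    with sqlite3.connect("payments_demo.db") as conn:
        return conn.execute(sql).fetchall()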
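
Second, the guardrail layer. The keyword list here is illustrative and deliberately conservative (it may reject some exotic-but-legitimate queries, which is the right failure mode for a financial database):

import re

# Any DDL/DML mutation keyword rejects the query outright.
_FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|REPLACE|ATTACH|PRAGMA)\b",
    re.IGNORECASE,
)

def validate_sql(sql: str) -> str:
    """Return the query if it is a single read-only SELECT; raise otherwise."""
    cleaned = sql.strip().rstrip(";").strip()
    if not cleaned.upper().startswith("SELECT"):
        raise ValueError("Only SELECT statements are allowed")
    if ";" in cleaned:
        raise ValueError("Multiple statements are not allowed")
    if _FORBIDDEN.search(cleaned):
        raise ValueError("Query contains a forbidden keyword")
    return cleaned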
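
Third, the curated schema string. The table names and enum values below come from this README; the exact column names are illustrative (the real ones live in generate_payments_data.py):

def get_schema_description() -> str:
    """Hand the LLM a human-readable map of the database, with domain notes."""
    return """\
Tables (SQLite):
  clients(client_id, client_name, segment, country)
    - segment: 'SMB' | 'Mid-Market' | 'Enterprise'
    - country: ISO alpha-2 code, e.g. 'CO', 'MX'
  payments(payment_id, client_id -> clients.client_id, amount, currency,
           status, direction, method, created_at)
    - status: 'completed' | 'pending' | 'failed'
    - created_at: 'YYYY-MM-DD HH:MM:SS'
  payment_errors(error_id, payment_id -> payments.payment_id,
                 error_code, error_message)
Answer with a single SQLite SELECT statement and nothing else.
"""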


Interview talking points

  • Problem solved: Non-technical B2B teams need payment insights without SQL skills. NL-to-SQL bridges that gap with full transparency (the SQL is always returned for auditability).
  • Schema design: Three normalized tables — clients, payments, payment_errors — mirror real fintech data models. Foreign keys enforce referential integrity; status/direction/method columns enable slicing by every business dimension.
  • Data quality: Synthetic data uses log-uniform distributions and segment-weighted transaction assignment, so aggregate queries return realistic, non-trivial patterns rather than uniform noise.
  • Security: SQL guardrails (SELECT-only, forbidden-keyword list) prevent prompt injection. In production these would be complemented by DB-level read-only credentials.
  • Observability: Every API response includes the SQL and raw rows. The X-Process-Time header tracks latency (see the middleware sketch below). Logging is structured and consistent across all modules.
  • Extensibility: Swapping SQLite → Postgres, or the LLM stub → Claude or GPT-4, requires changing one function. Adding a new endpoint (e.g., /compare for period-over-period comparisons) follows the same pattern.
  • n8n/Make fit: The /health probe, structured JSON errors, and CORS headers were added specifically so no-code tools can integrate without friction.
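
On the observability row: adding a latency header like X-Process-Time is a standard FastAPI middleware pattern. A sketch of how it is typically done (illustrative; not necessarily the repo's exact code):

import time

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    """Time each request and expose the latency as a response header."""
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Process-Time"] = f"{time.perf_counter() - start:.4f}"
    return response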
