构建 SQL 智能体

概览

在本教程中，您将学习如何使用 LangChain 智能体构建一个可以回答有关 SQL 数据库问题的智能体。从宏观层面来看，该智能体将：

从数据库中获取可用表和 schema

决定哪些表与问题相关

获取相关表的 schema

根据问题和 schema 信息生成查询

使用 LLM 再次检查查询是否存在常见错误

执行查询并返回结果

纠正数据库引擎发现的错误，直到查询成功

根据结果 формулировать 响应

构建 SQL 数据库的问答系统需要执行模型生成的 SQL 查询。这样做存在固有的风险。请确保您的数据库连接权限始终尽可能地窄，以满足代理的需求。这将减轻（但不能消除）构建模型驱动系统的风险。

概念

我们将涵盖以下概念

工具用于从 SQL 数据库读取数据
LangChain 智能体
人机协作流程

设置

安装

npm i langchain @langchain/core typeorm sqlite3 zod

LangSmith

设置 LangSmith 来检查您的链或代理内部发生了什么。然后设置以下环境变量：

export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."

1. 选择一个大型语言模型 (LLM)

选择支持工具调用的模型

OpenAI
Anthropic
Azure
Google Gemini
Bedrock Converse

👉 阅读OpenAI 聊天模型集成文档

npm install @langchain/openai

import { initChatModel } from "langchain";

process.env.OPENAI_API_KEY = "your-api-key";

const model = await initChatModel("gpt-4.1");

下面示例中显示的输出使用了 OpenAI。

2. 配置数据库

您将为本教程创建一个 SQLite 数据库。SQLite 是一个轻量级数据库，易于设置和使用。我们将加载 chinook 数据库，它是一个代表数字媒体商店的示例数据库。为方便起见，我们已将数据库 (Chinook.db) 托管在公共 GCS 存储桶中。

import fs from "node:fs/promises";
import path from "node:path";

const url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db";
const localPath = path.resolve("Chinook.db");

async function resolveDbPath() {
  if (await fs.exists(localPath)) {
    return localPath;
  }
  const resp = await fetch(url);
  if (!resp.ok) throw new Error(`Failed to download DB. Status code: ${resp.status}`);
  const buf = Buffer.from(await resp.arrayBuffer());
  await fs.writeFile(localPath, buf);
  return localPath;
}

3. 添加用于数据库交互的工具

使用 langchain/sql_db 中提供的 SqlDatabase 包装器与数据库交互。该包装器提供了一个简单的接口来执行 SQL 查询并获取结果。

import { SqlDatabase } from "@langchain/classic/sql_db";
import { DataSource } from "typeorm";

let db: SqlDatabase | undefined;
async function getDb() {
  if (!db) {
    const dbPath = await resolveDbFile();
    const datasource = new DataSource({ type: "sqlite", database: dbPath });
    db = await SqlDatabase.fromDataSourceParams({ appDataSource: datasource });
  }
  return db;
}

async function getSchema() {
  const db = await getDb();
  return await db.getTableInfo();
}

6. 实现人工审核（human-in-the-loop review）

在执行智能体的 SQL 查询之前，最好检查是否存在任何意外操作或低效率。 LangChain 智能体支持内置的人机协作中间件，以监督智能体工具调用。让我们配置智能体，在调用 sql_db_query 工具时暂停以进行人工审查：

from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware 
from langgraph.checkpoint.memory import InMemorySaver 


agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
    middleware=[ 
        HumanInTheLoopMiddleware( 
            interrupt_on={"sql_db_query": True}, 
            description_prefix="Tool execution pending approval", 
        ), 
    ], 
    checkpointer=InMemorySaver(), 
)

我们为智能体添加了一个检查点，以允许暂停和恢复执行。有关此以及可用中间件配置的详细信息，请参阅人机协作指南。

运行智能体时，它现在将在执行 sql_db_query 工具之前暂停以进行审查

question = "Which genre on average has the longest tracks?"
config = {"configurable": {"thread_id": "1"}} 

for step in agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    config, 
    stream_mode="values",
):
    if "messages" in step:
        step["messages"][-1].pretty_print()
    elif "__interrupt__" in step: 
        print("INTERRUPTED:") 
        interrupt = step["__interrupt__"][0] 
        for request in interrupt.value["action_requests"]: 
            print(request["description"]) 
    else:
        pass

...

INTERRUPTED:
Tool execution pending approval

Tool: sql_db_query
Args: {'query': 'SELECT g.Name AS Genre, AVG(t.Milliseconds) AS AvgTrackLength FROM Track t JOIN Genre g ON t.GenreId = g.GenreId GROUP BY g.Name ORDER BY AvgTrackLength DESC LIMIT 1;'}

我们可以使用 Command 恢复执行，在这种情况下，接受查询

from langgraph.types import Command 

for step in agent.stream(
    Command(resume={"decisions": [{"type": "approve"}]}), 
    config,
    stream_mode="values",
):
    if "messages" in step:
        step["messages"][-1].pretty_print()
    elif "__interrupt__" in step:
        print("INTERRUPTED:")
        interrupt = step["__interrupt__"][0]
        for request in interrupt.value["action_requests"]:
            print(request["description"])
    else:
        pass

================================== Ai Message ==================================
Tool Calls:
  sql_db_query (call_7oz86Epg7lYRqi9rQHbZPS1U)
 Call ID: call_7oz86Epg7lYRqi9rQHbZPS1U
  Args:
    query: SELECT Genre.Name, AVG(Track.Milliseconds) AS AvgDuration FROM Track JOIN Genre ON Track.GenreId = Genre.GenreId GROUP BY Genre.Name ORDER BY AvgDuration DESC LIMIT 5;
================================= Tool Message =================================
Name: sql_db_query

[('Sci Fi & Fantasy', 2911783.0384615385), ('Science Fiction', 2625549.076923077), ('Drama', 2575283.78125), ('TV Shows', 2145041.0215053763), ('Comedy', 1585263.705882353)]
================================== Ai Message ==================================

The genre with the longest average track length is "Sci Fi & Fantasy" with an average duration of about 2,911,783 milliseconds, followed by "Science Fiction" and "Drama."

详情请参阅人机协作指南。

4. 执行 SQL 查询

在运行命令之前，请检查 _safe_sql 中 LLM 生成的命令

const DENY_RE = /\b(INSERT|UPDATE|DELETE|ALTER|DROP|CREATE|REPLACE|TRUNCATE)\b/i;
const HAS_LIMIT_TAIL_RE = /\blimit\b\s+\d+(\s*,\s*\d+)?\s*;?\s*$/i;

function sanitizeSqlQuery(q) {
  let query = String(q ?? "").trim();

  // block multiple statements (allow one optional trailing ;)
  const semis = [...query].filter((c) => c === ";").length;
  if (semis > 1 || (query.endsWith(";") && query.slice(0, -1).includes(";"))) {
    throw new Error("multiple statements are not allowed.")
  }
  query = query.replace(/;+\s*$/g, "").trim();

  // read-only gate
  if (!query.toLowerCase().startsWith("select")) {
    throw new Error("Only SELECT statements are allowed")
  }
  if (DENY_RE.test(query)) {
    throw new Error("DML/DDL detected. Only read-only queries are permitted.")
  }

  // append LIMIT only if not already present
  if (!HAS_LIMIT_TAIL_RE.test(query)) {
    query += " LIMIT 5";
  }
  return query;
}

然后，使用 SQLDatabase 中的 run 来执行带有 execute_sql 工具的命令

import { tool } from "langchain"
import * as z from "zod";

const executeSql = tool(
  async ({ query }) => {
    const q = sanitizeSqlQuery(query);
    try {
      const result = await db.run(q);
      return typeof result === "string" ? result : JSON.stringify(result, null, 2);
    } catch (e) {
      throw new Error(e?.message ?? String(e))
    }
  },
  {
    name: "execute_sql",
    description: "Execute a READ-ONLY SQLite SELECT query and return results.",
    schema: z.object({
      query: z.string().describe("SQLite SELECT query to execute (read-only)."),
    }),
  }
);

5. 使用 `createAgent`

使用 createAgent 以最少的代码构建一个 ReAct 智能体。智能体将解释请求并生成 SQL 命令。工具将检查命令的安全性，然后尝试执行命令。如果命令有错误，错误消息将返回给模型。然后模型可以检查原始请求和新的错误消息，并生成新的命令。这可以持续到 LLM 成功生成命令或达到结束计数。这种向模型提供反馈（在这种情况下是错误消息）的模式非常强大。使用描述性系统提示初始化智能体以自定义其行为：

import { SystemMessage } from "langchain";

const getSystemPrompt = async () => new SystemMessage(`You are a careful SQLite analyst.

Authoritative schema (do not invent columns/tables):
${await getSchema()}

Rules:
- Think step-by-step.
- When you need data, call the tool \`execute_sql\` with ONE SELECT query.
- Read-only only; no INSERT/UPDATE/DELETE/ALTER/DROP/CREATE/REPLACE/TRUNCATE.
- Limit to 5 rows unless user explicitly asks otherwise.
- If the tool returns 'Error:', revise the SQL and try again.
- Limit the number of attempts to 5.
- If you are not successful after 5 attempts, return a note to the user.
- Prefer explicit column lists; avoid SELECT *.
`);

现在，使用模型、工具和提示创建一个智能体

import { createAgent } from "langchain";

const agent = createAgent({
  model: "gpt-5",
  tools: [executeSql],
  systemPrompt: getSystemPrompt,
});

6. 运行智能体

对示例查询运行智能体并观察其行为

const question = "Which genre, on average, has the longest tracks?";
const stream = await agent.stream(
  { messages: [{ role: "user", content: question }] },
  { streamMode: "values" }
);
for await (const step of stream) {
  const message = step.messages.at(-1);
  console.log(`${message.role}: ${JSON.stringify(message.content, null, 2)}`);
}

human: Which genre, on average, has the longest tracks?
ai:
tool: [{"Genre":"Sci Fi & Fantasy","AvgMilliseconds":2911783.0384615385}]
ai: Sci Fi & Fantasy — average track length ≈ 48.5 minutes (about 2,911,783 ms).

智能体正确地编写了查询，检查了查询，并运行它以提供最终响应。

您可以在 LangSmith 跟踪中检查上述运行的所有方面，包括所采取的步骤、调用的工具、LLM 看到的提示等等。

(可选) 使用 Studio

Studio 提供“客户端”循环和内存，因此您可以将其作为聊天界面运行并查询数据库。您可以提出“告诉我数据库的结构”或“显示排名前 5 的客户的发票”等问题。您将看到生成的 SQL 命令和结果输出。有关如何启动的详细信息如下。

在 Studio 中运行您的智能体

除了前面提到的软件包，您还需要

npm i -g langgraph-cli@latest

在您将运行的目录中，您需要一个包含以下内容的 langgraph.json 文件

{
  "dependencies": ["."],
  "graphs": {
      "agent": "./sqlAgent.ts:agent",
      "graph": "./sqlAgentLanggraph.ts:graph"
  },
  "env": ".env"
}

import fs from "node:fs/promises";
import path from "node:path";
import { SqlDatabase } from "@langchain/classic/sql_db";
import { DataSource } from "typeorm";
import { SystemMessage, createAgent, tool } from "langchain"
import * as z from "zod";

const url = "https://storage.googleapis.com/benchmarks-artifacts/chinook/Chinook.db";
const localPath = path.resolve("Chinook.db");

async function resolveDbPath() {
  if (await fs.exists(localPath)) {
    return localPath;
  }
  const resp = await fetch(url);
  if (!resp.ok) throw new Error(`Failed to download DB. Status code: ${resp.status}`);
  const buf = Buffer.from(await resp.arrayBuffer());
  await fs.writeFile(localPath, buf);
  return localPath;
}

let db: SqlDatabase | undefined;
async function getDb() {
  if (!db) {
    const dbPath = await resolveDbPath();
    const datasource = new DataSource({ type: "sqlite", database: dbPath });
    db = await SqlDatabase.fromDataSourceParams({ appDataSource: datasource });
  }
  return db;
}

async function getSchema() {
  const db = await getDb();
  return await db.getTableInfo();
}

const DENY_RE = /\b(INSERT|UPDATE|DELETE|ALTER|DROP|CREATE|REPLACE|TRUNCATE)\b/i;
const HAS_LIMIT_TAIL_RE = /\blimit\b\s+\d+(\s*,\s*\d+)?\s*;?\s*$/i;

function sanitizeSqlQuery(q) {
  let query = String(q ?? "").trim();

  // block multiple statements (allow one optional trailing ;)
  const semis = [...query].filter((c) => c === ";").length;
  if (semis > 1 || (query.endsWith(";") && query.slice(0, -1).includes(";"))) {
    throw new Error("multiple statements are not allowed.")
  }
  query = query.replace(/;+\s*$/g, "").trim();

  // read-only gate
  if (!query.toLowerCase().startsWith("select")) {
    throw new Error("Only SELECT statements are allowed")
  }
  if (DENY_RE.test(query)) {
    throw new Error("DML/DDL detected. Only read-only queries are permitted.")
  }

  // append LIMIT only if not already present
  if (!HAS_LIMIT_TAIL_RE.test(query)) {
    query += " LIMIT 5";
  }
  return query;
}

const executeSql = tool(
  async ({ query }) => {
    const q = sanitizeSqlQuery(query);
    try {
      const result = await db.run(q);
      return typeof result === "string" ? result : JSON.stringify(result, null, 2);
    } catch (e) {
      throw new Error(e?.message ?? String(e))
    }
  },
  {
    name: "execute_sql",
    description: "Execute a READ-ONLY SQLite SELECT query and return results.",
    schema: z.object({
      query: z.string().describe("SQLite SELECT query to execute (read-only)."),
    }),
  }
);

const getSystemPrompt = async () => new SystemMessage(`You are a careful SQLite analyst.

Authoritative schema (do not invent columns/tables):
${await getSchema()}

Rules:
- Think step-by-step.
- When you need data, call the tool \`execute_sql\` with ONE SELECT query.
- Read-only only; no INSERT/UPDATE/DELETE/ALTER/DROP/CREATE/REPLACE/TRUNCATE.
- Limit to 5 rows unless user explicitly asks otherwise.
- If the tool returns 'Error:', revise the SQL and try again.
- Limit the number of attempts to 5.
- If you are not successful after 5 attempts, return a note to the user.
- Prefer explicit column lists; avoid SELECT *.
`);

export const agent = createAgent({
  model: "gpt-5",
  tools: [executeSql],
  systemPrompt: getSystemPrompt,
});

后续步骤

如需更深度的自定义，请查看本教程，了解如何直接使用 LangGraph 原语实现 SQL 智能体。

在 GitHub 上编辑此页面源文件。

以编程方式连接这些文档到 Claude、VSCode 等，通过 MCP 获取实时答案。

教程

概念概述

LangChain 学院

附加资源

概览

概念

设置

安装

LangSmith

1. 选择一个大型语言模型 (LLM)

2. 配置数据库

3. 添加用于数据库交互的工具

6. 实现人工审核（human-in-the-loop review）

4. 执行 SQL 查询

5. 使用 `createAgent`

6. 运行智能体

(可选) 使用 Studio

后续步骤

教程

概念概述

LangChain 学院

附加资源

​概览

​概念

​设置

​安装

​LangSmith

​1. 选择一个大型语言模型 (LLM)

​2. 配置数据库

​3. 添加用于数据库交互的工具

​6. 实现人工审核（human-in-the-loop review）

​4. 执行 SQL 查询

​5. 使用 createAgent

​6. 运行智能体

​(可选) 使用 Studio

​后续步骤

概览

概念

设置

安装

LangSmith

1. 选择一个大型语言模型 (LLM)

2. 配置数据库

3. 添加用于数据库交互的工具

6. 实现人工审核（human-in-the-loop review）

4. 执行 SQL 查询

5. 使用 `createAgent`

6. 运行智能体

(可选) 使用 Studio

后续步骤