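Models - LangChain Docs

Large language models (LLMs) are powerful AI tools that can interpret and generate text much as humans do. They are versatile enough to write content, translate languages, summarize, and answer questions without task-specific training. Beyond text generation, many models also support:

Tool calling - calling external tools (such as database queries or API calls) and using the results in their responses.
Structured output - constraining the model's response to follow a defined format.
Multimodality - processing and returning data other than text, such as images, audio, and video.
Reasoning - performing multi-step reasoning to reach a conclusion.

Models are the reasoning engine of agents. They drive the agent's decision-making process: which tools to call, how to interpret results, and when to provide a final answer. The quality and capabilities of the model you choose directly affect the reliability and performance of your agent. Different models excel at different tasks: some are better at following complex instructions, some at structured reasoning, and some support larger context windows for handling more information. LangChain's standard model interface gives you access to many provider integrations, which makes it easy to experiment with and switch between models to find the best fit for your use case.

For provider-specific integration information and capabilities, see the provider's chat model page.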

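Basic usage

Models can be used in two ways:

With agents - a model can be specified dynamically when creating an agent.
Standalone - a model can be called directly (outside the agent loop) for tasks such as text generation, classification, or extraction, without an agent framework.

The same model interface works in both contexts, so you can start simple and scale up to more complex agent-based workflows as needed.

Initialize a model

The easiest way to use a standalone model in LangChain is to call initChatModel to initialize one from the provider of your choice (example below):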

OpenAI
Anthropic
Azure
Google Gemini
Bedrock Converse

👉 Read the OpenAI chat model integration docs

npm install @langchain/openai

import { initChatModel } from "langchain";

process.env.OPENAI_API_KEY = "your-api-key";

const model = await initChatModel("gpt-4.1");

const response = await model.invoke("Why do parrots talk?");

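For more details, including how to pass model parameters, see initChatModel.

Key methods

Invoke

The model takes messages as input and outputs a message after generating a complete response.

Stream

Invoke the model, but stream the output in real time as it is generated.

Batch

Send multiple requests to a model as a batch for more efficient processing.

In addition to chat models, LangChain supports other related technologies, such as embedding models and vector stores. See the integrations page for details.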

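Parameters

A chat model accepts parameters that configure its behavior. The full set of supported parameters varies by model and provider, but the standard ones include:

model

string

required

The name or identifier of the specific model you want to use with a provider.

apiKey

string

The key required to authenticate with the model's provider. It is usually issued when you sign up for access to the model, and is often supplied by setting an environment variable.

temperature

number

Controls the randomness of the model's output. Higher values make responses more creative; lower values make them more deterministic.

timeout

number

The maximum time (in seconds) to wait for a response from the model before canceling the request.

maxTokens

number

Limits the total number of tokens in the response, effectively controlling how long the output can be.

maxRetries

number

The maximum number of times the system will resend a request if it fails because of issues such as network timeouts or rate limits.

With initChatModel, pass these parameters inline:

Initialize using model parameters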

const model = await initChatModel(
    "claude-sonnet-4-5-20250929",
    { temperature: 0.7, timeout: 30, maxTokens: 1000 }
);

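Each chat model integration may have additional parameters that control provider-specific functionality. For example, ChatOpenAI has use_responses_api to indicate whether to use the OpenAI Responses or Completions API. To find all the parameters supported by a given chat model, go to the chat model integrations page.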
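
To make the maximum-retries setting described above more concrete, here is a minimal sketch of a retry loop with exponential backoff. It is not LangChain's implementation: withRetries, baseDelayMs, and makeFlakyCall are hypothetical names used only for illustration, and the flaky function merely stands in for a model call that hits a rate limit.

```typescript
// Hypothetical helper, not part of LangChain: retry an async call with exponential backoff.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries: number,
  baseDelayMs = 10,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      // Give up once the retry budget is spent.
      if (attempt >= maxRetries) throw err;
      // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      attempt += 1;
    }
  }
}

// Stand-in for a model call that fails a fixed number of times, then succeeds.
function makeFlakyCall(failures: number): () => Promise<string> {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) throw new Error("rate limited");
    return `ok after ${calls} calls`;
  };
}
```

With two failures and a budget of three retries, withRetries(makeFlakyCall(2), 3) resolves on the third call; with a budget of one retry, the same call rejects with the last error.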

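Invocation

A chat model must be invoked to generate output. There are three primary invocation methods, each suited to different use cases.

Invoke

The most direct way to call a model is to use invoke() with a single message or a list of messages.

Single message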

const response = await model.invoke("Why do parrots have colorful feathers?");
console.log(response);

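A list of messages can be provided to the model to represent conversation history. Each message has a role that the model uses to tell who sent the message in the conversation. For more details on roles, types, and content, see the messages guide.

Object format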

const conversation = [
  { role: "system", content: "You are a helpful assistant that translates English to French." },
  { role: "user", content: "Translate: I love programming." },
  { role: "assistant", content: "J'adore la programmation." },
  { role: "user", content: "Translate: I love building applications." },
];

const response = await model.invoke(conversation);
console.log(response);  // AIMessage("J'adore créer des applications.")

Message objects

import { HumanMessage, AIMessage, SystemMessage } from "langchain";

const conversation = [
  new SystemMessage("You are a helpful assistant that translates English to French."),
  new HumanMessage("Translate: I love programming."),
  new AIMessage("J'adore la programmation."),
  new HumanMessage("Translate: I love building applications."),
];

const response = await model.invoke(conversation);
console.log(response);  // AIMessage("J'adore créer des applications.")

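Stream

Most models can stream their output content while it is being generated. By displaying output progressively, streaming significantly improves the user experience, particularly for longer responses. Calling stream() returns an iterator that yields output chunks as they are produced. You can use a loop to process each chunk in real time: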

const stream = await model.stream("Why do parrots have colorful feathers?");
for await (const chunk of stream) {
  console.log(chunk.text)
}

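Unlike invoke(), which returns a single AIMessage after the model has finished generating its full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, each chunk in a stream is designed to be accumulated into a full message by concatenation:

Construct an AIMessage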

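import { AIMessageChunk } from "@langchain/core/messages";

const stream = await model.stream("What color is the sky?");

// Merge each incoming chunk into the running message
let full: AIMessageChunk | null = null;
for await (const chunk of stream) {
  full = full ? full.concat(chunk) : chunk;
  console.log(full.text);
}

// The
// The sky
// The sky is
// The sky is typically
// The sky is typically blue
// ...

console.log(full?.contentBlocks);
// [{"type": "text", "text": "The sky is typically blue..."}]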

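The resulting message can be treated the same as a message generated with invoke(). For example, it can be aggregated into a message history and passed back to the model as conversational context.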
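
The accumulation pattern above can be sketched without any library, which may help clarify what concatenating chunks means. SketchChunk, concatChunks, and accumulate are hypothetical names; the real AIMessageChunk carries more than text, so treat this only as an illustration of the folding step.

```typescript
// Hypothetical stand-in for a streamed message chunk (not the real AIMessageChunk).
interface SketchChunk {
  text: string;
}

// Merge two chunks into one by joining their text.
function concatChunks(a: SketchChunk, b: SketchChunk): SketchChunk {
  return { text: a.text + b.text };
}

// Fold a sequence of chunks into a single complete message (null for an empty stream).
function accumulate(chunks: SketchChunk[]): SketchChunk | null {
  let full: SketchChunk | null = null;
  for (const chunk of chunks) {
    full = full ? concatChunks(full, chunk) : chunk;
  }
  return full;
}

const chunks: SketchChunk[] = [{ text: "The" }, { text: " sky" }, { text: " is" }, { text: " blue" }];
console.log(accumulate(chunks)?.text); // "The sky is blue"
```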

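Streaming only works if every step in the program knows how to process a stream of chunks. For instance, an application that needs to hold the entire output in memory before it can process it is not streaming-capable.

Advanced streaming topics

"Auto-streaming" chat models

LangChain simplifies streaming from chat models by automatically enabling streaming mode in certain cases, even when you are not explicitly calling a streaming method. This is particularly useful when you use the non-streaming invoke method but still want to stream the entire application, including intermediate results from the chat model. For example, in LangGraph agents you can call model.invoke() within nodes, and LangChain will automatically delegate to streaming if the application is running in streaming mode.

How it works

When you invoke() a chat model, LangChain automatically switches to an internal streaming mode if it detects that you are trying to stream the overall application. The result of the invocation is the same as far as the code using invoke is concerned; however, while the chat model is being streamed, LangChain takes care of firing on_llm_new_token events in LangChain's callback system. These callback events allow LangGraph stream() and streamEvents() to surface the chat model's output in real time.

Streaming events

LangChain chat models can also stream semantic events using streamEvents(). This simplifies filtering based on event types and other metadata, and aggregates the full message in the background. See the example below.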

const stream = await model.streamEvents("Hello");
for await (const event of stream) {
    if (event.event === "on_chat_model_start") {
        console.log(`Input: ${event.data.input}`);
    }
    if (event.event === "on_chat_model_stream") {
        console.log(`Token: ${event.data.chunk.text}`);
    }
    if (event.event === "on_chat_model_end") {
        console.log(`Full message: ${event.data.output.text}`);
    }
}

Input: Hello
Token: Hi
Token:  there
Token: !
Token:  How
Token:  can
Token:  I
...
Full message: Hi there! How can I help today?

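For event types and other details, see the streamEvents() reference.

Batch

Batching a collection of independent requests to a model can significantly improve performance and reduce costs, because the processing can be done in parallel:

Batch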

const responses = await model.batch([
  "Why do parrots have colorful feathers?",
  "How do airplanes fly?",
  "What is quantum computing?",
]);
for (const response of responses) {
  console.log(response);
}

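When processing a large number of inputs with batch(), you may want to control the maximum number of parallel calls. You can do this by setting the maxConcurrency attribute in the RunnableConfig object.

Batch with max concurrency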

model.batch(
  listOfInputs,
  {
    maxConcurrency: 5,  // Limit to 5 parallel calls
  }
)

For the full list of supported attributes, see the RunnableConfig reference.

For more details on batching, see the reference.
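
To illustrate what a concurrency cap does, the sketch below runs a list of inputs through an async worker with at most maxConcurrency calls in flight, returning results in input order. It is a simplified illustration under stated assumptions, not how LangChain implements batch(); batchWithLimit and its worker argument are hypothetical names.

```typescript
// Hypothetical helper, not LangChain's implementation: run tasks with at most
// `maxConcurrency` in flight, returning results in input order.
async function batchWithLimit<I, O>(
  inputs: I[],
  worker: (input: I) => Promise<O>,
  maxConcurrency: number,
): Promise<O[]> {
  const results: O[] = new Array(inputs.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < inputs.length) {
      const index = next;
      next += 1;
      results[index] = await worker(inputs[index]);
    }
  }

  // Start no more runners than there are inputs, and at least one.
  const runnerCount = Math.max(1, Math.min(maxConcurrency, inputs.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Because each runner awaits its current call before claiming another index, the number of simultaneous calls never exceeds the number of runners.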

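Tool calling

Models can request to call tools that perform tasks such as fetching data from a database, searching the web, or running code. A tool pairs:

A schema, including the tool's name, a description, and/or argument definitions (often a JSON schema)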
A function or coroutine to execute.

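You may hear the term "function calling". We use it interchangeably with "tool calling".

To make tools you have defined available to a model, you must bind them using bindTools(). In subsequent invocations, the model can choose to call any of the bound tools as needed. Some model providers offer built-in tools that can be enabled through model or invocation parameters (for example ChatOpenAI, ChatAnthropic). Check the respective provider reference for details.

For details and other options for creating tools, see the tools guide.

Binding user tools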

import { tool } from "langchain";
import * as z from "zod";
import { ChatOpenAI } from "@langchain/openai";

const getWeather = tool(
  (input) => `It's sunny in ${input.location}.`,
  {
    name: "get_weather",
    description: "Get the weather at a location.",
    schema: z.object({
      location: z.string().describe("The location to get the weather for"),
    }),
  },
);

const model = new ChatOpenAI({ model: "gpt-4o" });
const modelWithTools = model.bindTools([getWeather]);  

const response = await modelWithTools.invoke("What's the weather like in Boston?");
const toolCalls = response.tool_calls ?? [];
for (const toolCall of toolCalls) {
  // View tool calls made by the model
  console.log(`Tool: ${toolCall.name}`);
  // args is an object, so serialize it for readable output
  console.log(`Args: ${JSON.stringify(toolCall.args)}`);
}

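When you bind user-defined tools, the model's response includes a request to execute a tool. When a model is used separately from an agent, it is up to you to perform the requested action and return the result to the model for use in subsequent reasoning. Note that when you use an agent, the agent loop handles the tool execution loop for you. Below, we show some common ways to use tool calling.

Tool execution loop

When a model returns tool calls, you need to execute the tools and pass the results back to the model. This creates a conversation loop in which the model can use the tool results to generate its final response. LangChain includes agent abstractions that handle this orchestration for you. Here is a simple example:

Tool execution loop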

import { BaseMessage, HumanMessage } from "@langchain/core/messages";

// Bind (potentially multiple) tools to the model.
// getWeather is the tool defined in the previous example.
const modelWithTools = model.bindTools([getWeather]);

// Step 1: Model generates tool calls
const messages: BaseMessage[] = [new HumanMessage("What's the weather in Boston?")];
const aiMsg = await modelWithTools.invoke(messages);
messages.push(aiMsg);

// Step 2: Execute tools and collect results
for (const toolCall of aiMsg.tool_calls ?? []) {
    // Execute the tool with the generated arguments
    const toolResult = await getWeather.invoke(toolCall);
    messages.push(toolResult);
}

// Step 3: Pass results back to model for final response
const finalResponse = await modelWithTools.invoke(messages);
console.log(finalResponse.text);
// "The current weather in Boston is 72°F and sunny."

Each ToolMessage returned by a tool includes a tool_call_id that matches the original tool call, which helps the model correlate results with requests.
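
The pairing of results with requests can be sketched in plain TypeScript. The shapes and names below (SketchToolCall, SketchToolMessage, executeToolCalls, registry) are hypothetical and far simpler than the real LangChain types; the point is only that each result carries the id of the call that produced it.

```typescript
// Hypothetical shapes for illustration; the real LangChain types carry more fields.
interface SketchToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface SketchToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolFn = (args: Record<string, unknown>) => string;

// Look up each requested tool by name, run it, and tag the result with the
// id of the call that produced it.
function executeToolCalls(
  toolCalls: SketchToolCall[],
  registry: Record<string, ToolFn>,
): SketchToolMessage[] {
  return toolCalls.map((call): SketchToolMessage => {
    const fn = registry[call.name];
    const content = fn ? fn(call.args) : `Unknown tool: ${call.name}`;
    return { role: "tool", tool_call_id: call.id, content };
  });
}

const registry: Record<string, ToolFn> = {
  get_weather: (args) => `It's sunny in ${String(args.location)}.`,
};

const toolMessages = executeToolCalls(
  [{ id: "call_1", name: "get_weather", args: { location: "Boston" } }],
  registry,
);
console.log(toolMessages);
```

Returning a message for unknown tool names, rather than throwing, is one possible design choice: it lets the model see the failure and recover on the next turn.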

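Forcing tool calls

By default, the model is free to choose which bound tool to use based on the user's input. However, you may want to force the choice of a tool, ensuring that the model uses either a particular tool or any tool from a given list: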

const modelWithTools = model.bindTools([tool_1], { toolChoice: "any" })

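Parallel tool calls

Many models support calling multiple tools in parallel when appropriate. This lets the model gather information from different sources at the same time.

Parallel tool calls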

const modelWithTools = model.bindTools([getWeather]);

const response = await modelWithTools.invoke(
    "What's the weather in Boston and Tokyo?"
);

// The model may generate multiple tool calls
console.log(response.tool_calls);
// [
//   { name: 'get_weather', args: { location: 'Boston' }, id: 'call_1' },
//   { name: 'get_weather', args: { location: 'Tokyo' }, id: 'call_2' }
// ]

// Execute all tools (can be done in parallel with async)
const results = [];
for (const toolCall of response.tool_calls ?? []) {
    if (toolCall.name === "get_weather") {
        const result = await getWeather.invoke(toolCall);
        results.push(result);
    }
}

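The model decides when parallel execution is appropriate based on the independence of the requested operations.

Most models that support tool calling enable parallel tool calls by default. Some (including OpenAI and Anthropic) allow you to disable this feature. To do so, set parallel_tool_calls to false:

model.bindTools([getWeather], { parallel_tool_calls: false });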

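Streaming tool calls

When streaming responses, tool calls are built up progressively through ToolCallChunk objects. This lets you see tool calls as they are being generated rather than waiting for the complete response.

Streaming tool calls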

const stream = await modelWithTools.stream(
    "What's the weather in Boston and Tokyo?"
);
for await (const chunk of stream) {
    // Tool call chunks arrive progressively
    for (const toolChunk of chunk.tool_call_chunks ?? []) {
        // name and args may be absent on any given chunk
        console.log(`Tool: ${toolChunk.name ?? ""}`);
        console.log(`Args: ${toolChunk.args ?? ""}`);
    }
}

// Output:
// Tool: get_weather
// Args:
// Tool:
// Args: {"loc
// Tool:
// Args: ation": "BOS"}
// Tool: get_time
// Args:
// Tool:
// Args: {"timezone": "Tokyo"}

You can accumulate chunks to build up complete tool calls:

Accumulate tool calls

let full: AIMessageChunk | null = null
const stream = await modelWithTools.stream("What's the weather in Boston?")
for await (const chunk of stream) {
    full = full ? full.concat(chunk) : chunk
    console.log(full.contentBlocks)
}
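
What accumulation does to partial tool calls can be sketched by hand. The example below assumes each chunk identifies its tool call by an index and carries a fragment of the JSON argument string; SketchToolCallChunk and assembleToolCalls are hypothetical names, and in practice the chunk concatenation shown above does this merging for you.

```typescript
// Hypothetical chunk shape for illustration: each chunk names the tool call it
// belongs to by `index` and carries a fragment of the JSON argument string.
interface SketchToolCallChunk {
  index: number;
  name?: string;
  args?: string;
}

interface AssembledToolCall {
  name: string;
  args: unknown;
}

// Concatenate name and argument fragments per index, then parse the JSON once
// all chunks have been seen.
function assembleToolCalls(chunks: SketchToolCallChunk[]): AssembledToolCall[] {
  const partial = new Map<number, { name: string; args: string }>();
  for (const chunk of chunks) {
    const entry = partial.get(chunk.index) ?? { name: "", args: "" };
    entry.name += chunk.name ?? "";
    entry.args += chunk.args ?? "";
    partial.set(chunk.index, entry);
  }
  return Array.from(partial.entries())
    .sort(([a], [b]) => a - b)
    .map(([, entry]) => ({ name: entry.name, args: JSON.parse(entry.args || "{}") }));
}

const assembled = assembleToolCalls([
  { index: 0, name: "get_weather", args: "" },
  { index: 0, args: '{"loc' },
  { index: 0, args: 'ation": "BOS"}' },
]);
console.log(assembled); // one call: get_weather with args { location: "BOS" }
```

Parsing is deferred until the end because an individual fragment such as {"loc is not valid JSON on its own.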

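Structured output

Models can be asked to provide their response in a format that matches a given schema. This is useful for ensuring the output can be parsed easily and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.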

Zod
JSON Schema

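A Zod schema is the preferred way to define an output schema. Note that when a Zod schema is provided, the model output is also validated against the schema using Zod's parse methods.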

import * as z from "zod";

const Movie = z.object({
  title: z.string().describe("The title of the movie"),
  year: z.number().describe("The year the movie was released"),
  director: z.string().describe("The director of the movie"),
  rating: z.number().describe("The movie's rating out of 10"),
});

const modelWithStructure = model.withStructuredOutput(Movie);

const response = await modelWithStructure.invoke("Provide details about the movie Inception");
console.log(response);
// {
//   title: "Inception",
//   year: 2010,
//   director: "Christopher Nolan",
//   rating: 8.8,
// }

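Key considerations for structured output

Method parameter: some providers support different methods ('jsonSchema', 'functionCalling', 'jsonMode')
Include raw: use includeRaw: true to get both the parsed output and the raw AIMessage
Validation: Zod schemas provide automatic validation, while JSON Schema requires manual validation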
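
As a rough idea of what manual validation involves when no schema library checks the output for you, the sketch below validates a parsed JSON value by hand. The Movie interface and parseMovie function are hypothetical and simply mirror the Movie schema used in this section; a real application would more likely use a JSON Schema validator.

```typescript
// Hypothetical target shape, mirroring the Movie schema used in this section.
interface Movie {
  title: string;
  year: number;
  director: string;
  rating: number;
}

// Hand-written check of the kind a schema library would otherwise perform.
function parseMovie(value: unknown): Movie {
  if (typeof value !== "object" || value === null) {
    throw new Error("Expected an object");
  }
  const { title, year, director, rating } = value as Record<string, unknown>;
  if (typeof title !== "string") throw new Error("title must be a string");
  if (typeof year !== "number") throw new Error("year must be a number");
  if (typeof director !== "string") throw new Error("director must be a string");
  if (typeof rating !== "number") throw new Error("rating must be a number");
  return { title, year, director, rating };
}

const movie = parseMovie(
  JSON.parse('{"title":"Inception","year":2010,"director":"Christopher Nolan","rating":8.8}'),
);
console.log(movie.title); // "Inception"
```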

Example: message output alongside parsed structure

It can be useful to return the raw AIMessage object alongside the parsed representation, so that you can access response metadata such as token counts. To do this, set includeRaw: true when calling withStructuredOutput:

import * as z from "zod";

const Movie = z.object({
  title: z.string().describe("The title of the movie"),
  year: z.number().describe("The year the movie was released"),
  director: z.string().describe("The director of the movie"),
  rating: z.number().describe("The movie's rating out of 10"),
});

const modelWithStructure = model.withStructuredOutput(Movie, { includeRaw: true });

const response = await modelWithStructure.invoke("Provide details about the movie Inception");
console.log(response);
// {
//   raw: AIMessage { ... },
//   parsed: { title: "Inception", ... }
// }

Example: nested structures

Schemas can be nested:

import * as z from "zod";

const Actor = z.object({
  name: z.string(),
  role: z.string(),
});

const MovieDetails = z.object({
  title: z.string(),
  year: z.number(),
  cast: z.array(Actor),
  genres: z.array(z.string()),
  budget: z.number().nullable().describe("Budget in millions USD"),
});

const modelWithStructure = model.withStructuredOutput(MovieDetails);

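Supported models

LangChain supports all major model providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more. Each provider offers a variety of models with different capabilities. For the full list of models supported in LangChain, see the integrations page.

Advanced topics

Multimodal

Certain models can process and return non-textual data such as images, audio, and video. You can pass non-textual data to a model by providing content blocks.

All LangChain chat models with underlying multimodal capabilities support:

Data in the cross-provider standard format (see our messages guide)
The OpenAI chat completions format
Any format native to the specific provider (for example, Anthropic models accept Anthropic's native format)

See the multimodal section of the messages guide for details. Some models can also return multimodal data as part of their response. When invoked to do so, the resulting AIMessage will have content blocks with multimodal types.

Multimodal output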

const response = await model.invoke("Create a picture of a cat");
console.log(response.contentBlocks);
// [
//   { type: "text", text: "Here's a picture of a cat" },
//   { type: "image", data: "...", mimeType: "image/jpeg" },
// ]

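See the integrations page for details on specific providers.

Reasoning

Newer models can perform multi-step reasoning to reach a conclusion. This involves breaking complex problems down into smaller, more manageable steps. If the underlying model supports it, you can surface this reasoning process to better understand how the model arrived at its final answer.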

const stream = await model.stream("Why do parrots have colorful feathers?");
for await (const chunk of stream) {
    const reasoningSteps = chunk.contentBlocks.filter(b => b.type === "reasoning");
    console.log(reasoningSteps.length > 0 ? reasoningSteps : chunk.text);
}

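Depending on the model, you can sometimes specify how much effort it should put into reasoning. Similarly, you can request that the model turn off reasoning entirely. This may take the form of categorical "tiers" of reasoning (for example, 'low' or 'high') or an integer token budget. For details, see the integrations page or the reference for your chat model.

Local models

LangChain supports running models locally on your own hardware. This is useful when data privacy is critical, when you want to invoke a custom model, or when you want to avoid the costs of using a cloud-based model. Ollama is one of the easiest ways to run models locally. See the full list of local integrations on the integrations page.

Prompt caching

Many providers offer prompt caching features to reduce latency and cost when the same tokens are processed repeatedly. These features can be implicit or explicit:

Implicit prompt caching: providers automatically pass on cost savings if a request hits a cache. Examples: OpenAI and Gemini (Gemini 2.5 and above).
Explicit caching: providers let you manually indicate cache points for greater control or to guarantee cost savings. Examples: ChatOpenAI (via prompt_cache_key), Anthropic's AnthropicPromptCachingMiddleware and cache_control options, AWS Bedrock, Gemini.

Prompt caching is often only engaged once the input exceeds a minimum token threshold. See the provider pages for details.

Cache usage is reflected in the usage metadata of the model response.

Server-side tool use

Some providers support server-side tool calling loops: the model can interact with web search, code interpreters, and other tools, and analyze the results within a single conversational turn. If a model invokes a tool server-side, the content of the response message includes content representing the tool's invocation and result. Accessing the content blocks of the response returns the server-side tool calls and results in a provider-agnostic format: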

import { initChatModel } from "langchain";

const model = await initChatModel("gpt-4.1-mini");
const modelWithTools = model.bindTools([{ type: "web_search" }])

const message = await modelWithTools.invoke("What was a positive news story from today?");
console.log(message.contentBlocks);

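This represents a single conversational turn; there are no associated tool message objects that need to be passed in, as there are with client-side tool calling. See the integration page for your provider for available tools and usage details.

Base URL or proxy

For many chat model integrations, you can configure the base URL for API requests, which lets you use model providers that have OpenAI-compatible APIs or route requests through a proxy server.

Base URL

Many model providers offer OpenAI-compatible APIs (for example, Together AI, vLLM). You can use initChatModel with these providers by specifying the appropriate baseUrl parameter: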

const model = await initChatModel(
    "MODEL_NAME",
    {
        modelProvider: "openai",
        baseUrl: "BASE_URL",
        apiKey: "YOUR_API_KEY",
    }
)

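When you instantiate a chat model class directly, the parameter name may vary by provider. Check the respective reference for details.

Log probabilities

Certain models can be configured to return token-level log probabilities, which represent the likelihood of a given token, by setting the logprobs parameter when initializing the model: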

const model = new ChatOpenAI({
    model: "gpt-4o",
    logprobs: true,
});

const responseMessage = await model.invoke("Why do parrots talk?");

responseMessage.response_metadata.logprobs.content.slice(0, 5);
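
Log probabilities are easier to read once converted back to plain probabilities. The sketch below assumes each entry pairs a token with its natural-log probability; TokenLogprob, toProbabilities, and sequenceLogprob are hypothetical names, and the sample values are made up for illustration rather than taken from a real response.

```typescript
// Hypothetical entry shape for illustration: a token paired with its natural-log probability.
interface TokenLogprob {
  token: string;
  logprob: number;
}

// Convert each log probability into a plain probability via exp().
function toProbabilities(entries: TokenLogprob[]): { token: string; probability: number }[] {
  return entries.map(({ token, logprob }) => ({ token, probability: Math.exp(logprob) }));
}

// The log probability of a whole sequence is the sum of its token log probabilities.
function sequenceLogprob(entries: TokenLogprob[]): number {
  return entries.reduce((sum, entry) => sum + entry.logprob, 0);
}

// Made-up sample values, not real model output.
const sample: TokenLogprob[] = [
  { token: "Par", logprob: Math.log(0.5) },
  { token: "rots", logprob: Math.log(0.25) },
];
console.log(toProbabilities(sample));
console.log(Math.exp(sequenceLogprob(sample))); // approximately 0.125
```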

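Token usage

Many model providers return token usage information as part of the invocation response. When available, this information is included on the AIMessage objects produced by the corresponding model. For more details, see the messages guide.

Some provider APIs, notably the OpenAI and Azure OpenAI chat completions APIs, require users to opt in to receiving token usage data in streaming contexts. See the streaming usage metadata section of the integration guide for details.

Invocation config

When invoking a model, you can pass additional configuration through the config parameter using a RunnableConfig object. This provides run-time control over execution behavior, callbacks, and metadata tracking. Common configuration options include:

Invocation with config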

const response = await model.invoke(
    "Tell me a joke",
    {
        runName: "joke_generation",      // Custom name for this run
        tags: ["humor", "demo"],          // Tags for categorization
        metadata: {"user_id": "123"},     // Custom metadata
        callbacks: [my_callback_handler], // Callback handlers
    }
)

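These configuration values are particularly useful when:

Debugging with LangSmith tracing
Implementing custom logging or monitoring
Controlling resource usage in production
Tracking invocations across complex pipelines

Key configuration attributes

runName

string

Identifies this specific invocation in logs and traces. Not inherited by sub-calls.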
