Access Google's generative AI models, including the Gemini family, directly via the Gemini API, or experiment rapidly using Google AI Studio. The langchain-google-genai package provides the LangChain integration for these models. This is often the best starting point for individual developers. For information on the latest models, their features, context windows, and more, head to the Google AI docs. All model IDs can be found in the Gemini API docs.

Integration details

Class: ChatGoogleGenerativeAI · Package: langchain-google-genai
Local: ❌ · Serializable: beta · JS support: ✅ · Downloads and version: see PyPI

模型功能

Tool calling · Structured output · JSON mode · Image input · Audio input · Video input · Token-level streaming · Native async · Token usage · Logprobs

Setup

To access Google AI models, you'll need to create a Google account, get a Google AI API key, and install the langchain-google-genai integration package.
1. Installation:
pip install -U langchain-google-genai
2. Credentials: Head to https://ai.google.dev/gemini-api/docs/api-key (or go through Google AI Studio) to generate a Google AI API key.

Chat models

Use the ChatGoogleGenerativeAI class to interact with Google's chat models. See the API reference for full details.
import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")
To enable automated tracing of model calls, set your LangSmith API key:
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

Now we can instantiate our model object and generate chat completions:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run-3b28d4b8-8a62-4e6c-ad4e-b53e6e825749-0', usage_metadata={'input_tokens': 20, 'output_tokens': 7, 'total_tokens': 27, 'input_token_details': {'cache_read': 0}})
print(ai_msg.content)
J'adore la programmation.
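
Token-level streaming is also supported. A minimal sketch using the standard stream method on the same messages:
for chunk in llm.stream(messages):
    # Each chunk is an AIMessageChunk; print partial tokens as they arrive
    print(chunk.content, end="", flush=True)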

Multimodal usage

Gemini models can accept multimodal inputs (text, images, audio, video) and, for some models, generate multimodal outputs.

Image input

Provide image inputs along with text using a HumanMessage whose content is a list of blocks. Make sure to use a model that supports image input, such as gemini-2.5-flash:
import base64

from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Example using a public URL (remains the same)
message_url = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Describe the image at the URL.",
        },
        {"type": "image_url", "image_url": "https://picsum.photos/seed/picsum/200/300"},
    ]
)
result_url = llm.invoke([message_url])
print(f"Response for URL image: {result_url.content}")

# Example using a local image file encoded in base64
image_file_path = "/Users/philschmid/projects/google-gemini/langchain/docs/static/img/agents_vs_chains.png"

with open(image_file_path, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

message_local = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the local image."},
        {"type": "image_url", "image_url": f"data:image/png;base64,{encoded_image}"},
    ]
)
result_local = llm.invoke([message_local])
print(f"Response for local image: {result_local.content}")
Other supported image_url formats:
  • A Google Cloud Storage URI (gs://...). Ensure the service account has access.
  • A PIL Image object (the library handles encoding); see the sketch below.
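
A minimal sketch of the PIL path, based on the bullet above; the filename local_image.png is a hypothetical placeholder, and Pillow must be installed:
from PIL import Image as PILImage

pil_image = PILImage.open("local_image.png")  # hypothetical local file

message_pil = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": pil_image},  # the library handles encoding
    ]
)
# result_pil = llm.invoke([message_pil])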

Audio input

Provide audio file inputs along with text:
import base64

from langchain.messages import HumanMessage

# Ensure you have an audio file named 'example_audio.mp3' or provide the correct path.
audio_file_path = "example_audio.mp3"
audio_mime_type = "audio/mpeg"


with open(audio_file_path, "rb") as audio_file:
    encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "media",
            "data": encoded_audio,  # Use base64 string directly
            "mime_type": audio_mime_type,
        },
    ]
)
response = llm.invoke([message])
print(f"Response for audio: {response.content}")

Video input

Provide video file inputs along with text:
import base64

from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Ensure you have a video file named 'example_video.mp4' or provide the correct path.
video_file_path = "example_video.mp4"
video_mime_type = "video/mp4"


with open(video_file_path, "rb") as video_file:
    encoded_video = base64.b64encode(video_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the first few frames of the video."},
        {
            "type": "media",
            "data": encoded_video,  # Use base64 string directly
            "mime_type": video_mime_type,
        },
    ]
)
response = llm.invoke([message])
print(f"Response for video: {response.content}")

Image generation (multimodal output)

Some models (such as gemini-2.5-flash-image) can generate text and images inline. You need to specify the desired response_modalities. See the Gemini API docs for details:
import base64

from IPython.display import Image, display
from langchain.messages import AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="models/gemini-2.5-flash-image")

message = {
    "role": "user",
    "content": "Generate a photorealistic image of a cuddly cat wearing a hat.",
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)


def _get_image_base64(response: AIMessage) -> str:
    image_block = next(
        block
        for block in response.content
        if isinstance(block, dict) and block.get("image_url")
    )
    return image_block["image_url"].get("url").split(",")[-1]


image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))
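
To keep the result, the same base64 payload can be decoded straight to a file (the filename is a hypothetical placeholder):
# Persist the generated image to disk
with open("generated_cat.png", "wb") as f:
    f.write(base64.b64decode(image_base64))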

Tool calling

You can equip the model with tools to call:
from langchain.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the tool
@tool(description="Get the current weather in a given location")
def get_weather(location: str) -> str:
    return "It's sunny."


# Initialize the model and bind the tool
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")
llm_with_tools = llm.bind_tools([get_weather])

# Invoke the model with a query that should trigger the tool
query = "What's the weather in San Francisco?"
ai_msg = llm_with_tools.invoke(query)

# Check the tool calls in the response
print(ai_msg.tool_calls)

# Execute the tool and pass the result back as a ToolMessage
from langchain.messages import ToolMessage

tool_message = ToolMessage(
    content=get_weather.invoke(ai_msg.tool_calls[0]["args"]),
    tool_call_id=ai_msg.tool_calls[0]["id"],
)
llm_with_tools.invoke([("human", query), ai_msg, tool_message])  # Final response
[{'name': 'get_weather', 'args': {'location': 'San Francisco'}, 'id': 'a6248087-74c5-4b7c-9250-f335e642927c', 'type': 'tool_call'}]
AIMessage(content="OK. It's sunny in San Francisco.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash-lite', 'safety_ratings': []}, id='run-ac5bb52c-e244-4c72-9fbc-fb2a9cd7a72e-0', usage_metadata={'input_tokens': 29, 'output_tokens': 11, 'total_tokens': 40, 'input_token_details': {'cache_read': 0}})

Structured output

Force the model to respond with a specific structure using Pydantic models:
from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the desired structure
class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The person's name")
    height_m: float = Field(..., description="The person's height in meters")


# Initialize the model
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", temperature=0)

# Method 1: Default function calling approach
structured_llm_default = llm.with_structured_output(Person)

# Method 2: Native JSON schema for better reliability (recommended)
structured_llm_json = llm.with_structured_output(Person, method="json_schema")

# Invoke the model with a query asking for structured information
result = structured_llm_json.invoke(
    "Who was the 16th president of the USA, and how tall was he in meters?"
)
print(result)
name='Abraham Lincoln' height_m=1.93

Structured output methods

Two structured output methods are supported:
  • method="function_calling" (default): uses tool calling to extract structured data. Compatible with all Gemini models.
  • method="json_schema" or method="json_mode": uses Gemini's native structured output with responseSchema. More reliable, but requires Gemini 1.5+ models. (json_mode is kept for backward compatibility.)
The json_schema method is recommended for reliability: it constrains the model's generation directly rather than relying on post-processing of tool calls.
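
For comparison, a minimal sketch of the default function-calling path, reusing the Person schema defined above:
structured_llm_fc = llm.with_structured_output(Person, method="function_calling")
result_fc = structured_llm_fc.invoke(
    "Who was the 16th president of the USA, and how tall was he in meters?"
)
print(result_fc)  # Expect a Person instance, as with the json_schema method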

Token usage tracking

Access token usage information from the response metadata:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")

result = llm.invoke("Explain the concept of prompt engineering in one sentence.")

print(result.content)
print("\nUsage Metadata:")
print(result.usage_metadata)
Prompt engineering is the art and science of crafting effective text prompts to elicit desired and accurate responses from large language models.

Usage Metadata:
{'input_tokens': 10, 'output_tokens': 24, 'total_tokens': 34, 'input_token_details': {'cache_read': 0}}
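
Since usage_metadata is a plain dict, aggregating usage across several calls is straightforward. A minimal sketch:
# Accumulate token counts over multiple invocations
totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
for prompt in ["What is 1+1?", "Name one prime number."]:
    res = llm.invoke(prompt)
    for key in totals:
        totals[key] += res.usage_metadata.get(key, 0)
print(totals)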

Built-in tools

Google Gemini supports a variety of built-in tools (Google Search, code execution), which can be bound to the model in the usual way:
from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "When is the next total solar eclipse in US?",
    tools=[GenAITool(google_search={})],
)

print(resp.content)
The next total solar eclipse visible in the United States will occur on August 23, 2044. However, the path of totality will only pass through Montana, North Dakota, and South Dakota.

For a total solar eclipse that crosses a significant portion of the continental U.S., you'll have to wait until August 12, 2045. This eclipse will start in California and end in Florida.
from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "What is 2*2, use python",
    tools=[GenAITool(code_execution={})],
)

for c in resp.content:
    if isinstance(c, dict):
        if c["type"] == "code_execution_result":
            print(f"Code execution result: {c['code_execution_result']}")
        elif c["type"] == "executable_code":
            print(f"Executable code: {c['executable_code']}")
    else:
        print(c)
Executable code: print(2*2)

Code execution result: 4

2*2 is 4.
/Users/philschmid/projects/google-gemini/langchain/.venv/lib/python3.9/site-packages/langchain_google_genai/chat_models.py:580: UserWarning:
        ⚠️ Warning: Output may vary each run.
        - 'executable_code': Always present.
        - 'execution_result' & 'image_url': May be absent for some queries.

        Validate before using in production.

  warnings.warn(

Native async

Use async methods for non-blocking calls:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")


async def run_async_calls():
    # Async invoke
    result_ainvoke = await llm.ainvoke("Why is the sky blue?")
    print("Async Invoke Result:", result_ainvoke.content[:50] + "...")

    # Async stream
    print("\nAsync Stream Result:")
    async for chunk in llm.astream(
        "Write a short poem about asynchronous programming."
    ):
        print(chunk.content, end="", flush=True)
    print("\n")

    # Async batch
    results_abatch = await llm.abatch(["What is 1+1?", "What is 2+2?"])
    print("Async Batch Results:", [res.content for res in results_abatch])


await run_async_calls()
Async Invoke Result: The sky is blue due to a phenomenon called **Rayle...

Async Stream Result:
The thread is free, it does not wait,
For answers slow, or tasks of fate.
A promise made, a future bright,
It moves ahead, with all its might.

A callback waits, a signal sent,
When data's read, or job is spent.
Non-blocking code, a graceful dance,
Responsive apps, a fleeting glance.

Async Batch Results: ['1 + 1 = 2', '2 + 2 = 4']
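
Top-level await works in notebooks; in a plain Python script, wrap the coroutine with asyncio.run instead:
import asyncio

asyncio.run(run_async_calls())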

Safety settings

Gemini models have default safety settings that can be overridden. If you are receiving lots of "Safety Warnings" from your models, you can try tweaking the model's safety_settings attribute. For example, to turn off safety blocking for dangerous content, you can construct your LLM as follows:
from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)
See Google's safety setting types for an enumeration of the available categories and thresholds.

API reference

For detailed documentation of all ChatGoogleGenerativeAI features and configuration options, head to the API reference.