优化分类器 - LangChain 文档

本教程将介绍如何根据用户反馈优化分类器。分类器非常适合优化，因为它通常很容易收集所需的输出，这使得根据用户反馈创建少量示例变得容易。这正是我们在这个例子中所做的。

目标

在这个例子中，我们将构建一个机器人，根据 GitHub 问题的标题对其进行分类。它将接收一个标题并将其分类到许多不同的类别之一。然后，我们将开始收集用户反馈，并使用它来调整分类器的性能。

入门

首先，我们将设置它，以便将所有跟踪发送到特定项目。我们可以通过设置环境变量来完成此操作

import os
os.environ["LANGSMITH_PROJECT"] = "classifier"

然后我们可以创建我们的初始应用程序。这将是一个非常简单的函数，它只接收 GitHub 问题的标题并尝试对其进行标记。

import openai
from langsmith import traceable, Client
import uuid

client = openai.Client()

available_topics = [
    "bug",
    "improvement",
    "new_feature",
    "documentation",
    "integration",
]

prompt_template = """Classify the type of the issue as one of {topics}.
Issue: {text}"""

@traceable(
    run_type="chain",
    name="Classifier",
)
def topic_classifier(
    topic: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": prompt_template.format(
                    topics=','.join(available_topics),
                    text=topic,
                )
            }
        ],
    ).choices[0].message.content

然后我们可以开始与之交互。在与之交互时，我们将提前生成 LangSmith 运行 ID 并将其传递给此函数。我们这样做是为了以后可以附加反馈。以下是我们如何调用应用程序：

run_id = uuid.uuid4()
topic_classifier(
    "fix bug in LCEL",
    langsmith_extra={"run_id": run_id})

以下是我们如何事后附加反馈。我们可以收集两种形式的反馈。首先，我们可以收集“积极”反馈 - 这适用于模型正确理解的示例。

ls_client = Client()
run_id = uuid.uuid4()
topic_classifier(
    "fix bug in LCEL",
    langsmith_extra={"run_id": run_id})
ls_client.create_feedback(
    run_id,
    key="user-score",
    score=1.0,
)

接下来，我们可以专注于收集与生成的“更正”相对应的反馈。在此示例中，模型将其归类为错误，而我真的希望将其归类为文档。

ls_client = Client()
run_id = uuid.uuid4()
topic_classifier(
    "fix bug in documentation",
    langsmith_extra={"run_id": run_id})
ls_client.create_feedback(
    run_id,
    key="correction",
    correction="documentation")

设置自动化

我们现在可以设置自动化，将带有某种形式反馈的示例移动到数据集中。我们将设置两个自动化，一个用于积极反馈，另一个用于消极反馈。第一个将接收所有带有积极反馈的运行，并自动将其添加到数据集中。这背后的逻辑是，任何带有积极反馈的运行我们都可以在未来的迭代中用作良好示例。让我们创建一个名为 classifier-github-issues 的数据集来添加此数据。

第二个将接收所有带有更正的运行，并使用 webhook 将它们添加到数据集中。在创建此 webhook 时，我们将选择“使用更正”选项。此选项将使从运行创建数据集时，不使用运行的输出作为数据点的黄金标准输出，而是使用更正。

更新应用程序

我们现在可以更新我们的代码，以下载我们正在发送运行的数据集。下载后，我们可以创建一个包含示例的字符串。然后我们可以将此字符串作为提示的一部分！

### NEW CODE ###
# Initialize the LangSmith Client so we can use to get the dataset
ls_client = Client()

# Create a function that will take in a list of examples and format them into a string
def create_example_string(examples):
    final_strings = []
    for e in examples:
        final_strings.append(f"Input: {e.inputs['topic']}\n> {e.outputs['output']}")
    return "\n\n".join(final_strings)
### NEW CODE ###

client = openai.Client()

available_topics = [
    "bug",
    "improvement",
    "new_feature",
    "documentation",
    "integration",
]

prompt_template = """Classify the type of the issue as one of {topics}.

Here are some examples:
{examples}

Begin!
Issue: {text}
>"""

@traceable(
    run_type="chain",
    name="Classifier",
)
def topic_classifier(
    topic: str):
    # We can now pull down the examples from the dataset
    # We do this inside the function so it always get the most up-to-date examples,
    # But this can be done outside and cached for speed if desired
    examples = list(ls_client.list_examples(dataset_name="classifier-github-issues"))  # <- New Code
    example_string = create_example_string(examples)
    return client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": prompt_template.format(
                    topics=','.join(available_topics),
                    text=topic,
                    examples=example_string,
                )
            }
        ],
    ).choices[0].message.content

如果现在使用与之前类似的输入运行应用程序，我们可以看到它正确地学习到任何与文档相关的内容（即使是错误）都应归类为 documentation

ls_client = Client()
run_id = uuid.uuid4()
topic_classifier(
    "address bug in documentation",
    langsmith_extra={"run_id": run_id})

示例的语义搜索

我们可以做的另一件事是只使用语义最相似的示例。当您开始积累大量示例时，这很有用。为此，我们首先可以定义一个示例来查找 k 个最相似的示例：

import numpy as np

def find_similar(examples, topic, k=5):
    inputs = [e.inputs['topic'] for e in examples] + [topic]
    vectors = client.embeddings.create(input=inputs, model="text-embedding-3-small")
    vectors = [e.embedding for e in vectors.data]
    vectors = np.array(vectors)
    args = np.argsort(-vectors.dot(vectors[-1])[:-1])[:5]
    examples = [examples[i] for i in args]
    return examples

然后我们可以在应用程序中使用它

ls_client = Client()

def create_example_string(examples):
    final_strings = []
    for e in examples:
        final_strings.append(f"Input: {e.inputs['topic']}\n> {e.outputs['output']}")
    return "\n\n".join(final_strings)

client = openai.Client()

available_topics = [
    "bug",
    "improvement",
    "new_feature",
    "documentation",
    "integration",
]

prompt_template = """Classify the type of the issue as one of {topics}.

Here are some examples:
{examples}

Begin!
Issue: {text}
>"""

@traceable(
    run_type="chain",
    name="Classifier",
)
def topic_classifier(
    topic: str):
    examples = list(ls_client.list_examples(dataset_name="classifier-github-issues"))
    examples = find_similar(examples, topic)
    example_string = create_example_string(examples)
    return client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": prompt_template.format(
                    topics=','.join(available_topics),
                    text=topic,
                    examples=example_string,
                )
            }
        ],
    ).choices[0].message.content

在 GitHub 上编辑此页面源文件。

以编程方式连接这些文档到 Claude、VSCode 等，通过 MCP 获取实时答案。

创建和更新提示

教程

​目标

​入门

​设置自动化

​更新应用程序

​示例的语义搜索

目标

入门

设置自动化

更新应用程序

示例的语义搜索