
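Overview

This guide will help you get started with the Watsonx document compressor. For detailed documentation of all Watsonx document compressor features and configuration options, see the API reference.

Integration details

Setup

To access IBM watsonx.ai models, you need to create an IBM watsonx.ai account, obtain an API key or another type of credential, and install the @langchain/community integration package.

Credentials

Go to IBM Cloud to sign up for IBM watsonx.ai and generate an API key, or provide any of the other forms of authentication shown below.

IAM authentication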

export WATSONX_AI_AUTH_TYPE=iam
export WATSONX_AI_APIKEY=<YOUR-APIKEY>

Bearer token authentication

export WATSONX_AI_AUTH_TYPE=bearertoken
export WATSONX_AI_BEARER_TOKEN=<YOUR-BEARER-TOKEN>

IBM watsonx.ai software authentication

export WATSONX_AI_AUTH_TYPE=cp4d
export WATSONX_AI_USERNAME=<YOUR_USERNAME>
export WATSONX_AI_PASSWORD=<YOUR_PASSWORD>
export WATSONX_AI_URL=<URL>
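Once these values are set as environment variables and the object is initialized, authentication happens automatically. Alternatively, you can authenticate by passing the same values as parameters to a new instance.

IAM authentication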

import { WatsonxLLM } from "@langchain/community/llms/ibm";

const props = {
  version: "YYYY-MM-DD",
  serviceUrl: "<SERVICE_URL>",
  projectId: "<PROJECT_ID>",
  watsonxAIAuthType: "iam",
  watsonxAIApikey: "<YOUR-APIKEY>",
};
const instance = new WatsonxLLM(props);

Bearer token authentication

import { WatsonxLLM } from "@langchain/community/llms/ibm";

const props = {
  version: "YYYY-MM-DD",
  serviceUrl: "<SERVICE_URL>",
  projectId: "<PROJECT_ID>",
  watsonxAIAuthType: "bearertoken",
  watsonxAIBearerToken: "<YOUR-BEARERTOKEN>",
};
const instance = new WatsonxLLM(props);

IBM watsonx.ai software authentication

import { WatsonxLLM } from "@langchain/community/llms/ibm";

const props = {
  version: "YYYY-MM-DD",
  serviceUrl: "<SERVICE_URL>",
  projectId: "<PROJECT_ID>",
  watsonxAIAuthType: "cp4d",
  watsonxAIUsername: "<YOUR-USERNAME>",
  watsonxAIPassword: "<YOUR-PASSWORD>",
  watsonxAIUrl: "<url>",
};
const instance = new WatsonxLLM(props);
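The three variants above differ only in which credential fields they carry. As a minimal sketch of keeping that choice in one place, the helper below maps the environment variables listed earlier onto the constructor fields shown in the examples. `authPropsFromEnv`, `need`, and the `WatsonxAuthProps` type are hypothetical names introduced here for illustration; they are not part of @langchain/community.

```typescript
// Hypothetical helper (not part of @langchain/community): derive the
// authentication fields from environment variables named in this guide.
type Env = Record<string, string | undefined>;

type WatsonxAuthProps =
  | { watsonxAIAuthType: "iam"; watsonxAIApikey: string }
  | { watsonxAIAuthType: "bearertoken"; watsonxAIBearerToken: string }
  | {
      watsonxAIAuthType: "cp4d";
      watsonxAIUsername: string;
      watsonxAIPassword: string;
      watsonxAIUrl: string;
    };

// Read one variable and fail fast with a readable message if it is unset.
function need(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable ${name}`);
  return value;
}

function authPropsFromEnv(env: Env): WatsonxAuthProps {
  const authType = need(env, "WATSONX_AI_AUTH_TYPE");
  switch (authType) {
    case "iam":
      return {
        watsonxAIAuthType: "iam",
        watsonxAIApikey: need(env, "WATSONX_AI_APIKEY"),
      };
    case "bearertoken":
      return {
        watsonxAIAuthType: "bearertoken",
        watsonxAIBearerToken: need(env, "WATSONX_AI_BEARER_TOKEN"),
      };
    case "cp4d":
      return {
        watsonxAIAuthType: "cp4d",
        watsonxAIUsername: need(env, "WATSONX_AI_USERNAME"),
        watsonxAIPassword: need(env, "WATSONX_AI_PASSWORD"),
        watsonxAIUrl: need(env, "WATSONX_AI_URL"),
      };
    default:
      throw new Error(`Unsupported WATSONX_AI_AUTH_TYPE: ${authType}`);
  }
}
```

In a Node.js program the result could be spread into the constructor props next to version, serviceUrl, and projectId, for example by calling `authPropsFromEnv(process.env)`. Taking the environment as a parameter keeps the helper easy to test without touching real credentials.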
If you want automated tracing of individual queries, you can also set your LangSmith API key by uncommenting the lines below:
// process.env.LANGSMITH_API_KEY = "<YOUR API KEY HERE>";
// process.env.LANGSMITH_TRACING = "true";

Installation

The document compressor lives in the @langchain/community package:
npm install @langchain/community @langchain/core

Instantiation

Now we can instantiate our compressor:
import { WatsonxRerank } from "@langchain/community/document_compressors/ibm";

const watsonxRerank = new WatsonxRerank({
  version: "2024-05-31",
  serviceUrl: process.env.WATSONX_AI_SERVICE_URL,
  projectId: process.env.WATSONX_AI_PROJECT_ID,
  model: "cross-encoder/ms-marco-minilm-l-12-v2",
});
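Because `process.env` values are `string | undefined` in TypeScript, it can help to validate them before they reach the constructor. The sketch below is one way to do that, assuming the two environment variable names used in the example above. `rerankConfigFromEnv` and `RerankConfig` are hypothetical names for illustration only, and the YYYY-MM-DD check simply mirrors the placeholder format used for `version` in this guide.

```typescript
// Hypothetical helper (not part of @langchain/community): build a validated
// configuration object with the same four fields used in the example above.
type Env = Record<string, string | undefined>;

interface RerankConfig {
  version: string;
  serviceUrl: string;
  projectId: string;
  model: string;
}

function rerankConfigFromEnv(
  env: Env,
  model: string,
  version: string = "2024-05-31"
): RerankConfig {
  // Report every missing variable at once rather than one per run.
  const required = ["WATSONX_AI_SERVICE_URL", "WATSONX_AI_PROJECT_ID"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // The guide writes the version as a date string in YYYY-MM-DD form.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(version)) {
    throw new Error(`version must look like YYYY-MM-DD, got: ${version}`);
  }
  return {
    version,
    serviceUrl: env["WATSONX_AI_SERVICE_URL"] as string,
    projectId: env["WATSONX_AI_PROJECT_ID"] as string,
    model,
  };
}
```

The returned object has the same shape as the literal passed to `new WatsonxRerank(...)` above, so a misconfigured environment fails with a clear message before any request is made.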

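Usage

First, set up a basic RAG ingestion pipeline with embeddings, a text splitter, and a vector store. We will use it to rerank some documents against a chosen query.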
import { readFileSync } from "node:fs";
import { MemoryVectorStore } from "@langchain/classic/vectorstores/memory";
import { WatsonxEmbeddings } from "@langchain/community/embeddings/ibm";
import { CharacterTextSplitter } from "@langchain/textsplitters";

const embeddings = new WatsonxEmbeddings({
  version: "YYYY-MM-DD",
  serviceUrl: process.env.API_URL,
  projectId: "<PROJECT_ID>",
  spaceId: "<SPACE_ID>",
  model: "ibm/slate-125m-english-rtrvr",
});

const textSplitter = new CharacterTextSplitter({
  chunkSize: 400,
  chunkOverlap: 0,
});

const query = "What did the president say about Ketanji Brown Jackson";
const text = readFileSync("state_of_the_union.txt", "utf8");

const docs = await textSplitter.createDocuments([text]);
const vectorStore = await MemoryVectorStore.fromDocuments(docs, embeddings);
const vectorStoreRetriever = vectorStore.asRetriever();

const result = await vectorStoreRetriever.invoke(query);
console.log(result);
[
  Document {
    pageContent: 'And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.',
    metadata: { loc: [Object] },
    id: undefined
  },
  Document {
    pageContent: 'I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n' +
      '\n' +
      'I’ve worked on these issues a long time. \n' +
      '\n' +
      'I know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.',
    metadata: { loc: [Object] },
    id: undefined
  },
  Document {
    pageContent: 'We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n' +
      '\n' +
      'The only nation that can be defined by a single word: possibilities. \n' +
      '\n' +
      'So on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n' +
      '\n' +
      'And my report is this: the State of the Union is strong—because you, the American people, are strong.',
    metadata: { loc: [Object] },
    id: undefined
  },
  Document {
    pageContent: 'And I’m taking robust action to make sure the pain of our sanctions  is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n' +
      '\n' +
      'Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.',
    metadata: { loc: [Object] },
    id: undefined
  }
]
Pass the retrieved documents to the reranker to receive a relevance score for each one:
import { WatsonxRerank } from "@langchain/community/document_compressors/ibm";

const reranker = new WatsonxRerank({
  version: "2024-05-31",
  serviceUrl: process.env.WATSONX_AI_SERVICE_URL,
  projectId: process.env.WATSONX_AI_PROJECT_ID,
  model: "cross-encoder/ms-marco-minilm-l-12-v2",
});
const compressed = await reranker.rerank(result, query);
console.log(compressed);
[
  { index: 0, relevanceScore: 0.726995587348938 },
  { index: 1, relevanceScore: 0.5758284330368042 },
  { index: 2, relevanceScore: 0.5479092597961426 },
  { index: 3, relevanceScore: 0.5468723773956299 }
]
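The output above contains only positions and scores. The following sketch pairs each score with its document and keeps the best few, assuming that `index` refers to the position of the document in the array that was passed to `rerank`. `topReranked` and `RerankResult` are hypothetical names introduced for this example, not library exports.

```typescript
// Shape of one entry in the rerank output shown above.
interface RerankResult {
  index: number;
  relevanceScore: number;
}

// Hypothetical helper: pair each score with its source document and keep
// the topN highest-scoring entries. Assumes `index` points into `docs`.
function topReranked<T>(
  docs: T[],
  results: RerankResult[],
  topN: number
): Array<{ doc: T; relevanceScore: number }> {
  return [...results] // copy so the caller's array is not reordered
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, topN)
    .map(({ index, relevanceScore }) => {
      const doc = docs[index];
      if (doc === undefined) {
        throw new RangeError(`Rerank index ${index} is outside the document list`);
      }
      return { doc, relevanceScore };
    });
}
```

With the variables from the example above, `topReranked(result, compressed, 2)` would return the two highest-scoring documents together with their scores.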
Alternatively, to return the documents together with their scores, use the .compressDocuments() method as shown below:
const compressedWithResults = await reranker.compressDocuments(result, query);
console.log(compressedWithResults);
[
  Document {
    pageContent: 'And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.',
    metadata: { loc: [Object], relevanceScore: 0.726995587348938 },
    id: undefined
  },
  Document {
    pageContent: 'I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n' +
      '\n' +
      'I’ve worked on these issues a long time. \n' +
      '\n' +
      'I know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.',
    metadata: { loc: [Object], relevanceScore: 0.5758284330368042 },
    id: undefined
  },
  Document {
    pageContent: 'We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n' +
      '\n' +
      'The only nation that can be defined by a single word: possibilities. \n' +
      '\n' +
      'So on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n' +
      '\n' +
      'And my report is this: the State of the Union is strong—because you, the American people, are strong.',
    metadata: { loc: [Object], relevanceScore: 0.5479092597961426 },
    id: undefined
  },
  Document {
    pageContent: 'And I’m taking robust action to make sure the pain of our sanctions  is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n' +
      '\n' +
      'Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.',
    metadata: { loc: [Object], relevanceScore: 0.5468723773956299 },
    id: undefined
  }
]
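Since each returned document carries its score in `metadata.relevanceScore`, a simple post-processing step is to drop documents below a cutoff. The sketch below assumes that metadata shape, as seen in the output above. `filterByRelevance`, `ScoredDoc`, and the threshold value are illustrative choices rather than library features.

```typescript
// Minimal structural type matching the documents printed above.
interface ScoredDoc {
  pageContent: string;
  metadata: { relevanceScore?: number; [key: string]: unknown };
}

// Hypothetical helper: keep only documents whose relevance score is
// present and at least `minScore`. Input order is preserved.
function filterByRelevance<T extends ScoredDoc>(docs: T[], minScore: number): T[] {
  return docs.filter((doc) => {
    const score = doc.metadata.relevanceScore;
    return typeof score === "number" && score >= minScore;
  });
}
```

For the four documents shown above, a cutoff of 0.55 would keep the first two (scores of about 0.727 and 0.576) and drop the other two. A suitable threshold depends on the reranking model and the data, so treat any fixed value as something to tune.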

API reference

For detailed documentation of all Watsonx document compressor features and configuration options, see the API reference.