Azure AI Search (formerly known as Azure Search and Azure Cognitive Search) is a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads on Azure. It also supports vector search using the k-nearest neighbors (kNN) algorithm, as well as semantic search. This vector store integration supports full-text search, vector search, and hybrid search for the best ranking performance. This page shows how to use the vector search capabilities of Azure AI Search. If you don't have an Azure account, you can create a free account to get started.

Setup

You'll first need to install the @azure/search-documents SDK and the @langchain/community package.
See this section for general instructions on installing LangChain packages.
npm
npm install -S @langchain/community @langchain/core @azure/search-documents
You'll also need to have an Azure AI Search instance running. You can follow this guide to deploy a free version on the Azure portal at no cost.

Once your instance is running, make sure you have the endpoint and the admin key (query keys can only be used to search documents, not to index, update, or delete them). The endpoint is the URL of your instance, which you can find in the "Overview" section of your instance in the Azure portal. The admin key can be found in the "Keys" section of your instance. You'll then need to set the following environment variables:
.env variables
# Azure AI Search connection settings
AZURE_AISEARCH_ENDPOINT=
AZURE_AISEARCH_KEY=

# If you're using Azure OpenAI API, you'll need to set these variables
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_INSTANCE_NAME=
AZURE_OPENAI_API_DEPLOYMENT_NAME=
AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME=
AZURE_OPENAI_API_VERSION=

# Or you can use the OpenAI API directly
OPENAI_API_KEY=
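
If you keep these values in a local .env file, you can load them into process.env before the integration reads them. The minimal sketch below assumes the dotenv package is installed (it is not part of the dependencies listed above):

import * as dotenv from "dotenv";

// Populate process.env from the local .env file so that
// AZURE_AISEARCH_ENDPOINT, AZURE_AISEARCH_KEY and the OpenAI variables
// are available before the vector store is created.
dotenv.config();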
Hybrid search and semantic ranking

Hybrid search is a feature that combines the strengths of full-text search and vector search to provide the best ranking performance. It is enabled by default in the Azure AI Search vector store, but you can select a different search query type by setting the search.type property when creating the vector store. You can read more about hybrid search and how it can improve your search results in the official documentation.

In some scenarios, such as retrieval-augmented generation (RAG), you may want to enable semantic ranking in addition to hybrid search to improve the relevance of the search results. You can enable semantic ranking by setting the search.type property to AzureAISearchQueryType.SemanticHybrid when creating the vector store. Note that semantic ranking is only available in the Basic and higher pricing tiers, and is subject to regional availability. You can read more about the performance of using semantic ranking with hybrid search in this blog post.
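
As a minimal sketch of the search.type option described above, the snippet below creates a store with semantic ranking enabled. It reuses the same fromDocuments call as the full example further down; the sample documents are placeholders for illustration only:

import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import {
  AzureAISearchVectorStore,
  AzureAISearchQueryType,
} from "@langchain/community/vectorstores/azure_aisearch";

// Placeholder documents, for illustration only
const docs = [
  new Document({ pageContent: "Azure AI Search supports hybrid queries." }),
  new Document({ pageContent: "Semantic ranking reorders results by meaning." }),
];

// Semantic ranking requires the Basic pricing tier or higher
const semanticStore = await AzureAISearchVectorStore.fromDocuments(
  docs,
  new OpenAIEmbeddings(),
  {
    search: {
      type: AzureAISearchQueryType.SemanticHybrid,
    },
  }
);

const results = await semanticStore.similaritySearch("hybrid queries", 2);
console.log(results.map((doc) => doc.pageContent));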

Example: indexing documents, vector search, and LLM integration

The example below indexes documents from a file into Azure AI Search, runs a hybrid search query, and finally uses a chain to answer a question in natural language based on the retrieved documents.
import {
  AzureAISearchVectorStore,
  AzureAISearchQueryType,
} from "@langchain/community/vectorstores/azure_aisearch";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { createStuffDocumentsChain } from "@langchain/classic/chains/combine_documents";
import { createRetrievalChain } from "@langchain/classic/chains/retrieval";
import { TextLoader } from "@langchain/classic/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Load documents from file
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);

// Create Azure AI Search vector store
const store = await AzureAISearchVectorStore.fromDocuments(
  documents,
  new OpenAIEmbeddings(),
  {
    search: {
      type: AzureAISearchQueryType.SimilarityHybrid,
    },
  }
);

// The first time you run this, the index will be created.
// You may need to wait a bit for the index to be created before you can perform
// a search, or you can create the index manually beforehand.

// Performs a similarity search
const resultDocuments = await store.similaritySearch(
  "What did the president say about Ketanji Brown Jackson?"
);

console.log("Similarity search results:");
console.log(resultDocuments[0].pageContent);
/*
  Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

  Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

  One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

  And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
*/

// Use the store as part of a chain
const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });
const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user's questions based on the below context:\n\n{context}",
  ],
  ["human", "{input}"],
]);

const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt: questionAnsweringPrompt,
});

const chain = await createRetrievalChain({
  retriever: store.asRetriever(),
  combineDocsChain,
});

const response = await chain.invoke({
  input: "What is the president's top priority regarding prices?",
});

console.log("Chain response:");
console.log(response.answer);
/*
  The president's top priority is getting prices under control.
*/
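
Since AzureAISearchVectorStore implements the standard LangChain VectorStore interface, you can also append documents to an existing index after it has been created. The sketch below is illustrative and assumes the store variable from the example above is still in scope:

import { Document } from "@langchain/core/documents";

// Add more documents to the existing index; they become searchable
// alongside the content indexed earlier.
await store.addDocuments([
  new Document({
    pageContent: "Additional content indexed after the initial load.",
    metadata: { source: "follow-up" },
  }),
]);

const followUp = await store.similaritySearch("additional content", 1);
console.log(followUp[0].pageContent);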
