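Azure Cosmos DB for NoSQL supports querying items with flexible schemas and has native JSON support. It now also offers vector indexing and search. This feature is designed to handle high-dimensional vectors, enabling efficient and accurate vector search at any scale. You can store vectors directly in documents alongside your data: each document in the database can contain not only traditional schema-free data but also high-dimensional vectors as additional properties.
To learn how to use the vector search capabilities of Azure Cosmos DB for NoSQL, visit this page. If you don't have an Azure account, you can create a free account to get started.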
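To make the idea of "vectors stored as document properties" concrete, here is a minimal, self-contained sketch of a brute-force similarity search over in-memory documents. It is only a conceptual illustration: the `VectorDocument` shape, the `vector` property name, and the `topK` helper are assumptions made for this example, and the sketch says nothing about how Azure Cosmos DB indexes or searches vectors internally.

```typescript
// Hypothetical document shape: regular fields plus an embedding stored as a property.
interface VectorDocument {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k search: score every document and keep the k most similar.
function topK(docs: VectorDocument[], query: number[], k: number): VectorDocument[] {
  return [...docs]
    .sort((x, y) => cosineSimilarity(y.vector, query) - cosineSimilarity(x.vector, query))
    .slice(0, k);
}
```

A real vector index avoids scoring every document, which is what makes search practical at scale; the sketch only shows what "most similar" means.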
You first need to install the @langchain/azure-cosmosdb package.
For general instructions on installing LangChain packages, see this section.
npm install @langchain/azure-cosmosdb @langchain/core
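You also need a running Azure Cosmos DB for NoSQL instance. You can deploy a free instance from the Azure portal by following this guide.
Once your instance is running, make sure you have its connection string. You can find it in the "Settings / Keys" section of the instance in the Azure portal. Then set the following environment variables:
# Use connection string to authenticate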
AZURE_COSMOSDB_NOSQL_CONNECTION_STRING=
# Use managed identity to authenticate
AZURE_COSMOSDB_NOSQL_ENDPOINT=
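As a rough sketch of how an application might decide which of the two variables to use, the helper below inspects an environment record and returns the authentication mode. `resolveCosmosAuth` and the `CosmosAuth` type are hypothetical names invented for this example and are not part of @langchain/azure-cosmosdb. Preferring the connection string when both variables are set is an assumption of this sketch.

```typescript
// Hypothetical helper types, not part of @langchain/azure-cosmosdb.
type CosmosAuth =
  | { kind: "connectionString"; connectionString: string }
  | { kind: "managedIdentity"; endpoint: string };

// Pick an authentication mode from the two environment variables above.
// Assumption: the connection string takes precedence when both are set.
function resolveCosmosAuth(env: Record<string, string | undefined>): CosmosAuth {
  const connectionString = env.AZURE_COSMOSDB_NOSQL_CONNECTION_STRING?.trim();
  if (connectionString) {
    return { kind: "connectionString", connectionString };
  }
  const endpoint = env.AZURE_COSMOSDB_NOSQL_ENDPOINT?.trim();
  if (endpoint) {
    return { kind: "managedIdentity", endpoint };
  }
  // Fail early with a clear message instead of at the first database call.
  throw new Error(
    "Set AZURE_COSMOSDB_NOSQL_CONNECTION_STRING or AZURE_COSMOSDB_NOSQL_ENDPOINT"
  );
}
```

In a Node.js application this could be called as `resolveCosmosAuth(process.env)` at startup, so that a missing configuration is reported before any request is made.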
Using Azure Managed Identity
If you are using Azure Managed Identity, you can configure the credentials like this:
import { AzureCosmosDBNoSQLVectorStore } from "@langchain/azure-cosmosdb";
import { OpenAIEmbeddings } from "@langchain/openai";
// Create Azure Cosmos DB vector store
const store = new AzureCosmosDBNoSQLVectorStore(new OpenAIEmbeddings(), {
  // Or use environment variable AZURE_COSMOSDB_NOSQL_ENDPOINT
  endpoint: "https://my-cosmosdb.documents.azure.com:443/",
  // Database and container must already exist
  databaseName: "my-database",
  containerName: "my-container",
});
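When using Azure Managed Identity with role-based access control (RBAC), you must make sure the database and container have been created beforehand. RBAC does not provide permissions to create databases and containers. You can find more information about the permission model in the Azure Cosmos DB documentation.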
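Security considerations when using filters
Using filters with user-provided input can be a security risk if the data is not properly sanitized. Follow the recommendations below to prevent potential security issues.
Concatenating raw user input into a SQL-style clause, such as WHERE ${userFilter}, introduces a critical risk of SQL injection attacks, which can expose unintended data or compromise system integrity. To mitigate this, always use Azure Cosmos DB's parameterized query mechanism and pass @param placeholders, which cleanly separates query logic from user-provided input.
Here is an example of unsafe code:
import { AzureCosmosDBNoSQLVectorStore } from "@langchain/azure-cosmosdb";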
const store = new AzureCosmosDBNoSQLVectorStore(embeddings, {});
// Unsafe: user-controlled input injected into the query
const userId = req.query.userId; // e.g. "123' OR '1'='1"
const unsafeQuerySpec = {
  query: `SELECT * FROM c WHERE c.metadata.userId = '${userId}'`,
};
await store.delete({ filter: unsafeQuerySpec });
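If an attacker provides 123' OR '1'='1, the query becomes SELECT * FROM c WHERE c.metadata.userId = '123' OR '1'='1', which forces the condition to always be true, so the query bypasses the intended filter and deletes all documents.
To prevent this injection risk, define a placeholder such as @userId. Cosmos DB binds the user input separately as a parameter, ensuring it is treated strictly as data rather than executable query logic, as shown below.
import { SqlQuerySpec } from "@azure/cosmos";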
const safeQuerySpec: SqlQuerySpec = {
  query: "SELECT * FROM c WHERE c.metadata.userId = @userId",
  parameters: [{ name: "@userId", value: userId }],
};
await store.delete({ filter: safeQuerySpec });
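One way to keep this pattern consistent across a codebase is to build filters through a small helper whose query text is a constant, so user input can only ever reach the parameters array. The sketch below is self-contained and illustrative: `QuerySpec` is a local type that mirrors the `query`/`parameters` shape used above (redefined so the example runs without the SDK installed), and `buildUserFilter` and `buildUnsafeQueryDoNotUse` are hypothetical names, not library functions.

```typescript
// Local types mirroring the query/parameters shape shown above,
// redefined here so the sketch runs without @azure/cosmos installed.
interface QueryParameter {
  name: string;
  value: string | number | boolean | null;
}

interface QuerySpec {
  query: string;
  parameters: QueryParameter[];
}

// Hypothetical helper: the query text is a constant,
// and user input is only ever placed in the parameters array.
function buildUserFilter(userId: string): QuerySpec {
  return {
    query: "SELECT * FROM c WHERE c.metadata.userId = @userId",
    parameters: [{ name: "@userId", value: userId }],
  };
}

// For contrast only: string interpolation lets input rewrite the query text.
function buildUnsafeQueryDoNotUse(userId: string): string {
  return `SELECT * FROM c WHERE c.metadata.userId = '${userId}'`;
}

const malicious = "123' OR '1'='1";
const safeSpec = buildUserFilter(malicious);
const unsafeQuery = buildUnsafeQueryDoNotUse(malicious);

console.log(unsafeQuery);
// SELECT * FROM c WHERE c.metadata.userId = '123' OR '1'='1'
console.log(safeSpec.query);
// SELECT * FROM c WHERE c.metadata.userId = @userId
```

With the helper, the malicious string never appears in the query text; it only travels as the value of @userId.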
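Now, if an attacker supplies input such as 123' OR '1'='1, it is treated as a literal string value to match rather than as part of the query structure.
See the official documentation on parameterized queries in Azure Cosmos DB for NoSQL for more usage examples and best practices.
Usage example
The example below indexes documents from a file into Azure Cosmos DB for NoSQL, runs a vector search query, and finally uses a chain to answer a question in natural language based on the retrieved documents.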
import { AzureCosmosDBNoSQLVectorStore } from "@langchain/azure-cosmosdb";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { createStuffDocumentsChain } from "@langchain/classic/chains/combine_documents";
import { createRetrievalChain } from "@langchain/classic/chains/retrieval";
import { TextLoader } from "@langchain/classic/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
// Load documents from file
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);
// Create Azure Cosmos DB vector store
const store = await AzureCosmosDBNoSQLVectorStore.fromDocuments(
  documents,
  new OpenAIEmbeddings(),
  {
    databaseName: "langchain",
    containerName: "documents",
  }
);
// Performs a similarity search
const resultDocuments = await store.similaritySearch(
  "What did the president say about Ketanji Brown Jackson?"
);
console.log("Similarity search results:");
console.log(resultDocuments[0].pageContent);
/*
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
*/
// Use the store as part of a chain
const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });
const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user's questions based on the below context:\n\n{context}",
  ],
  ["human", "{input}"],
]);
const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt: questionAnsweringPrompt,
});
const chain = await createRetrievalChain({
  retriever: store.asRetriever(),
  combineDocsChain,
});
const res = await chain.invoke({
  input: "What is the president's top priority regarding prices?",
});
console.log("Chain response:");
console.log(res.answer);
/*
The president's top priority is getting prices under control.
*/
// Clean up
await store.delete();
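To clarify what the chain in the example does with the retrieved documents, here is a simplified, self-contained sketch of the "stuffing" step: the retrieved texts are joined into one context string and substituted into the prompt templates before the model is called. `Doc`, `formatContext`, and `buildMessages` are hypothetical names for this illustration, and the "\n\n" separator is an assumption of the sketch; the actual createStuffDocumentsChain implementation may differ in its details.

```typescript
// Minimal stand-in for a retrieved document; only pageContent is used here.
interface Doc {
  pageContent: string;
}

// Join the retrieved documents into a single context string.
function formatContext(docs: Doc[], separator = "\n\n"): string {
  return docs.map((doc) => doc.pageContent).join(separator);
}

// Fill the {context} and {input} placeholders of the two prompt templates.
function buildMessages(docs: Doc[], input: string): Array<[string, string]> {
  const systemTemplate =
    "Answer the user's questions based on the below context:\n\n{context}";
  const humanTemplate = "{input}";
  // Replacer functions avoid special handling of "$" sequences in the inserted text.
  const system = systemTemplate.replace("{context}", () => formatContext(docs));
  const human = humanTemplate.replace("{input}", () => input);
  return [
    ["system", system],
    ["human", human],
  ];
}
```

The resulting messages are what a chat model would receive, which is why the quality of the retrieved documents directly shapes the answer.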