Cassandra

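Compatibility: only available on Node.js.

Apache Cassandra® is a NoSQL, row-oriented, highly scalable, and highly available database. The latest version of Apache Cassandra natively supports vector similarity search.

Setup

First, install the Cassandra Node.js driver:

For general instructions on installing LangChain packages, see this section.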

npm

npm install cassandra-driver @langchain/community @langchain/openai @langchain/core

How you connect to the database depends on your database provider. We will create a document named configConnection, which is used as part of the vector store configuration.

Apache Cassandra®

Vector search is supported in Apache Cassandra® 5.0 and above. You can use a standard connection document, for example:

const configConnection = {
  contactPoints: ['h1', 'h2'],
  localDataCenter: 'datacenter1',
  credentials: {
    username: <...> as string,
    password: <...> as string,
  },
};

Astra DB

Astra DB is a cloud-native Cassandra-as-a-Service platform.

Create an Astra DB account.
Create a vector-enabled database.
Create a token for your database.

const configConnection = {
  serviceProviderArgs: {
    astra: {
      token: <...> as string,
      endpoint: <...> as string,
    },
  },
};

Instead of endpoint:, you may provide the property datacenterID: and, optionally, regionName:.
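The `<...>` placeholders above stand for secrets that usually should not be hard-coded. As a minimal sketch, the connection document could be built from environment-style values. The helper name `astraConnectionFromEnv` and the variable names `ASTRA_DB_TOKEN` and `ASTRA_DB_ENDPOINT` are assumptions made for illustration; the library does not define them.

```typescript
// Hypothetical helper: builds the Astra DB connection document from
// environment-style key/value pairs. ASTRA_DB_TOKEN and ASTRA_DB_ENDPOINT
// are assumed names, not ones defined by the library.
type AstraConnection = {
  serviceProviderArgs: { astra: { token: string; endpoint: string } };
};

function astraConnectionFromEnv(
  env: Record<string, string | undefined>
): AstraConnection {
  const token = env["ASTRA_DB_TOKEN"];
  const endpoint = env["ASTRA_DB_ENDPOINT"];
  // Fail early with a clear message instead of passing undefined downstream.
  if (!token || !endpoint) {
    throw new Error("ASTRA_DB_TOKEN and ASTRA_DB_ENDPOINT must both be set");
  }
  return { serviceProviderArgs: { astra: { token, endpoint } } };
}
```

In Node.js this might be called as `const configConnection = astraConnectionFromEnv(process.env);`.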

Indexing docs

import { CassandraStore } from "@langchain/community/vectorstores/cassandra";
import { OpenAIEmbeddings } from "@langchain/openai";

// The configConnection document is defined above
const config = {
  ...configConnection,
  keyspace: "test",
  dimensions: 1536,
  table: "test",
  indices: [{ name: "name", value: "(name)" }],
  primaryKey: {
    name: "id",
    type: "int",
  },
  metadataColumns: [
    {
      name: "name",
      type: "text",
    },
  ],
};

const vectorStore = await CassandraStore.fromTexts(
  ["I am blue", "Green yellow purple", "Hello there hello"],
  [
    { id: 2, name: "2" },
    { id: 1, name: "1" },
    { id: 3, name: "3" },
  ],
  new OpenAIEmbeddings(),
  config
);

Querying docs

const results = await vectorStore.similaritySearch("Green yellow purple", 1);

or a filtered query:

const results = await vectorStore.similaritySearch("B", 1, { name: "Bubba" });

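Vector Types

Cassandra supports `cosine` (the default), `dot_product`, and `euclidean` similarity search. The type is fixed when the vector store is first created and is specified in the constructor parameter `vectorType`, for example: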

  ...,
  vectorType: "dot_product",
  ...
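To make the difference between the three options concrete, here is a minimal sketch of the underlying metrics in plain TypeScript. It computes the raw quantities only; how Cassandra converts them into a similarity score is not covered by this page and is not modelled here. The function names are illustrative.

```typescript
// Raw metrics behind the three vectorType options (illustrative only).
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: the dot product divided by the product of the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  const norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  return dotProduct(a, b) / norms;
}

// Euclidean distance: straight-line distance between the two points.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return Math.sqrt(sum);
}
```

When all vectors have unit length, the dot product and the cosine similarity are the same number, so the two options rank results identically in that case.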

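Indices

With version 5, Cassandra introduced Storage Attached Indexes (SAI). These allow `WHERE` filtering without specifying the partition key and allow additional operator types, such as inequalities. You define them with the `indices` parameter, which accepts zero or more dictionaries, each containing `name` and `value` entries. Indices are optional, but they are required when a filtered query targets non-partition columns.

The `name` entry is part of the object name; on a table named `test_table`, an index with `name: "some_column"` would be `idx_test_table_some_column`.
The `value` entry is the column on which the index is created, wrapped in `(` and `)`. For the column `some_column` above, it would be specified as `value: "(some_column)"`.
An optional `options` entry is a map passed to the `WITH OPTIONS =` clause of the `CREATE CUSTOM INDEX` statement. The specific entries in this map depend on the index type.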

  indices: [{ name: "some_column", value: "(some_column)" }],
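The naming rule above can be sketched as a small helper. The `idx_<table>_<name>` pattern comes from the description in this section; the full statement text is an assumption based on standard CQL syntax for SAI (`USING 'StorageAttachedIndex'`) and may not match exactly what the library issues. The function names and the `keyspace` parameter are illustrative.

```typescript
// Illustrative shape of one entry in the `indices` parameter.
type IndexSpec = {
  name: string;
  value: string;
  options?: Record<string, string>;
};

// Naming rule described above: idx_<table>_<name>.
function indexName(table: string, spec: IndexSpec): string {
  return `idx_${table}_${spec.name}`;
}

// Assumed statement shape; the library's exact CQL may differ.
function createIndexStatement(
  keyspace: string,
  table: string,
  spec: IndexSpec
): string {
  let withOptions = "";
  if (spec.options) {
    const options = spec.options;
    // Render the map as a CQL map literal, e.g. {'case_sensitive': 'false'}.
    const entries = Object.keys(options).map((key) => `'${key}': '${options[key]}'`);
    withOptions = ` WITH OPTIONS = {${entries.join(", ")}}`;
  }
  return (
    `CREATE CUSTOM INDEX IF NOT EXISTS ${indexName(table, spec)} ` +
    `ON ${keyspace}.${table} ${spec.value} USING 'StorageAttachedIndex'${withOptions}`
  );
}
```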

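Advanced Filtering

By default, filters are applied with equality (`=`). For fields that have an `indices` entry, you may provide an `operator` whose string value is one the index supports. In that case you specify one or more filters, either as a single filter or as a list (which will be `AND`-ed together). For example: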

   { name: "create_datetime", operator: ">", value: some_datetime_variable }

or

[
  { userid: userid_variable },
  { name: "create_datetime", operator: ">", value: some_date_variable },
];

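`value` can be a single value or an array. If it is not an array, or if it contains only one element, the resulting query will be along the lines of `${name} ${operator} ?`, with `value` bound to the `?`. If the `value` array contains more than one element, the number of unquoted `?` in `name` is counted and subtracted from the length of `value`, and that many `?` are placed on the right-hand side of the operator; if there is more than one `?`, they are wrapped in `(` and `)`, e.g. `(?, ?, ?)`. This makes it possible to bind values on the left-hand side of the operator, which is useful for some functions; for example, a geo-distance filter: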

{
  name: "GEO_DISTANCE(coord, ?)",
  operator: "<",
  value: [new Float32Array([53.3730617,-6.3000515]), 10000],
},
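The placeholder rule described above can be sketched as follows. This is a reading of the prose in this section, not the library's actual implementation; `renderFilter` and `countUnquotedPlaceholders` are hypothetical names, and treating single quotes as the only quoting style is an assumption.

```typescript
type Filter = { name: string; operator?: string; value: unknown };

// Counts `?` characters that are not inside single-quoted strings.
function countUnquotedPlaceholders(text: string): number {
  let inQuote = false;
  let count = 0;
  for (let i = 0; i < text.length; i += 1) {
    const ch = text.charAt(i);
    if (ch === "'") inQuote = !inQuote;
    else if (ch === "?" && !inQuote) count += 1;
  }
  return count;
}

// Renders one filter as a WHERE fragment with bind markers.
function renderFilter(filter: Filter): string {
  const operator = filter.operator === undefined ? "=" : filter.operator;
  const values: unknown[] = Array.isArray(filter.value) ? filter.value : [filter.value];
  // Not an array, or a single element: one bind marker on the right.
  if (values.length <= 1) return `${filter.name} ${operator} ?`;
  // Otherwise, markers already present in `name` consume values first.
  const rightCount = values.length - countUnquotedPlaceholders(filter.name);
  const marks: string[] = [];
  for (let i = 0; i < rightCount; i += 1) marks.push("?");
  const right = marks.length > 1 ? `(${marks.join(", ")})` : marks.join("");
  return `${filter.name} ${operator} ${right}`;
}
```

Under this reading, the geo-distance filter above has two values and one `?` inside `name`, so a single `?` lands on the right of the operator, while a filter such as `{ name: "userid", operator: "IN", value: ["a", "b", "c"] }` would produce a parenthesised list of three markers.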

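Data Partitioning and Composite Keys

In some systems you may want to partition the data for various reasons, for example by user or by session. Data in Cassandra is always partitioned; by default, this library partitions by the first primary key field. You can specify multiple columns that together make up the primary (unique) key of a record, and optionally indicate which fields should be part of the partition key. For example, the vector store could be partitioned by `userid` and `collectionid`, with the additional fields `docid` and `docpart` making each entry unique: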

  ...,
  primaryKey: [
    {name: "userid", type: "text", partition: true},
    {name: "collectionid", type: "text", partition: true},
    {name: "docid", type: "text"},
    {name: "docpart", type: "int"},
  ],
  ...

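When searching, you may include partition keys in the filter without defining `indices` for those columns. You do not need to specify every partition key, but you must specify the leading ones in key order. In the example above, you could use the filter `{userid: userid_variable}` or `{userid: userid_variable, collectionid: collectionid_variable}`, but if you wanted to filter on `{collectionid: collectionid_variable}` alone, you would have to include `collectionid` in the `indices` list.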
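The leading-key rule can be checked with a small helper before running a query. This is a sketch of the rule as stated in this section, not library code; `columnsNeedingIndex` is a hypothetical name.

```typescript
// Returns the filter columns that are NOT covered by a leading run of
// partition keys and would therefore need an entry in `indices`.
function columnsNeedingIndex(
  partitionKeys: string[],
  filterColumns: string[]
): string[] {
  const covered: string[] = [];
  for (let i = 0; i < partitionKeys.length; i += 1) {
    // Stop at the first partition key missing from the filter:
    // later keys no longer count as part of the leading run.
    if (filterColumns.indexOf(partitionKeys[i]) === -1) break;
    covered.push(partitionKeys[i]);
  }
  return filterColumns.filter((column) => covered.indexOf(column) === -1);
}
```

With the partition keys `["userid", "collectionid"]`, a filter on `userid` alone or on both keys needs no index, whereas a filter on `collectionid` alone is reported as needing one.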

Other Configuration Options

The configuration document accepts further optional parameters; their defaults are:

  ...,
  maxConcurrency: 25,
  batchSize: 1,
  withClause: "",
  ...

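Parameter	Usage
`maxConcurrency`	The number of requests sent to Cassandra concurrently at any given time.
`batchSize`	The number of documents sent to Cassandra in each request. When using a value greater than 1, make sure your batch size will not exceed the Cassandra parameter `batch_size_fail_threshold_in_kb`. Batches are unlogged.
`withClause`	Cassandra tables may be created with an optional `WITH` clause; this is generally not needed but is provided for completeness.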
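To illustrate how `batchSize` and `maxConcurrency` interact, here is a minimal sketch that splits documents into batches and estimates how many rounds of concurrent requests that implies. It is a simplified model for reasoning about the settings, not the library's scheduling logic; `chunk` and `estimateRounds` are hypothetical names.

```typescript
// Splits items into consecutive batches of at most `batchSize` elements.
function chunk<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Simplified model: with at most `maxConcurrency` requests in flight,
// the batches need at least this many rounds to be sent.
function estimateRounds(batchCount: number, maxConcurrency: number): number {
  return Math.ceil(batchCount / maxConcurrency);
}
```

With the defaults (`batchSize: 1`, `maxConcurrency: 25`), 100 documents become 100 single-document requests, sent in at least 4 rounds under this model. Raising `batchSize` reduces the request count but makes each batch larger, which is where `batch_size_fail_threshold_in_kb` becomes relevant.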

Vector store conceptual guide
Vector store how-to guides
