Gel

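An implementation of the LangChain vector store abstraction that uses gel as the backend.

Gel is an open-source PostgreSQL data layer optimized for a fast development-to-production cycle. It comes with a high-level, strictly typed, graph-like data model, a composable hierarchical query language, full SQL support, migrations, Auth, and AI modules. The code lives in an integration package called langchain-gel.

Setup

First, install the required packages: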

! pip install -qU gel langchain-gel

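Initialization

To use Gel as the backend for your VectorStore, you need a working Gel instance. Fortunately, this does not have to involve Docker containers or anything complicated, unless you want it to! To set up a local instance, run: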

! gel project init --non-interactive

If you are using Gel Cloud (and you should!), add one more argument to that command:

gel project init --server-instance <org-name>/<instance-name>

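For a comprehensive list of ways to run Gel, see the Running Gel section of the reference documentation.

Set up the schema

A Gel schema is an explicit, high-level description of your application's data model. Besides letting you define exactly how your data is laid out, it drives many of Gel's powerful features, such as links, access policies, functions, triggers, constraints, and indexes. The LangChain VectorStore expects the following schema layout: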

schema_content = """
using extension pgvector;

module default {
    scalar type EmbeddingVector extending ext::pgvector::vector<1536>;

    type Record {
        required collection: str;
        text: str;
        embedding: EmbeddingVector;
        external_id: str {
            constraint exclusive;
        };
        metadata: json;

        index ext::pgvector::hnsw_cosine(m := 16, ef_construction := 128)
            on (.embedding)
    }
}
""".strip()

with open("dbschema/default.gel", "w") as f:
    f.write(schema_content)
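The `vector<1536>` dimension in the schema has to agree with the size of the vectors your embedding model produces (1536 is the default output size of OpenAI's text-embedding-3-small, which is used later on this page). The following is a minimal sketch, not part of langchain-gel, of rendering the same schema for a different dimension and writing it out with the target directory created if needed. The names `render_schema` and `write_schema` are hypothetical helpers introduced only for this illustration.

```python
from pathlib import Path

# Hypothetical template mirroring the schema above; literal braces are doubled
# so that str.format only substitutes {dim}.
SCHEMA_TEMPLATE = """
using extension pgvector;

module default {{
    scalar type EmbeddingVector extending ext::pgvector::vector<{dim}>;

    type Record {{
        required collection: str;
        text: str;
        embedding: EmbeddingVector;
        external_id: str {{
            constraint exclusive;
        }};
        metadata: json;

        index ext::pgvector::hnsw_cosine(m := 16, ef_construction := 128)
            on (.embedding)
    }}
}}
""".strip()


def render_schema(dim: int) -> str:
    """Return the schema text for embeddings of the given dimension."""
    if dim <= 0:
        raise ValueError("embedding dimension must be positive")
    return SCHEMA_TEMPLATE.format(dim=dim)


def write_schema(dim: int, directory: str = "dbschema") -> Path:
    """Write the rendered schema to <directory>/default.gel and return the path."""
    path = Path(directory) / "default.gel"
    # Create the directory first so the write does not fail on a fresh checkout.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_schema(dim))
    return path
```

For example, `write_schema(768)` would produce a schema for a 768-dimensional embedding model. A migration still has to be created and applied afterwards.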

To apply the schema changes to the database, run a migration using Gel's migration mechanism:

! gel migration create --non-interactive
! gel migrate

From this point on, GelVectorStore can be used as a drop-in replacement for any other vector store available in LangChain.

Instantiation

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

from langchain_gel import GelVectorStore

vector_store = GelVectorStore(
    embeddings=embeddings,
)

Manage vector store

Add items to vector store

Note that adding documents by ID will overwrite any existing documents that match that ID.

from langchain_core.documents import Document

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": "1", "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": "2", "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": "3", "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"id": "4", "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"id": "5", "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"id": "6", "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"id": "7", "location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"id": "8", "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"id": "9", "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"id": "10", "location": "community center", "topic": "classes"},
    ),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

Delete items from vector store

vector_store.delete(ids=["3"])
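To make the overwrite-by-ID and delete-by-ID behaviour concrete without a running Gel instance, here is a toy in-memory stand-in. It is not GelVectorStore and says nothing about how langchain-gel is implemented: `ToyStore` is a hypothetical class, and silently ignoring unknown IDs on delete is an assumption made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in that keys stored texts by their ID."""

    records: dict = field(default_factory=dict)

    def add(self, texts, ids):
        if len(texts) != len(ids):
            raise ValueError("texts and ids must have the same length")
        for id_, text in zip(ids, texts):
            # Assigning to an existing key replaces the earlier entry,
            # which is the overwrite behaviour described above.
            self.records[id_] = text
        return list(ids)

    def delete(self, ids):
        for id_ in ids:
            # Assumption: unknown IDs are ignored rather than raising.
            self.records.pop(id_, None)


store = ToyStore()
store.add(["there are cats in the pond", "ducks are also found in the pond"], ids=["1", "2"])
store.add(["there are frogs in the pond"], ids=["1"])  # replaces the entry with ID "1"
store.delete(ids=["2"])
```

After these calls the toy store holds a single record, the replacement text under ID "1".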

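Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Filtering support

The vector store supports a set of filters that can be applied to the metadata fields of the documents.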

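Operator	Meaning/Category
$eq	Equality (==)
$ne	Inequality (!=)
$lt	Less than (<)
$lte	Less than or equal (<=)
$gt	Greater than (>)
$gte	Greater than or equal (>=)
$in	Special cased (in)
$nin	Special cased (not in)
$between	Special cased (between)
$like	Text (like)
$ilike	Text (case-insensitive like)
$and	Logical (and)
$or	Logical (or)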
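As a way to reason about what these operators mean, the sketch below evaluates a filter against a single metadata dict in plain Python. It is only an illustration of the intended semantics, not the code langchain-gel runs (the real store applies filters inside the database). Several details are assumptions: `$between` is treated as inclusive on both ends, `$like` and `$ilike` use SQL-style `%` and `_` wildcards, and a field missing from the metadata never matches. The function name `matches` is hypothetical.

```python
import re


def _like(value: str, pattern: str, ignore_case: bool = False) -> bool:
    # Assumption: SQL-style wildcards, % for any run of characters, _ for one.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


OPERATORS = {
    "$eq": lambda v, a: v == a,
    "$ne": lambda v, a: v != a,
    "$lt": lambda v, a: v < a,
    "$lte": lambda v, a: v <= a,
    "$gt": lambda v, a: v > a,
    "$gte": lambda v, a: v >= a,
    "$in": lambda v, a: v in a,
    "$nin": lambda v, a: v not in a,
    "$between": lambda v, a: a[0] <= v <= a[1],  # assumed inclusive
    "$like": lambda v, a: _like(v, a),
    "$ilike": lambda v, a: _like(v, a, ignore_case=True),
}


def matches(metadata: dict, flt: dict) -> bool:
    """Return True if the metadata satisfies every condition in the filter."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            if not isinstance(cond, dict):
                raise TypeError(f"expected an operator dict for field {key!r}")
            if key not in metadata:
                return False  # assumption: missing fields never match
            for op, arg in cond.items():
                if not OPERATORS[op](metadata[key], arg):
                    return False
    return True
```

For instance, `matches({"id": "2", "location": "pond"}, {"id": {"$in": ["1", "2"]}})` evaluates to `True`, while the same metadata fails `{"location": {"$nin": ["pond"]}}`.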

Query directly

A simple similarity search can be performed as follows:

results = vector_store.similarity_search(
    "kitty", k=10, filter={"id": {"$in": ["1", "5", "2", "9"]}}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

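If you provide a dict with multiple fields and no top-level logical operator, the top level is interpreted as a logical AND filter. The two queries below express the same filter: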

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "id": {"$in": ["1", "5", "2", "9"]},
        "location": {"$in": ["pond", "market"]},
    },
)

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": ["1", "5", "2", "9"]}},
            {"location": {"$in": ["pond", "market"]}},
        ]
    },
)
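The relationship between the two filter shapes above can be sketched as a small rewrite from the implicit form to the explicit `$and` form. This is an illustration only; `to_explicit_and` is a hypothetical helper, not a function provided by langchain-gel, and the library may normalize filters differently internally.

```python
def to_explicit_and(flt: dict) -> dict:
    """Rewrite a multi-field filter into an explicit $and of single-field filters."""
    # Leave single-field filters and filters that already use a top-level
    # operator such as $and or $or untouched.
    if len(flt) <= 1 or any(key.startswith("$") for key in flt):
        return flt
    return {"$and": [{field: cond} for field, cond in flt.items()]}


implicit = {
    "id": {"$in": ["1", "5", "2", "9"]},
    "location": {"$in": ["pond", "market"]},
}
explicit = to_explicit_and(implicit)
```

Here `explicit` has the same shape as the filter passed in the second query above.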

If you want to run a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")
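The schema above indexes embeddings with `hnsw_cosine`, so the ranking is based on the cosine measure. This page does not state whether the returned score is a similarity or a distance, so treat the following as background only: a minimal pure-Python sketch of both quantities, using hypothetical helper names and tiny hand-made vectors rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

Identical directions give a similarity of 1 (distance 0), orthogonal vectors give a similarity of 0 (distance 1), and opposite directions give a similarity of -1 (distance 2). When comparing scores across stores, check which of the two conventions is in use.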

Query by turning into retriever

You can also turn the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(search_kwargs={"k": 1})
retriever.invoke("kitty")

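Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG sections of the LangChain documentation.

API reference

For detailed documentation of all GelVectorStore features and configurations, see the API reference: python.langchain.com/api_reference/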
