Xorbits Inference (Xinference)

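This page demonstrates how to use Xinference with LangChain. Xinference is a library designed to serve LLMs, speech recognition models, and multimodal models, even on your laptop. With Xorbits Inference, you can deploy and serve your own models or state-of-the-art built-in models with a single command.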

Installation and setup

Xinference can be installed from PyPI via pip:

pip install "xinference[all]"

LLM

Xinference supports various GGML-compatible models, including chatglm, baichuan, whisper, vicuna, and orca. To view the built-in models, run the command:

xinference list --all

Wrapper for Xinference

You can start a local instance of Xinference by running:

xinference

You can also deploy Xinference in a distributed cluster. To do so, first start a Xinference supervisor on the server where you want it to run:

xinference-supervisor -H "${supervisor_host}"

Then start a Xinference worker on each of the other servers where you want one to run:

xinference-worker -e "http://${supervisor_host}:9997"

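Once Xinference is running, an endpoint for model management is accessible via the CLI or the Xinference client. For a local deployment, the endpoint is http://localhost:9997. For a cluster deployment, the endpoint is http://${supervisor_host}:9997. Next, you need to launch a model. You can specify the model name and other attributes, including model_size_in_billions and quantization. You can do this with the command-line interface (CLI). For example: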

xinference launch -n orca -s 3 -q q4_0

This returns a model UID. Example usage:

from langchain_community.llms import Xinference

# Connect to the running Xinference endpoint.
llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid="<model_uid>",  # replace with the model UID returned when launching the model
)

# generate_config is forwarded to the model; "stream": True requests streamed generation.
llm.invoke(
    "Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)
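
The endpoint and launch steps described above can be scripted. The following is a minimal sketch, not part of the Xinference API: the helper names endpoint_url and launch_command are hypothetical, and it assumes localhost as the host for a local deployment. It only builds the endpoint string and the argument list for the xinference launch command shown earlier; it does not contact a server.

```python
import shlex
from typing import List, Optional

DEFAULT_PORT = 9997  # port used by the endpoints described in the text


def endpoint_url(supervisor_host: Optional[str] = None, port: int = DEFAULT_PORT) -> str:
    """Return the model-management endpoint (hypothetical helper).

    With no supervisor host, a local deployment on localhost is assumed.
    """
    host = supervisor_host or "localhost"
    return f"http://{host}:{port}"


def launch_command(name: str, size_in_billions: int, quantization: str) -> List[str]:
    """Build the argument list for `xinference launch` (hypothetical helper)."""
    return [
        "xinference", "launch",
        "-n", name,                   # model name
        "-s", str(size_in_billions),  # model_size_in_billions
        "-q", quantization,           # quantization
    ]


if __name__ == "__main__":
    print(endpoint_url())
    print(shlex.join(launch_command("orca", 3, "q4_0")))
```

On a machine where the xinference CLI is installed, the argument list can be passed to subprocess.run to launch the model.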

Usage

For more information and detailed examples, see the example for xinference LLMs.

Embeddings

Xinference also supports embedding queries and documents. For a more detailed demonstration, see the example for xinference embeddings.
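
As a rough illustration of embedding a query and a set of documents, the sketch below uses the XinferenceEmbeddings class from langchain_community. The helper names make_embedder and embed_corpus are hypothetical, and the sketch assumes an embedding model has already been launched and that its UID is known. The import is done lazily so the module loads even where langchain-community is not installed.

```python
from typing import Any, List, Sequence, Tuple


def make_embedder(server_url: str, model_uid: str) -> Any:
    """Create a Xinference-backed embedder (hypothetical helper).

    Requires langchain-community and a running Xinference server.
    """
    # Lazy import: only needed when an embedder is actually created.
    from langchain_community.embeddings import XinferenceEmbeddings

    return XinferenceEmbeddings(server_url=server_url, model_uid=model_uid)


def embed_corpus(
    embedder: Any, query: str, documents: Sequence[str]
) -> Tuple[List[float], List[List[float]]]:
    """Embed one query and a list of documents (hypothetical helper).

    Works with any object exposing LangChain's embed_query and
    embed_documents methods.
    """
    query_vector = embedder.embed_query(query)
    document_vectors = embedder.embed_documents(list(documents))
    return query_vector, document_vectors
```

A call might look like embed_corpus(make_embedder("http://0.0.0.0:9997", "<model_uid>"), "a query", ["doc one", "doc two"]), where the UID placeholder stands for the value returned when launching the model.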

Xinference LangChain partner package install

Install the integration package with:

pip install langchain-xinference

Chat models

from langchain_xinference.chat_models import ChatXinference
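
A minimal sketch of how the chat model might be used follows. It assumes that ChatXinference accepts the same server_url and model_uid arguments as the Xinference LLM wrapper shown earlier; check the package documentation before relying on this. The helper names make_chat_model and ask are hypothetical.

```python
from typing import Any


def make_chat_model(server_url: str, model_uid: str) -> Any:
    """Create a ChatXinference instance (hypothetical helper).

    Assumption: the constructor takes server_url and model_uid.
    Requires langchain-xinference and a running Xinference server.
    """
    # Lazy import: only needed when a chat model is actually created.
    from langchain_xinference.chat_models import ChatXinference

    return ChatXinference(server_url=server_url, model_uid=model_uid)


def ask(chat_model: Any, question: str) -> str:
    """Send one question and return the reply text (hypothetical helper)."""
    reply = chat_model.invoke(question)
    # LangChain chat models return a message object; its text is in `.content`.
    return str(getattr(reply, "content", reply))
```

For example, ask(make_chat_model("http://0.0.0.0:9997", "<model_uid>"), "Where can we visit in the capital of France?") would return the model's answer as a string.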

LLM

from langchain_xinference.llms import Xinference
