Embed all documents using quantized embedders. The embedders are based on optimized models created with optimum-intel and IPEX. The example text is based on SBERT.
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {"normalize_embeddings": True}  # set True to compute cosine similarity

model = QuantizedBiEncoderEmbeddings(
    model_name=model_name,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: ",
)
loading configuration file inc_config.json from cache at
INCConfig {
  "distillation": {},
  "neural_compressor_version": "2.4.1",
  "optimum_version": "1.16.2",
  "pruning": {},
  "quantization": {
    "dataset_num_samples": 50,
    "is_static": true
  },
  "save_onnx_model": false,
  "torch_version": "2.2.0",
  "transformers_version": "4.37.2"
}

Using `INCModel` to load a TorchScript model will be deprecated in v1.15.0, to load your model please use `IPEXModel` instead.
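The comment on `normalize_embeddings` above says that setting it to True lets you compute cosine similarity. The following is a minimal sketch of why that works, assuming the flag scales each embedding to unit length: for unit-length vectors, the plain dot product equals the cosine similarity. The helper names and the toy vectors are hypothetical and are not model output.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (assumed effect of normalize_embeddings=True)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy vectors, chosen only for illustration.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

cos_ab = cosine(a, b)
dot_normalized = dot(l2_normalize(a), l2_normalize(b))

# Both values agree up to floating-point error.
print(cos_ab, dot_normalized)
```

If the embeddings were not normalized, the dot product would also reflect vector length, so scores from different documents would not be directly comparable as cosine similarities.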
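Let's ask a question and compare it against two documents. The first document contains the answer to the question; the second does not. We can then check which one better matches our query.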
question = "How many people live in Berlin?"
documents = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
]
doc_vecs = model.embed_documents(documents)
Batches: 100%|██████████| 1/1 [00:00<00:00,  4.18it/s]
query_vec = model.embed_query(question)
import torch
doc_vecs_torch = torch.tensor(doc_vecs)
query_vec_torch = torch.tensor(query_vec)
query_vec_torch @ doc_vecs_torch.T
tensor([0.7980, 0.6529])
The first document scores higher (0.7980 versus 0.6529), so it does indeed rank first.
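With more than two documents, reading the scores by eye stops being practical. The sketch below shows one way to turn the similarity scores into a ranked list. `rank_documents` is a hypothetical helper, not part of LangChain, and the scores are copied from the output above rather than recomputed.

```python
def rank_documents(documents, scores):
    """Return (score, document) pairs sorted from most to least similar."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)


documents = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
]
# Scores copied from the tensor printed above.
scores = [0.7980, 0.6529]

ranked = rank_documents(documents, scores)
best_score, best_doc = ranked[0]

for score, doc in ranked:
    print(f"{score:.4f}  {doc}")
```

In the workflow above, the scores could instead be taken from the similarity tensor with `(query_vec_torch @ doc_vecs_torch.T).tolist()`.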