跳到主要内容
基于字符的分割是最简单的文本分割方法。它使用指定的字符序列(默认值:"\n\n")来分割文本,并以字符数来衡量分块长度。 关键点
  1. 文本如何分割:按给定字符分隔符。
  2. 分块大小如何衡量:按字符计数。
您可以选择
  • .splitText — 返回纯字符串块。
  • .createDocuments — 返回 LangChain Document 对象,当需要为下游任务保留元数据时很有用。
npm install @langchain/textsplitters
import { CharacterTextSplitter } from "@langchain/textsplitters";
import { readFileSync } from "fs";

// Example: read a long document
const stateOfTheUnion = readFileSync("state_of_the_union.txt", "utf8");

const splitter = new CharacterTextSplitter({
    separator: "\n\n",
    chunkSize: 1000,
    chunkOverlap: 200,
});
const texts = splitter.createDocuments([{ pageContent: stateOfTheUnion }]);
console.log(texts[0]);
Document {
    pageContent: 'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.'
}
使用 .createDocuments 将与每个文档关联的元数据传播到输出块
const metadatas = [{"document": 1}, {"document": 2}]
const documents = splitter.createDocuments(
    [{ pageContent: stateOfTheUnion }, { pageContent: stateOfTheUnion }],
    { metadatas: metadatas }
);
console.log(documents[0]);
Document {
    pageContent: 'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.',
    metadata: {'document': 1}
}
使用 .splitText 直接获取字符串内容
splitter.splitText(stateOfTheUnion)[0]
'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.'

以编程方式连接这些文档到 Claude、VSCode 等,通过 MCP 获取实时答案。
© . This site is unofficial and not affiliated with LangChain, Inc.