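Polaris AI DataInsight is a document parser that extracts document elements (text, images, complex tables, charts, and so on) from a variety of file formats and converts them into structured JSON, which makes the results easy to integrate into RAG systems.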

Installation

Install the langchain-polaris-ai-datainsight package.
pip install langchain-polaris-ai-datainsight

Environment setup

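Make sure the POLARIS_AI_DATA_INSIGHT_API_KEY environment variable is set to your Polaris AI DataInsight API key.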
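
One way to confirm that POLARIS_AI_DATA_INSIGHT_API_KEY is present before constructing the loader is a small pre-flight check. The sketch below uses only the standard library; the helper name `missing_env_vars` is hypothetical and is not part of the langchain-polaris-ai-datainsight package.

```python
import os

API_KEY_VAR = "POLARIS_AI_DATA_INSIGHT_API_KEY"


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`.

    `environ` defaults to the real process environment; passing a plain dict
    makes the check easy to try out without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Try the check against explicit mappings instead of the real environment.
print(missing_env_vars([API_KEY_VAR], environ={}))
# ['POLARIS_AI_DATA_INSIGHT_API_KEY']
print(missing_env_vars([API_KEY_VAR], environ={API_KEY_VAR: "dummy-key"}))
# []
```

Calling `missing_env_vars([API_KEY_VAR])` with no second argument checks the actual process environment, so a non-empty result means the key still needs to be set.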

Usage

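import getpass
import os

# Prompt for the API key without echoing it, then store it in the
# environment variable named in the setup section above.
os.environ["POLARIS_AI_DATA_INSIGHT_API_KEY"] = getpass.getpass(
    "Enter your PolarisAIDataInsight API key: "
)

from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader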

loader = PolarisAIDataInsightLoader(
    file_path="example_data/polaris_ai_example.docx",
    resources_dir="example_data/tmp",
    mode="page",  # "element", "page", or "single". (default is "single")
)

docs = loader.load()  # or loader.lazy_load()

# Print the content and metadata of the first three documents.
for doc in docs[:3]:
    print(" --------- < Page Content > --------- ")
    print(doc.page_content)
    print(" --------- < Metadata > --------- ")
    print(doc.metadata)
    print("\n")
Running this prints the content and metadata extracted from the document, as shown below:
--------- < Page Content > ---------
2025 Seed Program Application

I. Funding Information by Track

1. Beginning and Advanced Track Comparison Overview

<table><tbody><tr><td>Category</td><td>Beginning Track*</td><td>Advanced Track*</td></tr><tr><td>Funding target</td><td>A university located outside Korea that has a Central Grant Management Department, an existing Korean Studies infrastructure, and plans to establish an education foundation.</td><td>A non-Korean university with a Central Grant Management Department, at least one full-time Korean Studies faculty member, an undergraduate Korean Studies major or department, and commitment to supporting Korean Studies.</td></tr><tr><td>Funding period</td><td>3 years</td><td>5 years<3+2years></td></tr><tr><td>Funding size</td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 200 million</td></tr><tr><td>B</td><td>Up to KRW 50 million</td></tr></tbody></table></td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 150 million</td></tr><tr><td>B</td><td>Up to KRW 90 million</td></tr></tbody></table></td></tr><tr><td>Required project content</td><td>·	Fund 2 or more scholarship students<br>·	Offer 1 or more regular Korean Studies lecture courses (Excluding Korean language courses)<br>·	Hold 1 or more workshops per year in which that students may participate</td><td>·	Hire 1 or more Korean Studies full-time faculty<br>·	Fund 1 or more scholarship student for Korean Studies<br>·	Offer 2 or more regular graduate-level Korean Studies lecture courses (Excluding Korean language courses)<br>·	Hold 1 or more international Korean Studies conference<br>·	Establish and manage a website, blog, or social media relating to the program </td></tr><tr><td>Recommended content</td><td>·	Foster talent (education)<br>·	Establish a Korean Studies research institute/center<br>·	Establish Korean Studies undergraduate department/major & program<br>·	Develop Korean Studies textbooks<br>·	Hold academic activities</td><td>·	Foster talent (education)<br>·	Establish a Korean Studies research institute/center<br>·	Establish Korean Studies M.A/Ph.D. department/major & program<br>·	Develop Korean Studies textbooks<br>·	Hold academic activities</td></tr></tbody></table>

<img id="di.image.im12" data-category="image"/>

 2 / 3


 --------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te0': {'id': 'di.text.te0', 'type': 'text'}, 'di.text.te2': {'id': 'di.text.te2', 'type': 'text'}, 'di.table.ta9': {'id': 'di.table.ta9', 'type': 'table'}, 'di.image.im12': {'id': 'di.image.im12', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image12.png'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}


 --------- < Page Content > ---------
2025 Seed Program Application

II. Review and Selection

1. Review Process

<img id="di.image.im13" data-category="image"/>





Review of whether the basic requirements for application have been met







Review of the Project Proposal

Admistered by the Expert Review Team







Final review and decision

Admistered by the Comprehensive Review Committee



1. Preliminary Review



2. Content Review (80 pts)



3. Comprehensive Review (20 pts)

2. Review Stages and Content

Stage 1: Preliminary Review

Conducted by Main Department

●	Verifies document submission, eligibility, and overlapping support.

●	Applications missing required documents, signatures, or failing to meet eligibility do not proceed.

●	Applications with Indirect Expenses over 10% of Direct Expenses (including Labor Expenses) are rejected.

Stage 2: Content Review

Conducted by Expert Review Team

●	Online review: Points given individually

●	Panel review: Points determined by consensus

●	Assesses leadership potential, capacity, and project plans.

●	Items and scores assigned for evaluation.

<table><tbody><tr><td>Areas</td><td>Items (Points)</td><td>Content</td></tr></tbody></table>

 2 / 3


 --------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te10': {'id': 'di.text.te10', 'type': 'text'}, 'di.text.te12': {'id': 'di.text.te12', 'type': 'text'}, 'di.image.im13': {'id': 'di.image.im13', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image13.png'}, 'di.text.sh15': {'id': 'di.text.sh15', 'type': 'text'}, 'di.text.sh16': {'id': 'di.text.sh16', 'type': 'text'}, 'di.text.sh16te0': {'id': 'di.text.sh16te0', 'type': 'text'}, 'di.text.sh17': {'id': 'di.text.sh17', 'type': 'text'}, 'di.text.sh18': {'id': 'di.text.sh18', 'type': 'text'}, 'di.text.sh19': {'id': 'di.text.sh19', 'type': 'text'}, 'di.text.sh19te0': {'id': 'di.text.sh19te0', 'type': 'text'}, 'di.text.sh19te1': {'id': 'di.text.sh19te1', 'type': 'text'}, 'di.text.sh20': {'id': 'di.text.sh20', 'type': 'text'}, 'di.text.sh21': {'id': 'di.text.sh21', 'type': 'text'}, 'di.text.sh22': {'id': 'di.text.sh22', 'type': 'text'}, 'di.text.sh22te0': {'id': 'di.text.sh22te0', 'type': 'text'}, 'di.text.sh22te1': {'id': 'di.text.sh22te1', 'type': 'text'}, 'di.text.sh23': {'id': 'di.text.sh23', 'type': 'text'}, 'di.text.sh23te0': {'id': 'di.text.sh23te0', 'type': 'text'}, 'di.text.sh24': {'id': 'di.text.sh24', 'type': 'text'}, 'di.text.sh24te0': {'id': 'di.text.sh24te0', 'type': 'text'}, 'di.text.sh25': {'id': 'di.text.sh25', 'type': 'text'}, 'di.text.sh25te0': {'id': 'di.text.sh25te0', 'type': 'text'}, 'di.text.te15': {'id': 'di.text.te15', 'type': 'text'}, 'di.text.te16': {'id': 'di.text.te16', 'type': 'text'}, 'di.text.te17': {'id': 'di.text.te17', 'type': 'text'}, 'di.text.te18': {'id': 'di.text.te18', 'type': 'text'}, 'di.text.te19': {'id': 'di.text.te19', 'type': 'text'}, 'di.text.te20': {'id': 'di.text.te20', 'type': 'text'}, 'di.text.te21': {'id': 'di.text.te21', 'type': 'text'}, 'di.text.te22': {'id': 'di.text.te22', 'type': 'text'}, 'di.text.te23': {'id': 'di.text.te23', 'type': 'text'}, 'di.text.te24': {'id': 'di.text.te24', 'type': 'text'}, 'di.text.te25': {'id': 'di.text.te25', 'type': 'text'}, 'di.text.te26': {'id': 'di.text.te26', 'type': 'text'}, 'di.table.ta26': {'id': 'di.table.ta26', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}


 --------- < Page Content > ---------
2025 Seed Program Application

<table><tbody><tr><td rowspan="3">Evaluation of the Basis for the Project (40)</td><td>Potential to lead Korean Studies (20)</td><td>- Assess whether the university has a distinguished reputation in terms of history and academic disciplines.<br>- Evaluate the strength of the network between the Project Director and local researchers.</td></tr><tr><td>Performance capacity (20)<br>Eligibility criteria (10)</td><td>- Determine if the project director possesses the skills and commitment to execute the project (e.g., Korean language proficiency, influence within the institution, management skills).<br>- Review the achievements of collaborative researchers in Korean Studies.<br>- Confirm whether personnel (Beginning/Advanced) or coursework (Advanced) meet eligibility criteria.</td></tr><tr><td>University support (10)</td><td>- Measure the institution's willingness to support Korean Studies (financial, spatial, and human resources, appropriate indirect expense ratio).<br>- Assess the competency of the Central Grant Management Department.</td></tr><tr><td rowspan="2">Evaluation of the Project Content (40)</td><td>Project plans (30)</td><td>- Ensure that the project objectives are realistic and well-defined.<br>- Verify that the plan aligns with local conditions.<br>- Review the suitability of the Project Team’s structure.<br>- Assess whether the budget plan reflects local price levels.</td></tr></tbody></table>

 2 / 3


 --------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.table.ta29': {'id': 'di.table.ta29', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
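
In the output above, each page's metadata is a dictionary keyed by element id, and image elements carry a 'src' path that matches an `<img id="..."/>` placeholder in the page content. Assuming that shape holds for your documents (it is inferred only from this sample output, not from the package's reference documentation), a minimal sketch for filtering elements by type and resolving image placeholders could look like this. The helper names `elements_of_type` and `image_sources` and the shortened sample path are hypothetical.

```python
import re

# Matches the image placeholders seen in page_content, for example:
# <img id="di.image.im12" data-category="image"/>
IMG_TAG = re.compile(r'<img id="([^"]+)"[^>]*/>')


def elements_of_type(metadata, element_type):
    """Return the metadata entries whose 'type' equals element_type."""
    return [entry for entry in metadata.values() if entry.get("type") == element_type]


def image_sources(page_content, metadata):
    """Map each image placeholder id in page_content to its 'src' path (None if absent)."""
    return {
        image_id: metadata.get(image_id, {}).get("src")
        for image_id in IMG_TAG.findall(page_content)
    }


# Hand-written sample in the same shape as the output above (path shortened for illustration).
sample_content = 'Intro text\n\n<img id="di.image.im12" data-category="image"/>\n'
sample_metadata = {
    "di.text.te0": {"id": "di.text.te0", "type": "text"},
    "di.table.ta9": {"id": "di.table.ta9", "type": "table"},
    "di.image.im12": {
        "id": "di.image.im12",
        "type": "image",
        "src": "example_data/tmp/image12.png",
    },
}

print([entry["id"] for entry in elements_of_type(sample_metadata, "table")])
# ['di.table.ta9']
print(image_sources(sample_content, sample_metadata))
# {'di.image.im12': 'example_data/tmp/image12.png'}
```

With real loader output, the same helpers would be called as `image_sources(doc.page_content, doc.metadata)` for each document returned in "page" mode.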
