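Polaris AI DataInsight is a document parser that extracts document elements (text, images, complex tables, charts, and so on) from a variety of file formats and converts them into structured JSON, which makes them easy to integrate into RAG systems.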
Installation
Install the langchain-polaris-ai-datainsight package.
pip install langchain-polaris-ai-datainsight
Environment setup
Make sure the POLARIS_AI_DATA_INSIGHT_API_KEY environment variable is set to your Polaris AI DataInsight API key. See the Polaris AI DataInsight documentation for how to obtain a key.
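In non-interactive settings (scripts, CI jobs) it can help to fail early when the key is missing. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and is not part of the langchain-polaris-ai-datainsight package.

```python
import os

ENV_VAR = "POLARIS_AI_DATA_INSIGHT_API_KEY"


def require_api_key(env=os.environ):
    """Return the API key found in `env`, or raise a clear error if it is missing.

    `require_api_key` is an illustrative helper, not a library function.
    """
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating the loader.")
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key before any document is sent for parsing.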
Usage
import getpass
import os

# Prompt for the API key without echoing it to the terminal.
os.environ["POLARIS_AI_DATA_INSIGHT_API_KEY"] = getpass.getpass(
    "Enter your PolarisAIDataInsight API key: "
)
from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader

loader = PolarisAIDataInsightLoader(
    file_path="example_data/polaris_ai_example.docx",
    resources_dir="example_data/tmp",
    mode="page",  # "element", "page", or "single". (default is "single")
)

docs = loader.load()  # or loader.lazy_load()

# Print the content and metadata of the first three documents.
for doc in docs[:3]:
    print(" --------- < Page Content > --------- ")
    print(doc.page_content)
    print(" --------- < Metadata > --------- ")
    print(doc.metadata)
    print("\n")
--------- < Page Content > ---------
2025 Seed Program Application
I. Funding Information by Track
1. Beginning and Advanced Track Comparison Overview
<table><tbody><tr><td>Category</td><td>Beginning Track*</td><td>Advanced Track*</td></tr><tr><td>Funding target</td><td>A university located outside Korea that has a Central Grant Management Department, an existing Korean Studies infrastructure, and plans to establish an education foundation.</td><td>A non-Korean university with a Central Grant Management Department, at least one full-time Korean Studies faculty member, an undergraduate Korean Studies major or department, and commitment to supporting Korean Studies.</td></tr><tr><td>Funding period</td><td>3 years</td><td>5 years<3+2years></td></tr><tr><td>Funding size</td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 200 million</td></tr><tr><td>B</td><td>Up to KRW 50 million</td></tr></tbody></table></td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 150 million</td></tr><tr><td>B</td><td>Up to KRW 90 million</td></tr></tbody></table></td></tr><tr><td>Required project content</td><td>· Fund 2 or more scholarship students<br>· Offer 1 or more regular Korean Studies lecture courses (Excluding Korean language courses)<br>· Hold 1 or more workshops per year in which that students may participate</td><td>· Hire 1 or more Korean Studies full-time faculty<br>· Fund 1 or more scholarship student for Korean Studies<br>· Offer 2 or more regular graduate-level Korean Studies lecture courses (Excluding Korean language courses)<br>· Hold 1 or more international Korean Studies conference<br>· Establish and manage a website, blog, or social media relating to the program </td></tr><tr><td>Recommended content</td><td>· Foster talent (education)<br>· Establish a Korean Studies research institute/center<br>· Establish Korean Studies undergraduate department/major & 
program<br>· Develop Korean Studies textbooks<br>· Hold academic activities</td><td>· Foster talent (education)<br>· Establish a Korean Studies research institute/center<br>· Establish Korean Studies M.A/Ph.D. department/major & program<br>· Develop Korean Studies textbooks<br>· Hold academic activities</td></tr></tbody></table>
<img id="di.image.im12" data-category="image"/>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te0': {'id': 'di.text.te0', 'type': 'text'}, 'di.text.te2': {'id': 'di.text.te2', 'type': 'text'}, 'di.table.ta9': {'id': 'di.table.ta9', 'type': 'table'}, 'di.image.im12': {'id': 'di.image.im12', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image12.png'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
--------- < Page Content > ---------
2025 Seed Program Application
II. Review and Selection
1. Review Process
<img id="di.image.im13" data-category="image"/>
Review of whether the basic requirements for application have been met
Review of the Project Proposal
Admistered by the Expert Review Team
Final review and decision
Admistered by the Comprehensive Review Committee
1. Preliminary Review
2. Content Review (80 pts)
3. Comprehensive Review (20 pts)
2. Review Stages and Content
Stage 1: Preliminary Review
Conducted by Main Department
● Verifies document submission, eligibility, and overlapping support.
● Applications missing required documents, signatures, or failing to meet eligibility do not proceed.
● Applications with Indirect Expenses over 10% of Direct Expenses (including Labor Expenses) are rejected.
Stage 2: Content Review
Conducted by Expert Review Team
● Online review: Points given individually
● Panel review: Points determined by consensus
● Assesses leadership potential, capacity, and project plans.
● Items and scores assigned for evaluation.
<table><tbody><tr><td>Areas</td><td>Items (Points)</td><td>Content</td></tr></tbody></table>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te10': {'id': 'di.text.te10', 'type': 'text'}, 'di.text.te12': {'id': 'di.text.te12', 'type': 'text'}, 'di.image.im13': {'id': 'di.image.im13', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image13.png'}, 'di.text.sh15': {'id': 'di.text.sh15', 'type': 'text'}, 'di.text.sh16': {'id': 'di.text.sh16', 'type': 'text'}, 'di.text.sh16te0': {'id': 'di.text.sh16te0', 'type': 'text'}, 'di.text.sh17': {'id': 'di.text.sh17', 'type': 'text'}, 'di.text.sh18': {'id': 'di.text.sh18', 'type': 'text'}, 'di.text.sh19': {'id': 'di.text.sh19', 'type': 'text'}, 'di.text.sh19te0': {'id': 'di.text.sh19te0', 'type': 'text'}, 'di.text.sh19te1': {'id': 'di.text.sh19te1', 'type': 'text'}, 'di.text.sh20': {'id': 'di.text.sh20', 'type': 'text'}, 'di.text.sh21': {'id': 'di.text.sh21', 'type': 'text'}, 'di.text.sh22': {'id': 'di.text.sh22', 'type': 'text'}, 'di.text.sh22te0': {'id': 'di.text.sh22te0', 'type': 'text'}, 'di.text.sh22te1': {'id': 'di.text.sh22te1', 'type': 'text'}, 'di.text.sh23': {'id': 'di.text.sh23', 'type': 'text'}, 'di.text.sh23te0': {'id': 'di.text.sh23te0', 'type': 'text'}, 'di.text.sh24': {'id': 'di.text.sh24', 'type': 'text'}, 'di.text.sh24te0': {'id': 'di.text.sh24te0', 'type': 'text'}, 'di.text.sh25': {'id': 'di.text.sh25', 'type': 'text'}, 'di.text.sh25te0': {'id': 'di.text.sh25te0', 'type': 'text'}, 'di.text.te15': {'id': 'di.text.te15', 'type': 'text'}, 'di.text.te16': {'id': 'di.text.te16', 'type': 'text'}, 'di.text.te17': {'id': 'di.text.te17', 'type': 'text'}, 'di.text.te18': {'id': 'di.text.te18', 'type': 'text'}, 'di.text.te19': {'id': 'di.text.te19', 'type': 'text'}, 'di.text.te20': {'id': 'di.text.te20', 'type': 'text'}, 'di.text.te21': {'id': 'di.text.te21', 'type': 'text'}, 'di.text.te22': {'id': 'di.text.te22', 'type': 'text'}, 'di.text.te23': {'id': 'di.text.te23', 'type': 
'text'}, 'di.text.te24': {'id': 'di.text.te24', 'type': 'text'}, 'di.text.te25': {'id': 'di.text.te25', 'type': 'text'}, 'di.text.te26': {'id': 'di.text.te26', 'type': 'text'}, 'di.table.ta26': {'id': 'di.table.ta26', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
--------- < Page Content > ---------
2025 Seed Program Application
<table><tbody><tr><td rowspan="3">Evaluation of the Basis for the Project (40)</td><td>Potential to lead Korean Studies (20)</td><td>- Assess whether the university has a distinguished reputation in terms of history and academic disciplines.<br>- Evaluate the strength of the network between the Project Director and local researchers.</td></tr><tr><td>Performance capacity (20)<br>Eligibility criteria (10)</td><td>- Determine if the project director possesses the skills and commitment to execute the project (e.g., Korean language proficiency, influence within the institution, management skills).<br>- Review the achievements of collaborative researchers in Korean Studies.<br>- Confirm whether personnel (Beginning/Advanced) or coursework (Advanced) meet eligibility criteria.</td></tr><tr><td>University support (10)</td><td>- Measure the institution's willingness to support Korean Studies (financial, spatial, and human resources, appropriate indirect expense ratio).<br>- Assess the competency of the Central Grant Management Department.</td></tr><tr><td rowspan="2">Evaluation of the Project Content (40)</td><td>Project plans (30)</td><td>- Ensure that the project objectives are realistic and well-defined.<br>- Verify that the plan aligns with local conditions.<br>- Review the suitability of the Project Team’s structure.<br>- Assess whether the budget plan reflects local price levels.</td></tr></tbody></table>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.table.ta29': {'id': 'di.table.ta29', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
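The metadata printed above maps each element id to a small dictionary with an `id`, a `type`, and, for images, a `src` path. As a rough sketch of how that shape could be post-processed, the helper below counts elements by type and collects image paths. The function name `summarize_metadata` is hypothetical, the sample dictionary only imitates the structure shown in the output, and its image path is a made-up placeholder.

```python
from collections import Counter


def summarize_metadata(metadata):
    """Count elements by type and collect image paths from one document's metadata.

    Assumes the shape shown in the output above:
    {element_id: {"id": ..., "type": ..., "src": ... (images only)}}.
    """
    elements = [m for m in metadata.values() if isinstance(m, dict)]
    counts = Counter(m.get("type", "unknown") for m in elements)
    images = [m["src"] for m in elements if m.get("type") == "image" and "src" in m]
    return dict(counts), images


# Sample data imitating the printed metadata; the "src" value is a placeholder.
sample = {
    "di.text.he2te0": {"id": "di.text.he2te0", "type": "text"},
    "di.table.ta9": {"id": "di.table.ta9", "type": "table"},
    "di.image.im12": {
        "id": "di.image.im12",
        "type": "image",
        "src": "example_data/tmp/placeholder_image12.png",
    },
}

counts, images = summarize_metadata(sample)
print(counts)
print(images)
```

With real loader output in "page" mode, the same function could be applied to `doc.metadata` for each document in `docs`, for example to decide which pages contain tables or images before indexing them.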