Google BigQuery 是一个无服务器、经济高效的企业数据仓库,可跨云运行并随数据扩展。加载一个BigQuery是Google Cloud Platform的一部分。
BigQuery 查询,每行一个文档。
复制
向 AI 提问
pip install -qU langchain-google-community[bigquery]
复制
向 AI 提问
from langchain_google_community import BigQueryLoader
复制
向 AI 提问
BASE_QUERY = """
SELECT
id,
dna_sequence,
organism
FROM (
SELECT
ARRAY (
SELECT
AS STRUCT 1 AS id, "ATTCGA" AS dna_sequence, "Lokiarchaeum sp. (strain GC14_75)." AS organism
UNION ALL
SELECT
AS STRUCT 2 AS id, "AGGCGA" AS dna_sequence, "Heimdallarchaeota archaeon (strain LC_2)." AS organism
UNION ALL
SELECT
AS STRUCT 3 AS id, "TCCGGA" AS dna_sequence, "Acidianus hospitalis (strain W1)." AS organism) AS new_array),
UNNEST(new_array)
"""
基本用法
复制
向 AI 提问
loader = BigQueryLoader(BASE_QUERY)
data = loader.load()
复制
向 AI 提问
print(data)
复制
向 AI 提问
[Document(page_content='id: 1\ndna_sequence: ATTCGA\norganism: Lokiarchaeum sp. (strain GC14_75).', lookup_str='', metadata={}, lookup_index=0), Document(page_content='id: 2\ndna_sequence: AGGCGA\norganism: Heimdallarchaeota archaeon (strain LC_2).', lookup_str='', metadata={}, lookup_index=0), Document(page_content='id: 3\ndna_sequence: TCCGGA\norganism: Acidianus hospitalis (strain W1).', lookup_str='', metadata={}, lookup_index=0)]
指定哪些列是内容列和元数据列
复制
向 AI 提问
loader = BigQueryLoader(
BASE_QUERY,
page_content_columns=["dna_sequence", "organism"],
metadata_columns=["id"],
)
data = loader.load()
复制
向 AI 提问
print(data)
复制
向 AI 提问
[Document(page_content='dna_sequence: ATTCGA\norganism: Lokiarchaeum sp. (strain GC14_75).', lookup_str='', metadata={'id': 1}, lookup_index=0), Document(page_content='dna_sequence: AGGCGA\norganism: Heimdallarchaeota archaeon (strain LC_2).', lookup_str='', metadata={'id': 2}, lookup_index=0), Document(page_content='dna_sequence: TCCGGA\norganism: Acidianus hospitalis (strain W1).', lookup_str='', metadata={'id': 3}, lookup_index=0)]
向元数据中添加来源
复制
向 AI 提问
# Note that the `id` column is being returned twice, with one instance aliased as `source`
ALIASED_QUERY = """
SELECT
id,
dna_sequence,
organism,
id as source
FROM (
SELECT
ARRAY (
SELECT
AS STRUCT 1 AS id, "ATTCGA" AS dna_sequence, "Lokiarchaeum sp. (strain GC14_75)." AS organism
UNION ALL
SELECT
AS STRUCT 2 AS id, "AGGCGA" AS dna_sequence, "Heimdallarchaeota archaeon (strain LC_2)." AS organism
UNION ALL
SELECT
AS STRUCT 3 AS id, "TCCGGA" AS dna_sequence, "Acidianus hospitalis (strain W1)." AS organism) AS new_array),
UNNEST(new_array)
"""
复制
向 AI 提问
loader = BigQueryLoader(ALIASED_QUERY, metadata_columns=["source"])
data = loader.load()
复制
向 AI 提问
print(data)
复制
向 AI 提问
[Document(page_content='id: 1\ndna_sequence: ATTCGA\norganism: Lokiarchaeum sp. (strain GC14_75).\nsource: 1', lookup_str='', metadata={'source': 1}, lookup_index=0), Document(page_content='id: 2\ndna_sequence: AGGCGA\norganism: Heimdallarchaeota archaeon (strain LC_2).\nsource: 2', lookup_str='', metadata={'source': 2}, lookup_index=0), Document(page_content='id: 3\ndna_sequence: TCCGGA\norganism: Acidianus hospitalis (strain W1).\nsource: 3', lookup_str='', metadata={'source': 3}, lookup_index=0)]
以编程方式连接这些文档到 Claude、VSCode 等,通过 MCP 获取实时答案。