pat*_*l94 1 automation snowflake-cloud-data-platform
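Is there any tool or method to automatically create tables from text files?
I have more than 100 CSV files, each with a different number of columns. Manually creating every table definition in Snowflake first and then loading the data would be a lot of work. I am looking for a way to load the data without having to create the tables beforehand.
If anyone knows how to solve this, please let me know. Thanks!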
小智 5
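Data processing frameworks such as Spark and Pandas have readers that can parse the CSV header row and form a schema with inferred data types (not just strings). You can leverage this to create new tables.
The following example is provided as an illustration: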
import os

import pandas as pd
import sqlalchemy as sql

# Set up an SQLAlchemy Engine object
# This will provide a connection pool for Pandas to use later
engine = sql.create_engine(
    'snowflake://{u}:{p}@{a}/{d}/{s}?warehouse={w}&role={r}'.format(
        u='USERNAME',
        p='PASSWORD',
        a='account.region',
        r='ROLE_NAME',
        d='DATABASE',
        s='SCHEMA',
        w='WAREHOUSE_NAME',
    )
)

# List of (n) input CSV file paths
csv_input_filepaths = [
    '/tmp/test1.csv',
    '/tmp/test2.csv',
    '/tmp/test3.csv',
]

try:
    # Process each path
    for path in csv_input_filepaths:
        # Use filename component of path as tablename
        # '/tmp/test1.csv' creates table named 'test1', etc.
        filename, _ext = os.path.splitext(os.path.basename(path))

        # Default CSV reading options in Pandas sniff and infer headers
        # It will auto-populate schema and types based on data
        data = pd.read_csv(path)

        # Stores into Snowflake (will create the table if it does not exist)
        # Default args will attempt to create an index, so we disable that
        data.to_sql(filename, engine, index=False)
finally:
    # Tear down all connections gracefully pre-exit
    engine.dispose()
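To make the schema inference step less of a black box, here is a minimal standard-library sketch of the same idea: read the header row, guess a type per column, and emit a CREATE TABLE statement. The helper names `infer_snowflake_type` and `create_table_ddl` are hypothetical, and the type mapping (NUMBER, FLOAT, VARCHAR) is a deliberately crude assumption. Pandas applies far more thorough rules, so treat this as an illustration rather than a replacement.

```python
import csv
import io


def infer_snowflake_type(values):
    """Guess a column type from string values (hypothetical helper).

    Assumption: only three target types are considered, and empty
    strings are treated as missing values.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"

    def all_parse(cast):
        # True when every non-empty value converts without error
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "NUMBER"
    if all_parse(float):
        return "FLOAT"
    return "VARCHAR"


def create_table_ddl(table_name, csv_text):
    """Build a CREATE TABLE statement from CSV text (hypothetical helper).

    Assumes the first row is a header and at least one row exists.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in data if i < len(row)]
        columns.append('"{}" {}'.format(name, infer_snowflake_type(col_values)))
    return 'CREATE TABLE IF NOT EXISTS "{}" (\n  {}\n);'.format(
        table_name, ",\n  ".join(columns)
    )


sample_csv = "id,price,label\n1,9.5,apple\n2,12,pear\n"
print(create_table_ddl("sample", sample_csv))
```

The generated statement quotes identifiers, which makes them case-sensitive in Snowflake; dropping the quotes may be preferable if the CSV headers are plain alphanumeric names.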