Post by Sac*_*ain

How to write Parquet with a user-defined schema using pyarrow

When I run the following code, I get this error: ValueError: Table schema does not match schema used to create file

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


fields = [
    ('one', pa.int64()),
    ('two', pa.string(), False),  # third element sets nullable=False
    ('three', pa.bool_())
]
schema = pa.schema(fields)

schema = schema.remove_metadata()
df = pd.DataFrame(
    {
        'one': [2, 2, 2],
        'two': ['foo', 'bar', 'baz'],
        'three': [True, False, True]
    }
)

df['two'] = df['two'].astype(str)

table = pa.Table.from_pandas(df, schema, preserve_index=False).replace_schema_metadata()
writer = pq.ParquetWriter('parquest_user_defined_schema.parquet', schema=schema)
writer.write_table(table)

python-3.x pyarrow

Score: 6 · Answers: 1 · Views: 9773
