krx · 4 · python, dataframe, parquet, dask, dask-dataframe
I'm using Dask to write a df to a Parquet file:
df.to_parquet(file, compression='snappy', write_metadata_file=False,
              engine='pyarrow', index=None)
I need to view the file's contents in an online Parquet viewer, and the columns it shows are:
Column1 Column2 Column3 __null_dask_index__
How can I get rid of the __null_dask_index__ column?
The relevant kwarg here is write_index. Setting it to False keeps the index out of the written file:
from dask.datasets import timeseries
from pyarrow.parquet import ParquetFile

df = timeseries(end='2000-01-03').reset_index()
for write_index in [True, False]:
    df.to_parquet('test.pqt', write_index=write_index)
    f = ParquetFile('test.pqt/part.0.parquet')
    print(f.schema.names)
# write_index=True:  ['__null_dask_index__', 'timestamp', 'id', 'name', 'x', 'y']
# write_index=False: ['timestamp', 'id', 'name', 'x', 'y']