Posts by use*_*784

Can the Python fastparquet module read compressed Parquet files?

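Our Parquet files are stored in an AWS S3 bucket and are compressed with SNAPPY. I can read the uncompressed version of a Parquet file with the Python fastparquet module, but not the compressed version.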

This is the code I use for the uncompressed file:

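import s3fs
from fastparquet import ParquetFile

s3 = s3fs.S3FileSystem(key='XESF', secret='dsfkljsf')
# Pass s3.open so that fastparquet reads the file directly from the bucket
myopen = s3.open
pf = ParquetFile('sample/py_test_snappy/part-r-12423423942834.parquet', open_with=myopen)
df = pf.to_pandas()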

This returns without error. However, when I try to read the snappy-compressed version of the file:

pf = ParquetFile('sample/py_test_snappy/part-r-12423423942834.snappy.parquet', open_with=myopen)

I get an error from to_pandas():

df=pf.to_pandas()

Error message:

KeyError                                  Traceback (most recent call last)
in ()
----> 1 df = pf.to_pandas()

/opt/conda/lib/python3.5/site-packages/fastparquet/api.py in to_pandas(self, columns, categories, filters, index)
    293                          for (name, v) in views.items()}
    294                 self.read_row_group(rg, columns, categories, infile=f,
--> 295                                     index=index, assign=parts)
    296                 start += rg.num_rows
    297         else:

/opt/conda/lib/python3.5/site-packages/fastparquet/api.py in read_row_group(self, rg, columns, categories, infile, index, assign)
    151         core.read_row_group(
    152             infile, rg, columns, categories, self.helper, self.cats,
--> 153             self.selfmade, index=index, assign=assign)
    154         if ret:
    155             return df

/opt/conda/lib/python3.5/site-packages/fastparquet/core.py in read_row_group(file, rg, columns, categories, schema_helper, cats, selfmade, index, assign)
    300         raise RuntimeError('Going with pre-allocation!')
    301     read_row_group_arrays(file, rg, columns, categories, schema_helper,
--> 302                           cats, selfmade, assign=assign)
    303
    304     for cat in cats:

/opt/conda/lib/python3.5/site-packages/fastparquet/core.py in read_row_group_arrays(file, rg, columns, categories, schema_helper, cats, selfmade, assign)
    289 …
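The traceback is cut off before the line that actually raises the KeyError, so the cause cannot be read from it. One thing that is cheap to rule out is a missing Snappy binding in the environment. This is only a guess: the sketch below assumes that the fastparquet version in use relies on the separately installed python-snappy package (import name `snappy`) for SNAPPY-compressed data. The helper name `missing_modules` is made up for this illustration and is not part of fastparquet.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "snappy" is the import name of the python-snappy package.
    missing = missing_modules(["snappy"])
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("snappy is importable; the KeyError likely has another cause.")
```

Run this with the same interpreter that produced the traceback (here, the one under /opt/conda). If `snappy` is reported as not importable, installing python-snappy into that environment and retrying to_pandas() would be a reasonable next step. If it is importable, the full, untruncated traceback is needed to go further.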

python pandas parquet

6 votes
1 answer
5962 views
