Posts by Mor*_*van

Apache Spark can't read a Parquet folder that is being written by a streaming job

When I try to read a Parquet folder that another Spark streaming job is currently writing to, using the option "mergeSchema": "true", I get an error:

java.io.IOException: Could not read footer for file

Without schema merging I can read the folder without problems. Is it possible to read such a folder with schema merging while other jobs may still be updating it?

Full exception:

java.io.IOException: Could not read footer for file: FileStatus{path=hdfs://path.parquet/part-00000-20199ef6-4ff8-4ee0-93cc-79d47d2da37d-c000.snappy.parquet; isDirectory=false; length=0; replication=0; …
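
The `length=0` in the exception suggests the file had been created but not yet written when the footer was read. One possible workaround, sketched here as an assumption rather than a confirmed fix, is to list the folder first, skip zero-length files, and pass only the remaining paths to the reader with "mergeSchema" enabled. The helper names `select_readable_parquet` and `read_with_merged_schema` are hypothetical, the listing is not recursive (partitioned layouts would need more work), and the code reaches the Hadoop FileSystem API through PySpark's internal `_jvm` and `_jsc` handles.

```python
from typing import Iterable, List, Tuple


def select_readable_parquet(files: Iterable[Tuple[str, int]]) -> List[str]:
    """Keep paths that end in .parquet and have a non-zero length."""
    return [path for path, length in files
            if path.endswith(".parquet") and length > 0]


def read_with_merged_schema(spark, folder: str):
    """List `folder`, skip empty files, then read the rest with mergeSchema.

    `spark` is an existing SparkSession. `_jvm` and `_jsc` are internal
    PySpark attributes, used here only to reach the Hadoop FileSystem API.
    """
    jvm = spark.sparkContext._jvm
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    path = jvm.org.apache.hadoop.fs.Path(folder)
    fs = path.getFileSystem(hadoop_conf)
    # Top-level listing only: (path string, length in bytes) per entry.
    listing = [(str(s.getPath()), s.getLen()) for s in fs.listStatus(path)]
    paths = select_readable_parquet(listing)
    if not paths:
        raise ValueError("no non-empty parquet files found in " + folder)
    return spark.read.option("mergeSchema", "true").parquet(*paths)
```

This only narrows the race window: a file that is non-empty but still being written could fail in the same way. Spark also has a `spark.sql.files.ignoreCorruptFiles` setting; whether it covers footer reads during schema merging in your version is worth checking before relying on it.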

apache-spark parquet apache-spark-sql

3 votes, 1 answer, 265 views
