Posts by Mor*_*van

Apache Spark can't read a Parquet folder that is being written by a streaming job

When I try to read a Parquet folder that is currently being written by another Spark streaming job, using the option "mergeSchema": "true", I get an error:

java.io.IOException: Could not read footer for file

Without schema merging I can read the folder without problems. Is it possible to read such a folder with schema merging enabled, even while other jobs may be updating it?

Full exception:

java.io.IOException: Could not read footer for file: FileStatus{path=hdfs://path.parquet/part-00000-20199ef6-4ff8-4ee0-93cc-79d47d2da37d-c000.snappy.parquet; isDirectory=false; length=0; replication=0; …
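The exception above reports `length=0` for the offending part file, which suggests the reader picked up a file the streaming job had created but not yet finished. One possible workaround, sketched here in PySpark under several assumptions, is to list the folder first, skip empty or hidden entries, and pass only the remaining files to the reader with `mergeSchema` enabled. The helper names `select_readable`, `list_files`, and `read_merged` are hypothetical, the listing relies on PySpark's private `_jvm` and `_jsc` handles, and the listing is not recursive, so partitioned layouts would need extra handling.

```python
from typing import Iterable, List, Tuple


def select_readable(files: Iterable[Tuple[str, int]]) -> List[str]:
    """Keep only non-empty Parquet part files from (path, length) pairs."""
    readable = []
    for path, length in files:
        name = path.rsplit("/", 1)[-1]
        # Skip hidden or metadata entries such as _SUCCESS or .crc files.
        if name.startswith(("_", ".")):
            continue
        # Skip files that are still empty, like the length=0 file in the exception.
        if name.endswith(".parquet") and length > 0:
            readable.append(path)
    return readable


def list_files(spark, folder: str) -> List[Tuple[str, int]]:
    """List (path, length) for plain files directly inside `folder`.

    Assumption: uses the Hadoop FileSystem API through PySpark's private
    `_jvm` and `_jsc` attributes, which are not a stable public interface.
    """
    hadoop_path = spark._jvm.org.apache.hadoop.fs.Path(folder)
    fs = hadoop_path.getFileSystem(spark._jsc.hadoopConfiguration())
    return [(s.getPath().toString(), s.getLen())
            for s in fs.listStatus(hadoop_path) if s.isFile()]


def read_merged(spark, folder: str):
    """Read a snapshot of the currently non-empty files with schema merging."""
    paths = select_readable(list_files(spark, folder))
    if not paths:
        raise ValueError("no non-empty Parquet files found in " + folder)
    return spark.read.option("mergeSchema", "true").parquet(*paths)
```

This only filters out the zero-length case shown in the exception. A file can still be incomplete at read time, so this is a snapshot heuristic rather than a guarantee. Depending on the Spark version, setting `spark.sql.files.ignoreCorruptFiles` to `true` may also let the schema merge skip files whose footer cannot be read.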

apache-spark parquet apache-spark-sql

Mor*_*van

2019-08-13

Votes: 3

Answers: 1

Views: 265
