Posted by Joh*_*agg

Unable to resolve a merged schema with int and double when reading Parquet files

I have two Parquet files: one contains an integer field myField, the other contains a double field myField. When I try to read both files at once:

val basePath = "/path/to/file/"
val fileWithInt = basePath + "intFile.snappy.parquet"
val fileWithDouble = basePath + "doubleFile.snappy.parquet"
val result = spark.sqlContext.read.option("mergeSchema", true).option("basePath", basePath).parquet(Seq(fileWithInt, fileWithDouble): _*).select("myField")

I get the following error:

Caused by: org.apache.spark.SparkException: Failed to merge fields 'myField' and 'myField'. Failed to merge incompatible data types IntegerType and DoubleType

When I pass an explicit schema:

import org.apache.spark.sql.types._
val schema = StructType(Seq(StructField("myField", IntegerType)))
val result = spark.sqlContext.read.schema(schema).option("mergeSchema", true).option("basePath", basePath).parquet(Seq(fileWithInt, fileWithDouble): _*).select("myField")

it fails with the following:

java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainDoubleDictionary
    at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:48)

When I declare the field as a double instead:

val schema = StructType(Seq(StructField("myField", DoubleType)))

I get:

java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
    at …
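For reference, one commonly used workaround (not from the original post, and only a sketch) is to skip schema merging: read each file on its own so Spark uses that file's physical type, cast myField to double, and then union the results. The object name `ReadMixedNumericParquet`, the helper `readAsDouble`, and the `local[*]` master are hypothetical choices for illustration. The file paths are the placeholders from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object ReadMixedNumericParquet {
  // Hypothetical helper: read a single file with its own schema,
  // then widen myField to double so every file yields the same column type.
  def readAsDouble(spark: SparkSession, path: String): DataFrame =
    spark.read
      .parquet(path)
      .select(col("myField").cast(DoubleType).as("myField"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .appName("mixed-numeric-parquet")
      .master("local[*]")
      .getOrCreate()

    // Placeholder paths taken from the question.
    val basePath = "/path/to/file/"
    val files = Seq(
      basePath + "intFile.snappy.parquet",
      basePath + "doubleFile.snappy.parquet"
    )

    // Union the per-file DataFrames; after the cast, all have schema (myField: double).
    val result = files.map(readAsDouble(spark, _)).reduce(_ union _)
    result.printSchema()

    spark.stop()
  }
}
```

This reads each file separately, so it may not suit a very large number of files. In that case, grouping paths by physical type before reading could keep the number of reads low.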

scala apache-spark apache-spark-sql
