Posts by Flo*_*ius

Using partitioning (with partitionBy) when writing a Delta Lake table has no effect

When I initially write a Delta Lake table, it makes no difference whether I use partitioning (with partitionBy) or not.

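Calling repartition on the same column before writing only changes the number of Parquet files. Explicitly making the partition column "not nullable" does not change the outcome.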
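One way to make "no difference" concrete is to inspect the directory layout of the written table: a write such as `df.write.format("delta").partitionBy("PARTITION").save(path)` is expected to produce Hive-style sub-directories like `PARTITION=0` under the table path. The sketch below is only an illustration under assumptions: `PartitionCheck` and `partitionDirs` are hypothetical names, and it assumes the table sits on a local filesystem reachable through `java.nio` (on MapR-FS or HDFS the Hadoop `FileSystem` API would be needed instead).

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Spark or Delta Lake.
object PartitionCheck {
  /** Lists sub-directories of `tablePath` named like "<column>=<value>",
    * i.e. the Hive-style layout that partitionBy is expected to create.
    * Assumes a local filesystem path.
    */
  def partitionDirs(tablePath: Path, column: String): Seq[String] = {
    val stream = Files.list(tablePath)
    try {
      stream.iterator().asScala
        .filter(p => Files.isDirectory(p))      // ignore data files in the root
        .map(_.getFileName.toString)
        .filter(_.startsWith(column + "="))     // ignores _delta_log and others
        .toList                                 // materialize before closing
        .sorted
    } finally stream.close()
  }
}
```

An empty result for a table written with partitionBy would indicate that the data files landed directly in the table root, which matches the behaviour described above.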

Versions:

Spark 2.4 (actually 2.4.0.0-mapr-620)
Scala 2.11.12
Delta Lake 0.5.0 (io.delta:delta-core_2.11:jar:0.5.0)

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val tmp = spark.createDataFrame(
    spark.sparkContext.parallelize((1 to 10).map(n => Row(n, n % 3))), 
    StructType(Seq(StructField("CONTENT", IntegerType), StructField("PARTITION", IntegerType))))

/* 
tmp.show
+-------+---------+
|CONTENT|PARTITION|
+-------+---------+
|      1|        1|
|      2|        2|
|      3|        0|
|      4|        1|
|      5|        2|
|      6|        0|
|      7|        1|
|      8|        2|
|      9|        0|
|     10|        1|
+-------+---------+
tmp.printSchema
root
 |-- CONTENT: integer (nullable = …


partitioning mapr apache-spark apache-spark-sql delta-lake

Flo*_*ius

2020-01-15

5 votes

1 answer

960 views