How do I print the elements of a specific RDD partition in Spark?

Arn*_*nav 9 scala apache-spark rdd

How can I print only the elements of one particular partition, say the 5th?

val distData = sc.parallelize(1 to 50, 10)

Fab*_*oni 9

Using Spark/Scala:

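val data = 1 to 50
val distData = sc.parallelize(data, 10)
// Keep only partition 5 (indices are 0-based, so this is the sixth partition), then
// collect to the driver and print there; a println inside the closure runs on the executors.
distData.mapPartitionsWithIndex((index: Int, it: Iterator[Int]) => if (index == 5) it else Iterator.empty)
  .collect().foreach(println)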

Output:

26
27
28
29
30
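Why does partition index 5 hold exactly 26 to 30? The plain-Scala sketch below (no Spark needed) reproduces the contiguous slicing arithmetic that `parallelize` is assumed to use, modeled on the integer arithmetic in Spark's `ParallelCollectionRDD.slice`. Treat it as an approximation for reasoning about partition contents, not as Spark's API; the object name `PartitionSlices` and its methods are hypothetical.

```scala
// Hypothetical helper: approximates how parallelize splits a collection of
// `length` elements into `numSlices` contiguous slices (assumed, not Spark's API).
object PartitionSlices {
  // [start, end) index bounds of every slice, using the same integer
  // division as Spark's ParallelCollectionRDD.slice is assumed to use.
  def positions(length: Long, numSlices: Int): Seq[(Int, Int)] =
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      (start, end)
    }

  // Elements that would land in partition `index` (0-based).
  def slice[T](data: Seq[T], numSlices: Int, index: Int): Seq[T] = {
    val (start, end) = positions(data.length.toLong, numSlices)(index)
    data.slice(start, end)
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 50
    // Index 5 is the sixth slice; prints "26 27 28 29 30".
    println(slice(data, 10, 5).mkString(" "))
  }
}
```

With 50 elements in 10 slices every slice has 5 elements, so slice 5 covers positions 25 to 29, i.e. the values 26 to 30. When the length does not divide evenly (for example 7 elements in 3 slices), the later slices absorb the remainder (sizes 2, 2, 3).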