Post by Ran*_*icx

How createCombiner, mergeValue, and mergeCombiners are used by combineByKey in Spark (with Scala)

I am trying to understand how combineByKey works at each step.

Can someone help me understand the combineByKey call on the RDD below?

val rdd = sc.parallelize(List(
  ("A", 3), ("A", 9), ("A", 12), ("A", 0), ("A", 5),("B", 4), 
  ("B", 10), ("B", 11), ("B", 20), ("B", 25),("C", 32), ("C", 91),
   ("C", 122), ("C", 3), ("C", 55)), 2)

rdd.combineByKey(
    (x: Int) => (x, 1),                                      // createCombiner
    (acc: (Int, Int), x: Int) => (acc._1 + x, acc._2 + 1),   // mergeValue
    (acc1: (Int, Int), acc2: (Int, Int)) =>
      (acc1._1 + acc2._1, acc1._2 + acc2._2))                // mergeCombiners
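The three functions build a `(sum, count)` pair per key: within each partition, the first value for a key is turned into a combiner by `createCombiner`, later values for that key are folded in with `mergeValue`, and finally the per-partition combiners for the same key are merged with `mergeCombiners`. The plain-Scala sketch below simulates this without Spark; the 7/8 split into two partitions mirrors how `sc.parallelize(..., 2)` would slice the 15 pairs, which is an assumption about Spark's default slicing.

```scala
// Simulation of combineByKey's three steps with the functions from the
// question; no Spark required. Partition contents are an assumption.
val createCombiner: Int => (Int, Int) = x => (x, 1)  // first value for a key: (sum, count)
val mergeValue: ((Int, Int), Int) => (Int, Int) =
  (acc, x) => (acc._1 + x, acc._2 + 1)               // fold further values within a partition
val mergeCombiners: ((Int, Int), (Int, Int)) => (Int, Int) =
  (a, b) => (a._1 + b._1, a._2 + b._2)               // merge per-partition combiners

// Within one partition: first value per key -> createCombiner,
// every later value for that key -> mergeValue.
def combinePartition(part: Seq[(String, Int)]): Map[String, (Int, Int)] =
  part.foldLeft(Map.empty[String, (Int, Int)]) { case (accs, (k, v)) =>
    accs.updated(k, accs.get(k).fold(createCombiner(v))(mergeValue(_, v)))
  }

val p0 = Seq(("A", 3), ("A", 9), ("A", 12), ("A", 0), ("A", 5), ("B", 4), ("B", 10))
val p1 = Seq(("B", 11), ("B", 20), ("B", 25), ("C", 32), ("C", 91),
             ("C", 122), ("C", 3), ("C", 55))

// Across partitions: combiners for the same key are merged with mergeCombiners.
val merged = (combinePartition(p0).toSeq ++ combinePartition(p1).toSeq)
  .groupBy(_._1)
  .map { case (k, vs) => k -> vs.map(_._2).reduce(mergeCombiners) }

merged.toSeq.sortBy(_._1).foreach(println)
// Prints each key with its (sum, count):
// (A,(29,5))
// (B,(70,5))
// (C,(303,5))
```

Note how key "B" exercises all three functions: partition 0 produces `(14, 2)` via `createCombiner` and `mergeValue`, partition 1 produces `(56, 3)` the same way, and `mergeCombiners` combines them into `(70, 5)`. Dividing sum by count would then give each key's average.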

apache-spark

19 score · 1 answer · 9,653 views
