I am new to Spark and Scala. I am confused about how the reduceByKey function works in Spark. Suppose we have the following code:
val lines = sc.textFile("data.txt")
val pairs = lines.map(s => (s, 1))
val counts = pairs.reduceByKey((a, b) => a + b)
The map function is clear: s is the key, and it refers to a line of data.txt, while 1 is the value.
However, I don't understand how reduceByKey works internally. Does "a" refer to the key? Or does "a" refer to "s"? And what do a and b represent in a + b? How do they get filled in?
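Here is my current understanding, sketched in plain Scala without Spark (the sample pairs are made up for illustration): a and b would both be values, never keys, with the values of each key folded together pairwise.

```scala
// Simulating what reduceByKey appears to do per key, without Spark.
// If this is right, `a` is the accumulated result so far and `b` is
// the next value for the same key; the key itself never enters a + b.
object ReduceByKeySketch {
  def main(args: Array[String]): Unit = {
    // made-up sample pairs, standing in for lines.map(s => (s, 1))
    val pairs = List(("line1", 1), ("line2", 1), ("line1", 1))

    val counts = pairs
      .groupBy(_._1)                                 // collect values by key
      .map { case (k, vs) =>
        (k, vs.map(_._2).reduce((a, b) => a + b))    // fold the values pairwise
      }

    println(counts)                                  // Map(line1 -> 2, line2 -> 1)
  }
}
```

Is this a correct mental model of what reduceByKey does, leaving aside that Spark performs the fold per partition and then merges across partitions?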
I am also trying to understand how combineByKey works at each step.
Can someone help me understand the RDD below?
val rdd = sc.parallelize(List(
  ("A", 3), ("A", 9), ("A", 12), ("A", 0), ("A", 5), ("B", 4),
  ("B", 10), ("B", 11), ("B", 20), ("B", 25), ("C", 32), ("C", 91),
  ("C", 122), ("C", 3), ("C", 55)), 2)

rdd.combineByKey(
  (x: Int) => (x, 1),
  (acc: (Int, Int), x: Int) => (acc._1 + x, acc._2 + 1),
  (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2))
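Here is how I currently picture the three functions running, sketched in plain Scala without Spark. The split of the 15 elements into the two partitions is my assumption for illustration; I don't know exactly how Spark divides them.

```scala
// Simulating combineByKey's three functions, without Spark.
// createCombiner : first value seen for a key in a partition -> (value, 1)
// mergeValue     : fold another value from the same partition into the combiner
// mergeCombiners : merge per-partition combiners for the same key
object CombineByKeySketch {
  val createCombiner = (x: Int) => (x, 1)
  val mergeValue     = (acc: (Int, Int), x: Int) => (acc._1 + x, acc._2 + 1)
  val mergeCombiners = (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)

  // ASSUMED partition split of the 15 elements (7 / 8); the real split may differ.
  val partition1 = List(("A", 3), ("A", 9), ("A", 12), ("A", 0), ("A", 5),
                        ("B", 4), ("B", 10))
  val partition2 = List(("B", 11), ("B", 20), ("B", 25), ("C", 32), ("C", 91),
                        ("C", 122), ("C", 3), ("C", 55))

  // Within one partition: createCombiner on first sight, mergeValue afterwards.
  def combineLocally(part: List[(String, Int)]): Map[String, (Int, Int)] =
    part.foldLeft(Map.empty[String, (Int, Int)]) { case (m, (k, v)) =>
      m.updated(k, m.get(k).map(acc => mergeValue(acc, v)).getOrElse(createCombiner(v)))
    }

  def main(args: Array[String]): Unit = {
    val local1 = combineLocally(partition1) // A -> (29, 5), B -> (14, 2)
    val local2 = combineLocally(partition2) // B -> (56, 3), C -> (303, 5)

    // Shuffle step: mergeCombiners joins the per-partition results per key.
    val merged = (local1.toSeq ++ local2.toSeq)
      .groupBy(_._1)
      .map { case (k, accs) => (k, accs.map(_._2).reduce(mergeCombiners)) }

    println(merged) // A -> (29, 5), B -> (70, 5), C -> (303, 5)
  }
}
```

If this is right, the result per key is a (sum, count) pair, e.g. A -> (29, 5), which one could then divide to get an average. Is this how the three arguments are actually wired together?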