Sort by value in a Spark pair RDD

Vij*_*uri 18 scala apache-spark

I have a Spark pair RDD of (key, count) as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

Using the Spark Scala API, how do I get a new pair RDD sorted by value?

Required result: Array((d,3), (b,2), (a,1), (c,1))

Gáb*_*kos 40

This should work:

//Assuming the pair's second type has an Ordering, which is the case for Int
rdd.sortBy(_._2) // same as rdd.sortBy(pair => pair._2)

(Though you may also want to take the key into account to break ties.)
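For the tie-breaking point above, one option (a minimal sketch, not part of the original answer) is to sort by a composite key, e.g. descending by count and ascending by key for equal counts:

// Descending by count; ties broken by key (the Ordering for the (Int, String) tuple is implicit)
val sortedWithTieBreak = rdd.sortBy { case (key, count) => (-count, key) }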

  • For descending order, use `rdd.sortBy(_._2, false)`. Reference: /sf/ask/2934317851/ (4 upvotes)
  • For those coming to this post looking for a PySpark solution: `rdd.sortBy(lambda pair: pair[1])` (2 upvotes)
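Putting it together, a minimal self-contained sketch (assuming spark-shell, where `sc` is already defined) that reproduces the asker's data:

val rdd = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 1), ("d", 3)))
// Descending by value; collect() is the action that returns the sorted array to the driver
val result = rdd.sortBy(_._2, ascending = false).collect()
// e.g. Array((d,3), (b,2), (a,1), (c,1)) -- the relative order of the two count-1 pairs is not guaranteed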

Nag*_*tal 8

Sort by key and by value, in both ascending and descending order:

val textfile = sc.textFile("file:///home/hdfs/input.txt")
val words = textfile.flatMap(line => line.split(" "))

// Sort by value in descending order; drop the 'false' argument for ascending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2, false)
// Sort by value in ascending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2)

// Sort by key in ascending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey()
// Sort by key in descending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey(false)
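Note that the lines above are lazy transformations; nothing runs until an action is called. A small sketch of materializing the descending word counts (the `counts` name is just illustrative):

val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
// take(10) is an action: it triggers the job and returns the ten highest-count pairs
counts.sortBy(_._2, false).take(10).foreach(println)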

This can also be done another way: swap the key and value, then apply sortByKey.

// Sort by value by swapping key and value and then using sortByKey
val sortbyvalue = words.map(word => (word, 1)).reduceByKey((a, b) => a + b)
val descendingSortByvalue = sortbyvalue.map(x => (x._2, x._1)).sortByKey(false)
descendingSortByvalue.toDF.show  // requires import spark.implicits._ (available by default in spark-shell)
// Note: foreach runs on the executors, so the output only appears on the driver in local mode
descendingSortByvalue.foreach { n =>
  val count = n._1  // after the swap, the count comes first
  val word = n._2
  println(s"$word:$count")
}
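If the output should end up back in (word, count) form, the pairs can be swapped again after sortByKey, e.g. with Tuple2's `swap` (a sketch under the same assumptions as the answer above):

// (count, word) -> (word, count), keeping the descending-by-count order
val backToWordCount = descendingSortByvalue.map(_.swap)
backToWordCount.collect().foreach { case (word, count) => println(s"$word:$count") }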