Apache Spark: distinct doesn't work?

Question

Here is my code sample:

 case class Person(name: String, tel: String) {
   def equals(that: Person): Boolean = that.name == this.name && this.tel == that.tel
 }

 val persons = Array(Person("peter","139"),Person("peter","139"),Person("john","111"))
 sc.parallelize(persons).distinct.collect

It returns:

 res34: Array[Person] = Array(Person(john,111), Person(peter,139), Person(peter,139))

Why doesn't distinct work? I expect the result to be Person("john",111), Person("peter",139).
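
For reference, here is a minimal plain-Scala sketch (no Spark involved) of the behaviour the question expects. It assumes only that case classes get a compiler-generated `equals(Any)` and `hashCode`, which is what `distinct` relies on. The `equals(that: Person)` method in the code above is an overload, not an override of `equals(Any)`, so it is left out here. The object name `CaseClassEquality` is made up for illustration.

```scala
// Plain Scala collections only; this does not reproduce the Spark behaviour.
object CaseClassEquality {
  // Case classes get structural equals(Any) and hashCode from the compiler.
  case class Person(name: String, tel: String)

  val persons: Seq[Person] =
    Seq(Person("peter", "139"), Person("peter", "139"), Person("john", "111"))

  // distinct keeps the first occurrence of each equal element.
  val unique: Seq[Person] = persons.distinct

  def main(args: Array[String]): Unit =
    println(unique) // List(Person(peter,139), Person(john,111))
}
```

On an ordinary Scala collection the duplicate is removed, so the case class definition itself already provides the equality that `distinct` needs.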

Answer 1

aar*_*man 0

As others have pointed out, this is a bug in Spark 1.0.0. My theory about where it comes from: if you look at the diff between 1.0.0 and 0.9.0, you see

-  def repartition(numPartitions: Int): RDD[T] = {
+  def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = {

If you run

case class A(i:Int)
implicitly[Ordering[A]]

you get an error:

<console>:13: error: No implicit Ordering defined for A.
              implicitly[Ordering[A]]

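So I think the workaround is to define an implicit Ordering for the case class. Unfortunately I am not a Scala expert, but this answer seems to do it correctly.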
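
A minimal sketch of what defining an implicit Ordering for the case class could look like, using the `A` class from the snippet above. The names `OrderingSketch` and `aOrdering` are made up for illustration, and ordering by the single field `i` is an assumption. This only shows that `implicitly[Ordering[A]]` then resolves; whether it actually makes `distinct` behave in Spark 1.0.0 is not confirmed here.

```scala
object OrderingSketch {
  case class A(i: Int)

  // Hypothetical ordering: compare instances of A by their only field.
  implicit val aOrdering: Ordering[A] = Ordering.by((a: A) => a.i)

  // Resolves now instead of failing with "No implicit Ordering defined for A."
  val resolved: Ordering[A] = implicitly[Ordering[A]]

  // sorted picks up the same implicit Ordering.
  val sorted: Seq[A] = Seq(A(3), A(1), A(2)).sorted

  def main(args: Array[String]): Unit =
    println(sorted) // List(A(1), A(2), A(3))
}
```

For a class with several fields, such as `Person(name, tel)`, the same idea would apply with a tuple key, for example `Ordering.by((p: Person) => (p.name, p.tel))`.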
