包含NaN的集合的最小值/最大值(处理订单中的不可比性)

blu*_*e10 11 scala max min

由于以下行为,我刚遇到一个令人讨厌的错误:

scala> List(1.0, 2.0, 3.0, Double.NaN).min
res1: Double = NaN

scala> List(1.0, 2.0, 3.0, Double.NaN).max
res2: Double = NaN
Run Code Online (Sandbox Code Playgroud)

我理解,对于成对比较,它有时可能是优选的max(NaN, 0) = NaN,这可能是java.lang.Double.compare遵循这个约定的原因(似乎有一个IEEE标准).然而,对于一个集合,我真的认为这是一个奇怪的惯例.以上所有收集确实包含有效数字; 这些数字有明确的最大值和最小值.在我看来,收集的最大数量不是数字的概念是矛盾的,因为NaN不是数字,所以它不能是集合的最大或最小"数字" - 除非没有有效数字; 在这种情况下,最大的"不是数字"是完全合理的.语义上minmax函数退化以检查集合是否包含NaN.由于有更合适的方法来检查NaN的存在(例如collection.find(_.isNaN)),在集合上保持语义上有意义的最小/最大值将是很好的.

所以我的问题是:获得忽略NaN存在的行为的最佳方法是什么?我看到两种可能性:

  1. 在调用min/max之前过滤NaN.由于这需要在所有地方明确处理问题,并可能导致性能损失,我宁愿更容易.

  2. 有一种NaN忽略排序会很好,可以在必要时用作隐式排序.我尝试了以下方法:

      object NanAwareOrdering extends Ordering[Double] {
        def compare(x: Double, y: Double) = {
          if (x.isNaN()) {
            +1 // without checking x, return y < x
          } else if (y.isNaN()) {
            -1 // without checking y, return x < y
          } else {
            java.lang.Double.compare(x, y)
          }
        }
      }
    
    Run Code Online (Sandbox Code Playgroud)

    然而,这种方法似乎取决于我是否有兴趣找到最小值或最大值,即:

     scala> List(1.0, 2.0, 3.0, Double.NaN).min(NanAwareOrdering)
     res7: Double = 1.0
    
     scala> List(1.0, 2.0, 3.0, Double.NaN).max(NanAwareOrdering)
     res8: Double = NaN
    
    Run Code Online (Sandbox Code Playgroud)

    这意味着我必须有两个NanAwareOrdering,这取决于我是否需要最小值或最大值,这将禁止拥有implicit val.因此我的问题是:我如何定义一个处理两种情况的排序?

更新:

为了完整起见:在分析问题的过程中,我意识到前提"退化为检查NaN"实际上是错误的.事实上,我认为它更难看:

scala> List(1.0, Double.NaN).min
res1: Double = NaN

scala> List(Double.NaN, 1.0).min
res2: Double = 1.0
Run Code Online (Sandbox Code Playgroud)

blu*_*e10 5

免责声明:我会在问题中添加自己的答案,以防万一其他人仍然对此事的更多细节感兴趣.

一些理论......

我觉得这个问题比我想象的要复杂得多.正如阿列克谢·罗曼诺夫已经指出的那样,无法比较的概念要求最大/最小函数采取部分顺序.不幸的是,阿列克谢也是在点右键,基于偏序没有意义上的常规最大值/最小值功能:认为其中部分排序只定义了某些群体内部关系的情况下,但群体本身是由完全独立彼此(例如,元素{a,b,c,d}只有两个关系a <b和c <d;我们将有两个max/min).在这方面,人们甚至可能会争辩说,正式的最大值/最小值应该总是返回两个值,NaN 相应的有效最小值/最大值,因为NaN本身也是其自身关系组中的极值.

因此,由于部分订单过于笼统/复杂,最小/最大功能需要一个Ordering.不幸的是,总订单不允许无法比较的概念.回顾总订单的三个定义属性,很明显"忽略NaN"在形式上是不可能的:

  1. 如果a≤b且b≤a则a = b(反对称)
  2. 如果a≤b且b≤c则a≤c(传递性)
  3. a≤b或b≤a(总数)

......并且练习......

因此,当试图提出一个Ordering实现我们所希望的最小/最大行为时,很明显我们必须违反某些事情(并承担后果).执行min/ max/ minBy/ maxByTraversableOnce随后的图案(min):

reduceLeft((x, y) => if (cmp.lteq(x, y)) x else y)
Run Code Online (Sandbox Code Playgroud)

gteqmax变种.这让我想到了"左偏"的比较,即:

x   <comparison_operator> NaN    is always true to keep x in the reduction
NaN <comparison_operator> x      is always false to inject x into the reduction
Run Code Online (Sandbox Code Playgroud)

这种"左偏"排序的结果实现如下:

object BiasedOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = java.lang.Double.compare(x, y) // this is inconsistent, but the same goes for Double.Ordering

  override def lteq(x: Double, y: Double): Boolean  = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) <= 0
  override def gteq(x: Double, y: Double): Boolean  = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) >= 0
  override def lt(x: Double, y: Double): Boolean    = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) < 0
  override def gt(x: Double, y: Double): Boolean    = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) > 0
  override def equiv(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) == 0

}
Run Code Online (Sandbox Code Playgroud)

......分析:

目前我正试图找出:

  • 此订单与默认订单的比较方式,
  • 我们在哪里违反总订单属性,
  • 什么是潜在的问题.

我将此与Scala的默认顺序Ordering.Double和以下顺序进行比较,该顺序直接来自java.lang.Double.compare:

object OrderingDerivedFromCompare extends Ordering[Double] {
  def compare(x: Double, y: Double) = {
    java.lang.Double.compare(x, y)
  }
}
Run Code Online (Sandbox Code Playgroud)

Scala的默认顺序的一个有趣的特性Ordering.Double是,它由语言原有数值比较运算将覆盖所有比较的成员函数(<,<=,==,>=,>)这样的比较结果是相同的,就好像我们将与这些运营商直接比较.以下显示了NaN与三个排序的有效数字之间的所有可能关系:

Ordering.Double             0.0 >  NaN = false
Ordering.Double             0.0 >= NaN = false
Ordering.Double             0.0 == NaN = false
Ordering.Double             0.0 <= NaN = false
Ordering.Double             0.0 <  NaN = false
OrderingDerivedFromCompare  0.0 >  NaN = false
OrderingDerivedFromCompare  0.0 >= NaN = false
OrderingDerivedFromCompare  0.0 == NaN = false
OrderingDerivedFromCompare  0.0 <= NaN = true
OrderingDerivedFromCompare  0.0 <  NaN = true
BiasedOrdering              0.0 >  NaN = true
BiasedOrdering              0.0 >= NaN = true
BiasedOrdering              0.0 == NaN = true
BiasedOrdering              0.0 <= NaN = true
BiasedOrdering              0.0 <  NaN = true

Ordering.Double             NaN >  0.0 = false
Ordering.Double             NaN >= 0.0 = false
Ordering.Double             NaN == 0.0 = false
Ordering.Double             NaN <= 0.0 = false
Ordering.Double             NaN <  0.0 = false
OrderingDerivedFromCompare  NaN >  0.0 = true
OrderingDerivedFromCompare  NaN >= 0.0 = true
OrderingDerivedFromCompare  NaN == 0.0 = false
OrderingDerivedFromCompare  NaN <= 0.0 = false
OrderingDerivedFromCompare  NaN <  0.0 = false
BiasedOrdering              NaN >  0.0 = false
BiasedOrdering              NaN >= 0.0 = false
BiasedOrdering              NaN == 0.0 = false
BiasedOrdering              NaN <= 0.0 = false
BiasedOrdering              NaN <  0.0 = false

Ordering.Double             NaN >  NaN = false
Ordering.Double             NaN >= NaN = false
Ordering.Double             NaN == NaN = false
Ordering.Double             NaN <= NaN = false
Ordering.Double             NaN <  NaN = false
OrderingDerivedFromCompare  NaN >  NaN = false
OrderingDerivedFromCompare  NaN >= NaN = true
OrderingDerivedFromCompare  NaN == NaN = true
OrderingDerivedFromCompare  NaN <= NaN = true
OrderingDerivedFromCompare  NaN <  NaN = false
BiasedOrdering              NaN >  NaN = false
BiasedOrdering              NaN >= NaN = true
BiasedOrdering              NaN == NaN = true
BiasedOrdering              NaN <= NaN = true
BiasedOrdering              NaN <  NaN = false
Run Code Online (Sandbox Code Playgroud)

我们可以看到:

  • OrderingDerivedFromCompare满足总订单属性.基于这个结果,背后的推理java.lang.Double.compare变得更加明确:将NaN放在总订单的上端简单地避免了任何矛盾!
  • Scala的默认顺序和偏向顺序违反了许多总体条件.Scala的默认顺序总是返回false,而对于偏差顺序,它取决于位置.由于两者都导致矛盾,因此很难看出哪些可能导致更严重的问题.

现在我们手头的实际问题,最小/最大功能.因为OrderingDerivedFromCompare现在很清楚我们必须获得什么 - NaN只是最大的值,因此无论列表中的元素是如何排列的,都可以将其作为最大值获得:

OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).max = NaN
OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).max = NaN
Run Code Online (Sandbox Code Playgroud)

现在来看Scala的默认排序.看到这种情况实际上比我的问题中提到的更复杂,我深感震惊:

Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).min = NaN
Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).max = NaN
Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
Run Code Online (Sandbox Code Playgroud)

事实上,元素的顺序变得相关(作为false对每个比较的返回的结果reduceLeft)."左偏置"显然解决了这个问题,从而产生了一致的结果:

BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).max = 3.0
BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
Run Code Online (Sandbox Code Playgroud)

不幸的是,我仍然无法完全回答所有问题.剩下的一些要点是:

  • 为什么Scala的默认排序按照它的方式定义?目前处理NaNs似乎存在很大缺陷.一个非常危险的细节Ordering.Doublecompare函数实际委托java.lang.Double.compare,而比较成员是基于语言的本机比较实现的.这显然会导致不一致的结果,例如:

    Ordering.Double.compare(0.0, Double.NaN) == -1     // indicating 0.0 < NaN
    Ordering.Double.lt     (0.0, Double.NaN) == false  // contradiction
    
    Run Code Online (Sandbox Code Playgroud)
  • BiasedOrdering除了直接评估任何矛盾的比较之外,还有哪些潜在的缺点?快速检查sorted给出了以下结果,但没有发现任何问题:

    Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    
    Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    
    Run Code Online (Sandbox Code Playgroud)

暂时我会选择这种左偏序.但由于问题的性质不允许完美的通用解决方案:小心使用!

更新

在基于Monkjack建议的隐式类的解决方案方面,我非常喜欢以下内容(因为它根本没有混淆(有缺陷的?)总订单,但内部转换为一个干净的完全有序的域):

implicit class MinMaxNanAware(t: TraversableOnce[Double]) {
  def nanAwareMin = t.minBy(x => if (x.isNaN) Double.PositiveInfinity else x)
  def nanAwareMax = t.maxBy(x => if (x.isNaN) Double.NegativeInfinity else x)
}

// and now we can simply use
val goodMin = list.nanAwareMin
Run Code Online (Sandbox Code Playgroud)