array_unique with SORT_NUMERIC behaviour

Art*_*ida 5 php arrays array-unique

I've stumbled upon something weird and I don't understand why it works that way.

I have an array of numbers, all of them unique:

$array = [
    98602142989816970,
    98602142989816971,
    98602142989816980,
    98602142989816981,
    98602142989816982,
    98602142989816983,
    98602142989820095,
    98602142989820096,
    98602142989822060,
    98602142989822061,
];
var_dump($array);
array(10) {
  [0]=>
  int(98602142989816970)
  [1]=>
  int(98602142989816971)
  [2]=>
  int(98602142989816980)
  [3]=>
  int(98602142989816981)
  [4]=>
  int(98602142989816982)
  [5]=>
  int(98602142989816983)
  [6]=>
  int(98602142989820095)
  [7]=>
  int(98602142989820096)
  [8]=>
  int(98602142989822060)
  [9]=>
  int(98602142989822061)
}

If I do print_r(array_unique($array)); everything is fine, I get:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816971
    [2] => 98602142989816980
    [3] => 98602142989816981
    [4] => 98602142989816982
    [5] => 98602142989816983
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

But if I add the SORT_NUMERIC flag, print_r(array_unique($array, SORT_NUMERIC)); gives me:

Array
(
    [0] => 98602142989816970
    [6] => 98602142989820095
    [8] => 98602142989822060
)

Why are only those 3 numbers returned?

Update: I'm on a 64-bit system.

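For the sort function, I manually shuffled some of the values, because in the original array they are already sorted.

If I do sort($array); the result is as expected: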

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816971
    [2] => 98602142989816980
    [3] => 98602142989816981
    [4] => 98602142989816982
    [5] => 98602142989816983
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

But with sort($array, SORT_NUMERIC); they are not sorted correctly:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816982
    [2] => 98602142989816983
    [3] => 98602142989816980
    [4] => 98602142989816981
    [5] => 98602142989816971
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

iai*_*inn 5

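You're running into precision and floating-point arithmetic issues at that scale. There's loads of detail in "Is floating point math broken?" if you're interested, but I don't think this counts as a duplicate.

Take the first two numbers: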

php > var_dump((float) 98602142989816970 === (float) 98602142989816971);
bool(true)

php > var_dump((float) 98602142989816970, (float) 98602142989816971);
float(9.8602142989817E+16)
float(9.8602142989817E+16)

Internally, this is what PHP does when SORT_NUMERIC is used, in numeric_compare_function: both values are converted to floats before being compared.
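To see why exactly three values survive, here is a minimal sketch that groups the ten integers from the question by the float each one converts to. It assumes a 64-bit PHP build, as stated in the question; the variable names are purely illustrative. At this magnitude (between 2^56 and 2^57) adjacent doubles are 16 apart, so each integer is rounded to the nearest multiple of 16.

```php
<?php
// Assumption: 64-bit PHP, so these literals are ints, not floats.
$array = [
    98602142989816970,
    98602142989816971,
    98602142989816980,
    98602142989816981,
    98602142989816982,
    98602142989816983,
    98602142989820095,
    98602142989820096,
    98602142989822060,
    98602142989822061,
];

// Group each integer under the value it has after a round trip through float.
$groups = [];
foreach ($array as $n) {
    $rounded = (int) (float) $n; // nearest representable double, back as an int
    $groups[$rounded][] = $n;
}

print_r(array_keys($groups));
// Keys: 98602142989816976, 98602142989820096, 98602142989822064
```

The ten integers collapse onto three floats, which matches the three elements that array_unique($array, SORT_NUMERIC) kept in the question.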

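sort suffers from the same problem, see https://3v4l.org/02UUB (obviously no values are removed from the array, as that only happens with array_unique - they're just not sorted properly)

In short, for numbers of this size (or specifically, numbers that are very close together relative to their scale), SORT_NUMERIC isn't reliable. If you can, stick to comparing them as strings.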
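As a rough sketch of the string-based approach, the snippet below uses the SORT_STRING flag (which is also the default for array_unique) on the array from the question. One caveat to keep in mind: string order only matches numeric order here because every value is non-negative and has the same number of digits; for mixed lengths, plain sort($array) on integers, which the question shows working, is the safer choice.

```php
<?php
// Assumption: 64-bit PHP, so these literals are ints, not floats.
$array = [
    98602142989816970,
    98602142989816971,
    98602142989816980,
    98602142989816981,
    98602142989816982,
    98602142989816983,
    98602142989820095,
    98602142989820096,
    98602142989822060,
    98602142989822061,
];

// Compare as strings: no conversion to float, so no precision is lost.
$unique = array_unique($array, SORT_STRING);
echo count($unique), "\n"; // 10

// Put the values out of order, then sort them by their string form.
$sorted = array_reverse($array);
sort($sorted, SORT_STRING);
var_dump($sorted === $array); // bool(true)
```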