通过Comparator <T>进行的Java排序大部分时间都花在比较(Object,Object)上

Bry*_*ead 14 java sorting generics performance comparator

在分析我们的代码库时,我发现了一些奇怪的东西.似乎用类型比较器(例如Comparator<MyClass>)排序总是先调用一个方法Comparator<MyClass>.compare(Object,Object)然后调用该方法Comparator<MyClass>.compare(MyClass,MyClass).此外,绝大部分时间都花在了Comparator<MyClass>.compare(Object,Object).为了进一步探索,我做了一个小测试程序:

public class Sandbox {
    public static void main(String argv[]) {
        for(int j=0; j<100; j++) {
            int n = 10000;
            SortMe[] sortMes = new SortMe[n];
            for (int i=0; i<n; i++) {
                sortMes[i] = new SortMe(Math.random());
            }
            Arrays.sort(sortMes, new SortMeComp());
            System.out.println(Arrays.toString(sortMes));
        }
        for(int j=0; j<100; j++) {
            int n = 10000;
            SortMe[] sortMes = new SortMe[n];
            for (int i=0; i<n; i++) {
                sortMes[i] = new SortMe(Math.random());
            }
            Arrays.sort(sortMes, new SortMeCompTypeless());
            System.out.println(Arrays.toString(sortMes));
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

打字的比较器:

public class SortMeComp implements Comparator<SortMe>{
    public int compare(SortMe one, SortMe two) {
        if(one.getValue()>two.getValue()) {
            return -1;
        } else if (one.getValue()<two.getValue()) {
            return 1;
        } else {
            return 0;
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

我为比较而做的无类型比较器:

public class SortMeCompTypeless implements Comparator{
    public int compare(Object oneObj, Object twoObj) {
        SortMe one = (SortMe) oneObj;
        SortMe two = (SortMe) twoObj;
        if(one.getValue()>two.getValue()) {
            return -1;
        } else if (one.getValue()<two.getValue()) {
            return 1;
        } else {
            return 0;
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

以下是结果(来自YourKit分析器 ;如果您想要截图,请告诉我):

+----------------------------------------------------+-----------------+-----------------+--------------------+
|                        Name                        |    Time (ms)    |  Own Time (ms)  |  Invocation Count  |
+----------------------------------------------------+-----------------+-----------------+--------------------+
|  +---java.util.Arrays.sort(Object[], Comparator)   |  23,604  100 %  |          8,096  |               200  |
|    |                                               |                 |                 |                    |
|    +---SortMeComp.compare(Object, Object)          |  11,395   48 %  |          7,430  |        12,352,936  |
|    | |                                             |                 |                 |                    |
|    | +---SortMeComp.compare(SortMe, SortMe)        |   3,965   17 %  |          3,965  |        12,352,936  |
|    |                                               |                 |                 |                    |
|    +---SortMeCompTypeless.compare(Object, Object)  |   4,113   17 %  |          4,113  |        12,354,388  |
+----------------------------------------------------+-----------------+-----------------+--------------------+
Run Code Online (Sandbox Code Playgroud)

我没有过滤就运行了配置文件,你看到了对mergeSort的递归调用(只是让它难以阅读),但没有任何兴趣.

那么这里发生了什么?那个方法SortMeComp.compare(Object,Object)来自哪里?我们认为Java是内部处理泛型的东西,但可能需要这么长时间?我认为jvm只会将泛型方法视为"无类型"/ Object方法.正如您所看到的,简单的演员阵容要快得多.除此之外,我认为即使jvm需要做前期类型的东西,这也是JIT的一种东西.这里发生了什么?

顺便说说:

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Run Code Online (Sandbox Code Playgroud)

编辑:

为了回应savinos的回答,我尝试使用"无类型"比较器模拟额外的方法调用,该比较器只是强制转换为类型化的比较:

public class SortMeCompMethodCalls implements Comparator{
    public int compare(Object oneObj, Object twoObj) {
        return compare((SortMe)oneObj, (SortMe)twoObj);
    }
    public int compare(SortMe one, SortMe two) {
        if(one.getValue()>two.getValue()) {
            return -1;
        } else if (one.getValue()<two.getValue()) {
            return 1;
        } else {
            return 0;
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

结果如下:

+---------------------------------------------------------+-----------------+-----------------+--------------------+
|                          Name                           |    Time (ms)    |  Own Time (ms)  |  Invocation Count  |
+---------------------------------------------------------+-----------------+-----------------+--------------------+
|  +---java.util.Arrays.sort(Object[], Comparator)        |  31,044  100 %  |          8,061  |               200  |
|    |                                                    |                 |                 |                    |
|    +---SortMeComp.compare(Object, Object)               |  11,554   37 %  |          7,617  |        12,354,392  |
|    | |                                                  |                 |                 |                    |
|    | +---SortMeComp.compare(SortMe, SortMe)             |   3,936   13 %  |          3,936  |        12,354,392  |
|    |                                                    |                 |                 |                    |
|    +---SortMeCompMethodCalls.compare(Object, Object)    |  11,427   37 %  |          7,613  |        12,352,146  |
|      |                                                  |                 |                 |                    |
|      +---SortMeCompMethodCalls.compare(SortMe, SortMe)  |   3,814   12 %  |          3,814  |        12,352,146  |
+---------------------------------------------------------+-----------------+-----------------+--------------------+
Run Code Online (Sandbox Code Playgroud)

所以看起来savinos是对的!额外的时间只是额外的方法调用(加上一点点演员).这对我来说似乎很疯狂; 你会认为那会被JITed带走吗?呃,好吧.

我删除了编辑2并将其添加为答案,因为它应该是最初的.

Sav*_*era 10

我可能错了,但我会说"对象"比较器和类型化比较器(由通用比较器调用)之间的差异仅仅是由于额外的函数调用.

考虑到您正在进行12,352,936次调用,这意味着每个函数调用大约5.7*10 ^ -7秒,这并非不合理.


Bry*_*ead 0

我开始想知道这整件事是否是跟踪的产物(我使用的是跟踪分析,而不是采样)。我过去曾见过跟踪在方法调用频繁的区域中导致扭曲。所以我做了一个直接的计时测试:

public class Sandbox {
    public static void main(String argv[]) {
        long startTime = System.currentTimeMillis();
        sortTest(10000, 10000, new SortMeComp());
        System.err.println("\n"+(System.currentTimeMillis()-startTime));
        startTime = System.currentTimeMillis();
        sortTest(10000, 10000, new SortMeCompTypeless());
        System.err.println("\n"+(System.currentTimeMillis()-startTime));
    }

    public static void sortTest(int n, int l, Comparator<SortMe> c) {
        for(int i=0; i<n; i++) {
            SortMe[] sortMes = new SortMe[l];
            for(int j=0; j<l; j++) {
                sortMes[j] = new SortMe(Math.random());
            }
            System.out.print(sortMes[(int)(Math.random()*l)].getValue());
            Arrays.sort(sortMes, c);
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

结果如下:

[bunch of doubles...]
sortTest(10000, 10000, new SortMeComp()): 18168
[bunch of doubles...]
sortTest(10000, 10000, new SortMeCompTypeless()): 19366
Run Code Online (Sandbox Code Playgroud)

正如您所看到的,类型化的实际上执行得更快,这是可以预料的,因为它没有进行强制转换。因此,看来我所看到的差异完全是由于追踪造成的。我对 Hotswap 的信心又恢复了!

顺便说一句,我放入 printlns 只是为了确保 jvm 不会以任何方式优化循环。