newInstance vs new in jdk-9/jdk-8 and jmh

Eug*_*ene 39 java performance java-8 jmh java-9

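I have seen a lot of threads here that compare newInstance with the new operator and try to answer which one is faster.

Looking at the source code, newInstance looks like it should be much slower: it performs many security checks and uses reflection. So I decided to measure, starting with jdk-8. Here is the code, using jmh.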

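import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(value = { Mode.AverageTime, Mode.SingleShotTime })
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)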
public class TestNewObject {
    public static void main(String[] args) throws RunnerException {

        Options opt = new OptionsBuilder().include(TestNewObject.class.getSimpleName()).build();
        new Runner(opt).run();
    }

    @Fork(1)
    @Benchmark
    public Something newOperator() {
       return new Something();
    }

    @SuppressWarnings("deprecation")
    @Fork(1)
    @Benchmark
    public Something newInstance() throws InstantiationException, IllegalAccessException {
         return Something.class.newInstance();
    }

    static class Something {

    } 
}

I don't think there are big surprises here (the JIT does a lot of optimizations that keep the difference small):

Benchmark                  Mode  Cnt      Score      Error  Units
TestNewObject.newInstance  avgt    5      7.762 ±    0.745  ns/op
TestNewObject.newOperator  avgt    5      4.714 ±    1.480  ns/op
TestNewObject.newInstance    ss    5  10666.200 ± 4261.855  ns/op
TestNewObject.newOperator    ss    5   1522.800 ± 2558.524  ns/op

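For hot code the difference is under 2x (roughly 1.6x), and it is much worse for single-shot time.

Now I switch to jdk-9 (build 157, in case it matters) and run the same code. The results: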

 Benchmark                  Mode  Cnt      Score      Error  Units
 TestNewObject.newInstance  avgt    5    314.307 ±   55.054  ns/op
 TestNewObject.newOperator  avgt    5      4.602 ±    1.084  ns/op
 TestNewObject.newInstance    ss    5  10798.400 ± 5090.458  ns/op
 TestNewObject.newOperator    ss    5   3269.800 ± 4545.827  ns/op

That is a difference of well over 50x in hot code (roughly 68x). I'm using the latest jmh version (1.19.SNAPSHOT).

After adding one more method to the test:

@Fork(1)
@Benchmark
public Something newInstanceJDK9() throws Exception {
    return Something.class.getDeclaredConstructor().newInstance();
}
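For readers who want to see the three instantiation paths from the benchmark side by side without a jmh setup, here is a minimal sketch. The class name `InstantiationPaths` and the helper method names are made up for illustration, and the code measures nothing; it only shows that all three paths produce an instance of the same class.

```java
import java.lang.reflect.Constructor;

public class InstantiationPaths {
    // Stand-in for the Something class used in the benchmark.
    public static class Something {
    }

    // Path 1: plain allocation with the new operator.
    public static Something viaNewOperator() {
        return new Something();
    }

    // Path 2: Class.newInstance, which is deprecated since Java 9.
    @SuppressWarnings("deprecation")
    public static Something viaClassNewInstance() {
        try {
            return Something.class.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Path 3: look up the no-arg constructor, then invoke it.
    public static Something viaConstructor() {
        try {
            Constructor<Something> ctor = Something.class.getDeclaredConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaNewOperator().getClass().getName());
        System.out.println(viaClassNewInstance().getClass().getName());
        System.out.println(viaConstructor().getClass().getName());
    }
}
```

The reflective exceptions are wrapped in an unchecked exception only to keep the sketch short; the benchmark methods in the question declare them instead.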

Here are the overall results on jdk-9:

Benchmark                      Mode  Cnt      Score        Error  Units
TestNewObject.newInstance      avgt    5    308.342 ±   107.563  ns/op
TestNewObject.newInstanceJDK9  avgt    5     50.659 ±     7.964  ns/op
TestNewObject.newOperator      avgt    5      4.554 ±     0.616  ns/op
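To put the tables into perspective, the slowdown factors can be computed directly from the reported average times. The sketch below is only arithmetic on the scores quoted above (the `Slowdown` class and its methods are hypothetical names), and it ignores the error bars, which are large for some rows.

```java
import java.util.Locale;

public class Slowdown {
    // Ratio of two average times in ns/op; a value above 1 means "slower".
    public static double ratio(double slowerNsPerOp, double fasterNsPerOp) {
        return slowerNsPerOp / fasterNsPerOp;
    }

    // Formats the ratio with one decimal, independent of the default locale.
    public static String describe(String label, double slowerNsPerOp, double fasterNsPerOp) {
        return String.format(Locale.ROOT, "%s: %.1fx", label, ratio(slowerNsPerOp, fasterNsPerOp));
    }

    public static void main(String[] args) {
        // avgt scores copied from the jdk-8 table and the last jdk-9 table.
        System.out.println(describe("jdk-8 newInstance vs new", 7.762, 4.714));       // 1.6x
        System.out.println(describe("jdk-9 newInstance vs new", 308.342, 4.554));     // 67.7x
        System.out.println(describe("jdk-9 newInstanceJDK9 vs new", 50.659, 4.554));  // 11.1x
    }
}
```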

Can someone explain why there is such a big difference?

apa*_*gin 57

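First of all, the problem is not (directly) related to the module system.

I noticed that even with JDK 9, the first warm-up iteration of newInstance was as fast as with JDK 8.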

# Fork: 1 of 1
# Warmup Iteration   1: 10,578 ns/op    <-- Fast!
# Warmup Iteration   2: 246,426 ns/op
# Warmup Iteration   3: 242,347 ns/op

This means that something went wrong in JIT compilation.
-XX:+PrintCompilation confirmed that the benchmark was recompiled after the first iteration:

10,762 ns/op
# Warmup Iteration   2:    1541  689   !   3       java.lang.Class::newInstance (160 bytes)   made not entrant
   1548  692 %     4       bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub @ 13 (56 bytes)
   1552  693       4       bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub (56 bytes)
   1555  662       3       bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub (56 bytes)   made not entrant
248,023 ns/op

Then -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining pointed to an inlining problem:

1577  667 %     4       bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub @ 13 (56 bytes)
                           @ 17   bench.NewInstance::newInstance (6 bytes)   inline (hot)
            !                @ 2   java.lang.Class::newInstance (160 bytes)   already compiled into a big method

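The "already compiled into a big method" message means that the compiler failed to inline the Class.newInstance call because the compiled size of the callee is larger than the InlineSmallCode value (2000 by default).

When I reran the benchmark with -XX:InlineSmallCode=2500, it became fast again.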
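The effective InlineSmallCode value differs between JDK versions and platforms, so it can be worth checking it on the JVM at hand. The following is a sketch that assumes a HotSpot JVM, where the `com.sun.management.HotSpotDiagnosticMXBean` interface is available; the class name `InlineSmallCodeProbe` and the `vmOption` helper are hypothetical, and on other JVMs the lookup simply returns an empty result.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class InlineSmallCodeProbe {
    // Returns the current value of a VM flag, or empty if the flag
    // (or the HotSpot diagnostic bean itself) is not available.
    public static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (RuntimeException e) {
            // For example IllegalArgumentException for an unknown flag.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("InlineSmallCode = "
                + vmOption("InlineSmallCode").orElse("unknown"));
    }
}
```

Running it once with default settings and once with a different -XX:InlineSmallCode value is a quick way to confirm that the flag actually reached the JVM under test.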

Benchmark                Mode  Cnt  Score   Error  Units
NewInstance.newInstance  avgt    5  8,847 ± 0,080  ns/op
NewInstance.operatorNew  avgt    5  5,042 ± 0,177  ns/op

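As you know, JDK 9 now has G1 as the default GC. If I fall back to Parallel GC, the benchmark is also fast, even with the default InlineSmallCode.

Rerunning the JDK 9 benchmark with -XX:+UseParallelGC: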

Benchmark                Mode  Cnt  Score   Error  Units
NewInstance.newInstance  avgt    5  8,728 ± 0,143  ns/op
NewInstance.operatorNew  avgt    5  4,822 ± 0,096  ns/op

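G1 needs some barriers to be emitted whenever an object store happens. That is why the compiled code becomes a bit larger, so that Class.newInstance exceeds the default InlineSmallCode limit. Another reason why the compiled Class.newInstance has become larger is that the reflection code was slightly rewritten in JDK 9.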
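Since the result depends on which collector is active, it helps to confirm what a given JVM is really running with before comparing numbers. A minimal sketch using the standard management API follows; the class name `ActiveCollectors` is made up for illustration. On HotSpot the reported names typically start with "G1" under G1 and with "PS" under Parallel GC, but the exact strings are an implementation detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // Collects the names of the garbage collectors registered in this JVM.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Try with and without -XX:+UseParallelGC to see the names change.
        System.out.println(collectorNames());
    }
}
```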

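TL;DR The JIT failed to inline Class.newInstance because the InlineSmallCode limit was exceeded. The compiled version of Class.newInstance became larger because of changes to the reflection code in JDK 9 and because the default GC was changed to G1.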

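  • @KirillRakhman That should not be a problem, because in a real-life scenario `newInstance` is unlikely to be inlined anyway. I can't imagine a reasonable case where a constructor is called through reflection *many times* at *the same* place. In the original question the performance gain shows up only because the JIT adapts to calling one particular method. (2 upvotes)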