Art*_*yan 22 java performance benchmarking jmh
Can someone explain why JMH says that returning 1 is faster than returning 0?
Here is the benchmark code.
import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 3, jvmArgsAppend = {"-server", "-disablesystemassertions"})
public class ZeroVsOneBenchmark {

    @Benchmark
    @Warmup(iterations = 3, time = 2, timeUnit = TimeUnit.SECONDS)
    public int zero() {
        return 0;
    }

    @Benchmark
    @Warmup(iterations = 3, time = 2, timeUnit = TimeUnit.SECONDS)
    public int one() {
        return 1;
    }
}
The results are as follows:
# Run complete. Total time: 00:03:05
Benchmark                     Mode   Samples        Score  Score error   Units
c.m.ZeroVsOneBenchmark.one    thrpt       60  1680674.502    24113.014  ops/ms
c.m.ZeroVsOneBenchmark.zero   thrpt       60   735975.568    14779.380  ops/ms
The same behavior shows up with one, two and zero:
# Run complete. Total time: 01:01:56
Benchmark                     Mode   Samples        Score  Score error   Units
c.m.ZeroVsOneBenchmark.one    thrpt       90  1762956.470     7554.807  ops/ms
c.m.ZeroVsOneBenchmark.two    thrpt       90  1764642.299     9277.673  ops/ms
c.m.ZeroVsOneBenchmark.zero   thrpt       90   773010.467     5031.920  ops/ms
apa*_*gin 34
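JMH is a good tool, but it is still not perfect.
Of course, there is no speed difference between returning 0, 1 or any other integer. What does differ is how JMH consumes the value and how the HotSpot JIT compiles that code.
To prevent the JIT from optimizing the computation away, JMH uses the special Blackhole class to consume the values returned from a benchmark. Here is the variant for integer values: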
public final void consume(int i) {
    if (i == i1 & i == i2) {
        // SHOULD NEVER HAPPEN
        nullBait.i1 = i; // implicit null pointer exception
    }
}
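To make the pattern concrete, here is a minimal stand-alone sketch of the same idea. It is not JMH's actual class: the names MiniBlackhole and Bait are hypothetical, and the field values 1 and 2 are an assumption chosen only so that the two fields always differ, which is what makes the branch unreachable.

```java
public class MiniBlackhole {
    // Hypothetical stand-in for the object the real code dereferences.
    static final class Bait {
        int i1;
    }

    // Assumed values: any two distinct values make the condition unsatisfiable,
    // because i cannot equal both of them at once.
    volatile int i1 = 1;
    volatile int i2 = 2;

    // Always null, so entering the branch would throw NullPointerException.
    Bait nullBait = null;

    public final void consume(int i) {
        // Non-short-circuit '&' means both fields are read and compared.
        if (i == i1 & i == i2) {
            nullBait.i1 = i;
        }
    }

    public static void main(String[] args) {
        MiniBlackhole bh = new MiniBlackhole();
        for (int v : new int[] {0, 1, 2, -1, Integer.MAX_VALUE}) {
            bh.consume(v); // never throws: the branch is unreachable
        }
        System.out.println("no value triggered the branch");
    }
}
```

The JIT cannot prove the branch dead because the fields are read at run time, so the value passed in still has to be computed.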
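In consume, i is the value returned from the benchmark. In your case it is either 0 or 1. When i == 1, the never-happening condition becomes if (1 == i1 & 1 == i2), which compiles as follows: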
0x0000000002b4ffe5: mov 0xb0(%r13),%r10d ;*getfield i1
0x0000000002b4ffec: mov 0xb4(%r13),%r8d ;*getfield i2
0x0000000002b4fff3: cmp $0x1,%r8d
0x0000000002b4fff7: je 0x0000000002b50091 ;*return
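But when i == 0, the JIT tries to "optimize" the two comparisons against 0 using setne instructions. The resulting code turns out more complicated: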
0x0000000002a40b28: mov 0xb0(%rdi),%r10d ;*getfield i1
0x0000000002a40b2f: mov 0xb4(%rdi),%r8d ;*getfield i2
0x0000000002a40b36: test %r10d,%r10d
0x0000000002a40b39: setne %r10b
0x0000000002a40b3d: movzbl %r10b,%r10d
0x0000000002a40b41: test %r8d,%r8d
0x0000000002a40b44: setne %r11b
0x0000000002a40b48: movzbl %r11b,%r11d
0x0000000002a40b4c: xor $0x1,%r10d
0x0000000002a40b50: xor $0x1,%r11d
0x0000000002a40b54: and %r11d,%r10d
0x0000000002a40b57: test %r10d,%r10d
0x0000000002a40b5a: jne 0x0000000002a40c15 ;*return
That is, the slower return 0 case is explained by the larger number of CPU instructions executed in Blackhole.consume().
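As a rough cross-check, the setne-based listing can be transliterated into Java step by step and compared against the source-level condition. This is only a sketch of the logic, not of the performance: the class and method names are hypothetical, and the instruction-to-statement mapping in the comments is my reading of the listing above.

```java
public class SetneSequence {
    // Step-by-step transliteration of the listing for i == 0.
    static boolean branchTaken(int i1, int i2) {
        int a = (i1 != 0) ? 1 : 0; // test %r10d,%r10d ; setne ; movzbl
        int b = (i2 != 0) ? 1 : 0; // test %r8d,%r8d   ; setne ; movzbl
        a ^= 1;                    // xor $0x1,%r10d
        b ^= 1;                    // xor $0x1,%r11d
        a &= b;                    // and %r11d,%r10d
        return a != 0;             // test %r10d,%r10d ; jne
    }

    // The condition from consume(int) with i == 0 substituted.
    static boolean sourceCondition(int i1, int i2) {
        return 0 == i1 & 0 == i2;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            for (int y : samples) {
                if (branchTaken(x, y) != sourceCondition(x, y)) {
                    throw new AssertionError("mismatch for " + x + ", " + y);
                }
            }
        }
        System.out.println("both forms agree on all samples");
    }
}
```

Both forms compute the same predicate; the i == 0 version simply takes more instructions to get there than the single compare-and-jump emitted for i == 1.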
Note to JMH developers: I would suggest rewriting Blackhole.consume like this:
if (i == l1) {
    // SHOULD NEVER HAPPEN
    nullBait.i1 = i; // implicit null pointer exception
}
where volatile long l1 = Long.MIN_VALUE. In this case the condition will still always be false, but it will be compiled the same way for all return values.
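The reason the condition stays false is that the int argument is widened to long before the comparison, and no int value widens to Long.MIN_VALUE. A minimal sketch of the suggestion, with hypothetical class and helper names (LongSentinelBlackhole, Bait, matchesSentinel) that are not part of JMH:

```java
public class LongSentinelBlackhole {
    // Hypothetical stand-in for the object the real code dereferences.
    static final class Bait {
        int i1;
    }

    // Sentinel outside the int range, as proposed in the answer.
    volatile long l1 = Long.MIN_VALUE;

    // Always null, so entering the branch would throw NullPointerException.
    Bait nullBait = null;

    public final void consume(int i) {
        // i is widened to long here; the result lies in
        // [Integer.MIN_VALUE, Integer.MAX_VALUE] and so never equals l1.
        if (i == l1) {
            nullBait.i1 = i;
        }
    }

    // Helper for illustration: the same comparison against the constant.
    static boolean matchesSentinel(int i) {
        return (long) i == Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        LongSentinelBlackhole bh = new LongSentinelBlackhole();
        for (int v : new int[] {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE}) {
            bh.consume(v); // never throws
            if (matchesSentinel(v)) {
                throw new AssertionError("unexpected match for " + v);
            }
        }
        System.out.println("no int value matches the sentinel");
    }
}
```

Because the sentinel is neither 0 nor any other value an int can take, there is no special constant for the JIT to fold into a different instruction sequence. Whether a given HotSpot version actually emits identical code for every input would still need to be confirmed by inspecting the generated assembly.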