Ser*_*nov 6 java string performance benchmarking
我尝试了两种方法在 java.lang.String 上逐个字符地迭代,发现它们令人困惑。该基准对其进行了总结:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
public class CharByCharIterationBenchmark {
@Benchmark
public void toCharArray(Data data, Blackhole b) {
char[] chars = data.string.toCharArray();
for (char ch : chars) {
b.consume(ch);
}
}
@Benchmark
public void charAt(Data data, Blackhole b) {
String string = data.string;
int length = string.length();
for (int i = 0; i < length; i++) {
b.consume(string.charAt(i));
}
}
@State(Scope.Thread)
public static class Data {
String string;
@Param({"true", "false"})
private boolean latin;
@Param({"5", "10", "50", "100"})
private int length;
@Setup
public void setup() {
String alphabet = latin
? "abcdefghijklmnopqrstuvwxyz" // English
: "????????????????????????????????"; // Russian
RandomStringGenerator generator = new RandomStringGenerator();
string = generator.randomString(alphabet, length);
}
}
Run Code Online (Sandbox Code Playgroud)
直观地说, 中描述的方法toCharArray()似乎不太有效,因为它分配了char[]Java 8的底层副本并编码byte[]到char[]Java 9 和更新版本。但在实践中反之亦然:toCharArray()执行速度要快得多:
Java 8
(latin) (length) Mode Score Error Units
charAt true 5 avgt 21.051 ± 0.796 ns/op
charAt true 10 avgt 44.002 ± 2.324 ns/op
charAt true 50 avgt 221.068 ± 7.422 ns/op
charAt true 100 avgt 410.162 ± 13.441 ns/op
toCharArray true 5 avgt 16.819 ± 0.662 ns/op
toCharArray true 10 avgt 28.364 ± 0.663 ns/op
toCharArray true 50 avgt 110.910 ± 1.144 ns/op
toCharArray true 100 avgt 205.694 ± 1.669 ns/op
charAt:·gc.alloc.rate.norm true 5 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm true 10 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm true 50 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm true 100 avgt ? 10?? B/op
toCharArray:·gc.alloc.rate.norm true 5 avgt 32.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm true 10 avgt 40.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm true 50 avgt 120.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm true 100 avgt 216.000 ± 0.001 B/op
charAt false 5 avgt 20.372 ± 0.406 ns/op
charAt false 10 avgt 39.962 ± 0.911 ns/op
charAt false 50 avgt 201.337 ± 3.752 ns/op
charAt false 100 avgt 410.530 ± 17.931 ns/op
toCharArray false 5 avgt 15.767 ± 0.606 ns/op
toCharArray false 10 avgt 26.258 ± 0.345 ns/op
toCharArray false 50 avgt 109.631 ± 1.064 ns/op
toCharArray false 100 avgt 205.815 ± 4.716 ns/op
charAt:·gc.alloc.rate.norm false 5 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm false 10 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm false 50 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm false 100 avgt ? 10?? B/op
toCharArray:·gc.alloc.rate.norm false 5 avgt 32.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm false 10 avgt 40.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm false 50 avgt 120.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm false 100 avgt 216.000 ± 0.001 B/op
Java 11
(latin) (length) Mode Score Error Units
charAt true 5 avgt 22.035 ± 1.557 ns/op
charAt true 10 avgt 41.800 ± 1.572 ns/op
charAt true 50 avgt 227.180 ± 18.655 ns/op
charAt true 100 avgt 474.719 ± 29.782 ns/op
toCharArray true 5 avgt 17.091 ± 0.662 ns/op
toCharArray true 10 avgt 26.167 ± 0.220 ns/op
toCharArray true 50 avgt 127.876 ± 2.106 ns/op
toCharArray true 100 avgt 244.449 ± 9.330 ns/op
charAt:·gc.alloc.rate.norm true 5 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm true 10 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm true 50 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm true 100 avgt ? 10?? B/op
toCharArray:·gc.alloc.rate.norm true 5 avgt 32.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm true 10 avgt 40.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm true 50 avgt 120.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm true 100 avgt 216.000 ± 0.001 B/op
charAt false 5 avgt 22.215 ± 2.064 ns/op
charAt false 10 avgt 45.606 ± 2.567 ns/op
charAt false 50 avgt 204.577 ± 18.302 ns/op
charAt false 100 avgt 404.056 ± 10.203 ns/op
toCharArray false 5 avgt 17.055 ± 0.556 ns/op
toCharArray false 10 avgt 29.254 ± 2.616 ns/op
toCharArray false 50 avgt 123.610 ± 5.033 ns/op
toCharArray false 100 avgt 226.174 ± 6.396 ns/op
charAt:·gc.alloc.rate.norm false 5 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm false 10 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm false 50 avgt ? 10?? B/op
charAt:·gc.alloc.rate.norm false 100 avgt ? 10?? B/op
toCharArray:·gc.alloc.rate.norm false 5 avgt 32.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm false 10 avgt 40.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm false 50 avgt 120.000 ± 0.001 B/op
toCharArray:·gc.alloc.rate.norm false 100 avgt 216.000 ± 0.001 B/op
Run Code Online (Sandbox Code Playgroud)
首先,我认为这里的原因与 Nitsan Wakart 的文章“不稳定阅读惊喜”中描述的原因相同。但是,使用 perfasm 进行分析时,我发现代码中的最热点与char[]/byte[]字段访问无关:
?? 0x00007fa638407dd9: jmp 0x00007fa638407e4c
?? 0x00007fa638407ddb: nopl 0x0(%rax,%rax,1)
4.96% ?? ? 0x00007fa638407de0: shl $0x3,%r11
0.01% ?? ? 0x00007fa638407de4: movzwl 0x10(%r11,%r13,2),%edx ;*invokevirtual charAt {reexecute=0 rethrow=0 return_oop=0}
?? ? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@25 (line 35)
?? ? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
3.58% ?? ?? 0x00007fa638407dea: mov %rsi,0x18(%rsp)
1.87% ?? ?? 0x00007fa638407def: mov %r8d,0x14(%rsp)
4.18% ?? ?? 0x00007fa638407df4: mov %edi,0x10(%rsp)
0.04% ?? ?? 0x00007fa638407df8: mov %rbx,0x8(%rsp)
1.29% ?? ?? 0x00007fa638407dfd: mov %r10,(%rsp)
1.83% ?? ?? 0x00007fa638407e01: mov %r9,0x70(%rsp)
4.32% ?? ?? 0x00007fa638407e06: mov %rax,0x60(%rsp)
0.05% ?? ?? 0x00007fa638407e0b: mov %r9,%rsi
1.27% ?? ?? 0x00007fa638407e0e: nop
1.88% ?? ?? 0x00007fa638407e0f: callq 0x00007fa630926e00 ; ImmutableOopMap{[96]=Oop [104]=Oop [112]=Oop [120]=Oop [0]=Oop [16]=NarrowOop [24]=Oop }
?? ?? ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
?? ?? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@28 (line 35)
?? ?? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
?? ?? ; {optimized virtual_call}
5.71% ?? ?? 0x00007fa638407e14: inc %ebp ;*iinc {reexecute=0 rethrow=0 return_oop=0}
?? ?? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@31 (line 34)
?? ?? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
0.05% ?? ?? 0x00007fa638407e16: cmp 0x14(%rsp),%ebp
0.00% ?? ?? 0x00007fa638407e1a: jge 0x00007fa638407d87 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
? ?? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@18 (line 34)
? ?? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
3.05% ? ?? 0x00007fa638407e20: mov 0x10(%rsp),%edi
4.24% ? ?? 0x00007fa638407e24: movsbl 0x14(%r12,%rdi,8),%ecx ;*getfield coder {reexecute=0 rethrow=0 return_oop=0}
? ?? ; - java.lang.String::isLatin1@7 (line 3266)
? ?? ; - java.lang.String::charAt@1 (line 692)
? ?? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@25 (line 35)
? ?? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
0.86% ? ?? 0x00007fa638407e2a: mov 0xc(%r12,%rdi,8),%r11d ;*getfield value {reexecute=0 rethrow=0 return_oop=0}
? ?? ; - java.lang.String::charAt@8 (line 693)
? ?? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@25 (line 35)
? ?? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
1.67% ? ?? 0x00007fa638407e2f: mov 0x60(%rsp),%rax
1.71% ? ?? 0x00007fa638407e34: mov 0x70(%rsp),%r9
3.92% ? ?? 0x00007fa638407e39: mov (%rsp),%r10
0.20% ? ?? 0x00007fa638407e3d: mov 0x8(%rsp),%rbx
1.44% ? ?? 0x00007fa638407e42: mov 0x14(%rsp),%r8d
1.70% ? ?? 0x00007fa638407e47: mov 0x18(%rsp),%rsi ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
? ?? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@21 (line 35)
? ?? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
3.93% ? ?? 0x00007fa638407e4c: movslq %ebp,%r13 ;*invokestatic getChar {reexecute=0 rethrow=0 return_oop=0}
?? ; - java.lang.StringUTF16::charAt@7 (line 1268)
?? ; - java.lang.String::charAt@21 (line 695)
?? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@25 (line 35)
?? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
0.23% ?? 0x00007fa638407e4f: test %ecx,%ecx
0.00% ??? 0x00007fa638407e51: jne 0x00007fa638407e6b ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
??? ; - java.lang.String::charAt@4 (line 692)
??? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@25 (line 35)
??? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
??? 0x00007fa638407e53: mov 0xc(%r12,%r11,8),%edx ; implicit exception: dispatches to 0x00007fa638407fbc
??? 0x00007fa638407e58: cmp %edx,%ebp
??? 0x00007fa638407e5a: jae 0x00007fa638407eb0
??? 0x00007fa638407e5c: shl $0x3,%r11
??? 0x00007fa638407e60: movzbl 0x10(%r11,%r13,1),%edx ;*iand {reexecute=0 rethrow=0 return_oop=0}
??? ; - java.lang.StringLatin1::charAt@25 (line 49)
??? ; - java.lang.String::charAt@12 (line 693)
??? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@25 (line 35)
??? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
??? 0x00007fa638407e66: jmpq 0x00007fa638407dea
1.52% ? ? 0x00007fa638407e6b: mov 0xc(%r12,%r11,8),%ecx ; implicit exception: dispatches to 0x00007fa638407fb0
5.99% ? 0x00007fa638407e70: sar %ecx ;*ishr {reexecute=0 rethrow=0 return_oop=0}
? ; - java.lang.StringUTF16::length@3 (line 74)
? ; - java.lang.StringUTF16::checkIndex@2 (line 1470)
? ; - java.lang.StringUTF16::charAt@2 (line 1267)
? ; - java.lang.String::charAt@21 (line 695)
? ; - tsypanov.strings.character.CharByCharIterationBenchmark::charAt@25 (line 35)
? ; - tsypanov.strings.character.generated.CharByCharIterationBenchmark_charAt_jmhTest::charAt_avgt_jmhStub@19 (line 191)
5.51% ? 0x00007fa638407e72: cmp %ecx,%ebp
0.01% ? 0x00007fa638407e74: jb 0x00007fa638407de0 ;*if_icmplt {reexecute=0 rethrow=0 return_oop=0}
Run Code Online (Sandbox Code Playgroud)
看起来最热门的地方是iinc(循环索引的增量)和对我来说相当不直观的ishr算术变换StringUTF16.length()。
还使用 perfnorm 分析器,我发现与toCharArray()以下相比,它具有更少的周期、指令和加载缺失chatAt():
Benchmark Mode Cnt Score Error Units
charAt:L1-dcache-loads avgt 2104.816 #/op
charAt:L1-dcache-stores avgt 1200.878 #/op
charAt:branches avgt 603.754 #/op
charAt:cycles avgt 1461.282 #/op
charAt:dTLB-loads avgt 2105.253 #/op
charAt:dTLB-stores avgt 1200.909 #/op
charAt:instructions avgt 4716.775 #/op
toCharArray:L1-dcache-loads avgt 1026.341 #/op
toCharArray:L1-dcache-stores avgt 416.997 #/op
toCharArray:branches avgt 419.265 #/op
toCharArray:cycles avgt 820.521 #/op
toCharArray:dTLB-loads avgt 1026.506 #/op
toCharArray:dTLB-stores avgt 417.591 #/op
toCharArray:instructions avgt 2409.806 #/op
Run Code Online (Sandbox Code Playgroud)
有人可以帮助解释这一点并解释如此显着的差异吗?