And*_*ell 274 java compiler-construction performance assembly switch-statement
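I am working on some Java code that needs to be highly optimized, because it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying double variables by 10 raised to arbitrary non-negative int exponents. One fast way (edit: but not the fastest possible, see Update 2 below) to get the multiplied value is to switch on the exponent: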
double multiplyByPowerOfTen(final double d, final int exponent) throws ParseException {
    switch (exponent) {
        case 0:
            return d;
        case 1:
            return d*10;
        case 2:
            return d*100;
        // ... same pattern
        case 9:
            return d*1000000000;
        case 10:
            return d*10000000000L;
        // ... same pattern with long literals
        case 18:
            return d*1000000000000000000L;
        default:
            // java.text.ParseException is checked, hence the throws clause above
            throw new ParseException("Unhandled power of ten " + exponent, 0);
    }
}
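The commented ellipses above indicate that the case int constants continue incrementing by 1, so there are really 19 cases in the above code snippet. Since I wasn't sure whether I would actually need all the powers of 10 in case statements 10 through 18, I ran some microbenchmarks comparing the time to complete 10 million operations with this switch statement versus a switch with only cases 0 through 9 (with the exponent limited to 9 or less to avoid breaking the pared-down switch). I got the rather surprising (to me, at least!) result that the longer switch with more case statements actually ran faster.
On a lark, I tried adding even more cases that just returned dummy values, and found that I could get the switch to run even faster with around 22-27 declared cases (even though those dummy cases are never actually hit while the code is running). (Again, cases were added contiguously by incrementing the prior case constant by 1.) These execution time differences are not very significant: for a random exponent between 0 and 10, the dummy-padded switch statement finishes 10 million executions in 1.49 seconds versus 1.54 seconds for the unpadded version, for a grand total savings of 5 ns per execution. So this is not the kind of thing that makes obsessing over padding out a switch statement worth the effort from an optimization standpoint. But I still find it curious and counter-intuitive that a switch doesn't become slower (or perhaps at best maintain constant O(1) time) as more cases are added to it.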

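These are the results I obtained from running with various limits on the randomly generated exponent values. I didn't include the results all the way down to 1 for the exponent limit, but the general shape of the curve remains the same, with a ridge around the 12-17 case mark and a valley between 18 and 28. All tests were run in JUnitBenchmarks using shared containers for the random values to ensure identical testing inputs. I also ran the tests both in order from the longest switch statement to the shortest, and vice versa, to try to eliminate the possibility of ordering-related test problems. I've put my testing code up on a github repo if anyone wants to try to reproduce these results.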
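The "same inputs for every variant" setup described above can be approximated without JUnitBenchmarks. What follows is only a naive sketch, not the harness from the repo: the class and method names are made up for illustration, the inputs come from a fixed-seed `java.util.Random`, timing uses `System.nanoTime()`, and warm-up is handled very crudely, so the numbers it prints should be read with caution.

```java
import java.util.Random;

final class SwitchTimingSketch {
    // Shared inputs: one fixed-seed array of exponents in [0, bound), reused by every variant.
    static int[] randomExponents(int count, int bound, long seed) {
        Random random = new Random(seed);
        int[] exponents = new int[count];
        for (int i = 0; i < count; i++) {
            exponents[i] = random.nextInt(bound);
        }
        return exponents;
    }

    // Variant under test: a switch limited to cases 0 through 9.
    static double switchVariant(double d, int exponent) {
        switch (exponent) {
            case 0: return d;
            case 1: return d * 10;
            case 2: return d * 100;
            case 3: return d * 1000;
            case 4: return d * 10000;
            case 5: return d * 100000;
            case 6: return d * 1000000;
            case 7: return d * 10000000;
            case 8: return d * 100000000;
            case 9: return d * 1000000000;
            default: throw new IllegalArgumentException("Unhandled power of ten " + exponent);
        }
    }

    // Runs the variant over all inputs and returns a checksum so the work cannot be optimized away.
    static double run(int[] exponents) {
        double sum = 0;
        for (int exponent : exponents) {
            sum += switchVariant(1.5, exponent);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] exponents = randomExponents(10_000_000, 10, 42L);
        run(exponents); // crude warm-up pass so the JIT has a chance to compile the loop
        long start = System.nanoTime();
        double checksum = run(exponents);
        long elapsedNs = System.nanoTime() - start;
        System.out.println("checksum=" + checksum + " elapsedNs=" + elapsedNs);
    }
}
```

To compare variants, a second method with more cases would be timed over the same `exponents` array; matching checksums confirm that both variants did the same work.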
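So, what is going on here? Some vagaries of my architecture or micro-benchmark construction? Or is the Java switch really a little faster to execute in the 18 to 28 case range than it is from 11 up to 17?
github test repo "switch-experiment"
UPDATE: I cleaned up the benchmarking library quite a bit and added a text file in /results with output across a wider range of possible exponent values. I also added an option in the testing code not to throw an Exception from default, but this doesn't appear to affect the results.
UPDATE 2: Found some pretty good discussion of this issue from back in 2009 on the xkcd forum: http://forums.xkcd.com/viewtopic.php?f=11&t=33524. The OP's discussion of using Arrays.binarySearch() gave me the idea for a simple array-based implementation of the exponentiation pattern above. There is no need for the binary search, since I know what the entries in the array are. It appears to run about 3 times faster than using switch, obviously at the expense of some of the control flow that switch affords. That code has been added to the github repo as well.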
ass*_*ias 227
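As pointed out by the other answer, because the case values are contiguous (as opposed to sparse), the bytecode generated for your various tests uses a switch table (bytecode instruction tableswitch).
However, once the JIT starts its job and compiles the bytecode into assembly, the tableswitch instruction does not always result in an array of pointers: sometimes the switch table is transformed into something that looks like a lookupswitch (similar to an if/else if structure).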
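Disassembling the code generated by the JIT (HotSpot JDK 1.7) shows that it uses a succession of if/else if when there are 17 cases or fewer, and an array of pointers, which is more efficient, when there are 18 or more.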
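The reason this magic number of 18 is used seems to come down to the default value of the MinJumpTableSize JVM flag (around line 352 in the code).
I raised the issue on the hotspot compiler list, and it seems to be a legacy of past testing. Note that this default value has been removed in JDK 8 after more benchmarking was performed.
Finally, when the method becomes too long (more than 25 cases in my tests), it is no longer inlined with the default JVM settings; that is the likeliest cause of the drop in performance at that point.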
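To see what a given JVM actually uses for these thresholds, the flag values can be read at runtime. The sketch below is a hedged illustration rather than part of the original analysis: it relies on `com.sun.management.HotSpotDiagnosticMXBean`, which is HotSpot-specific, and the class name is made up. MinJumpTableSize is the flag named above; FreqInlineSize is my own assumption about which inlining limit is the relevant one, since the answer does not name it. On a JVM that does not expose a flag, the helper simply reports it as unavailable.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

final class JumpTableFlagSketch {
    // Returns the current value of a HotSpot flag, or a note if this JVM does not expose it.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for flags this JVM does not know.
            return "unavailable (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println("MinJumpTableSize = " + flagValue("MinJumpTableSize"));
        // Assumption: FreqInlineSize is the inlining limit that matters for a hot method.
        System.out.println("FreqInlineSize   = " + flagValue("FreqInlineSize"));
    }
}
```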
With 5 cases, the disassembled code looks like this (notice the cmp/je/jg/jmp instructions, the assembly for if/goto):
[Verified Entry Point]
# {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
# parm0: xmm0:xmm0 = double
# parm1: rdx = int
# [sp+0x20] (sp of caller)
0x00000000024f0160: mov DWORD PTR [rsp-0x6000],eax
; {no_reloc}
0x00000000024f0167: push rbp
0x00000000024f0168: sub rsp,0x10 ;*synchronization entry
; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
0x00000000024f016c: cmp edx,0x3
0x00000000024f016f: je 0x00000000024f01c3
0x00000000024f0171: cmp edx,0x3
0x00000000024f0174: jg 0x00000000024f01a5
0x00000000024f0176: cmp edx,0x1
0x00000000024f0179: je 0x00000000024f019b
0x00000000024f017b: cmp edx,0x1
0x00000000024f017e: jg 0x00000000024f0191
0x00000000024f0180: test edx,edx
0x00000000024f0182: je 0x00000000024f01cb
0x00000000024f0184: mov ebp,edx
0x00000000024f0186: mov edx,0x17
0x00000000024f018b: call 0x00000000024c90a0 ; OopMap{off=48}
;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83)
; {runtime_call}
0x00000000024f0190: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83)
0x00000000024f0191: mulsd xmm0,QWORD PTR [rip+0xffffffffffffffa7] # 0x00000000024f0140
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@52 (line 62)
; {section_word}
0x00000000024f0199: jmp 0x00000000024f01cb
0x00000000024f019b: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff8d] # 0x00000000024f0130
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@46 (line 60)
; {section_word}
0x00000000024f01a3: jmp 0x00000000024f01cb
0x00000000024f01a5: cmp edx,0x5
0x00000000024f01a8: je 0x00000000024f01b9
0x00000000024f01aa: cmp edx,0x5
0x00000000024f01ad: jg 0x00000000024f0184 ;*tableswitch
; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
0x00000000024f01af: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff81] # 0x00000000024f0138
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@64 (line 66)
; {section_word}
0x00000000024f01b7: jmp 0x00000000024f01cb
0x00000000024f01b9: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff67] # 0x00000000024f0128
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@70 (line 68)
; {section_word}
0x00000000024f01c1: jmp 0x00000000024f01cb
0x00000000024f01c3: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff55] # 0x00000000024f0120
;*tableswitch
; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
; {section_word}
0x00000000024f01cb: add rsp,0x10
0x00000000024f01cf: pop rbp
0x00000000024f01d0: test DWORD PTR [rip+0xfffffffffdf3fe2a],eax # 0x0000000000430000
; {poll_return}
0x00000000024f01d6: ret
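With 18 cases, the assembly looks like this (notice the array of pointers, which removes the need for all the comparisons: jmp QWORD PTR [r8+r10*1] jumps directly to the right multiplication). That is the likely reason for the performance improvement: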
[Verified Entry Point]
# {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
# parm0: xmm0:xmm0 = double
# parm1: rdx = int
# [sp+0x20] (sp of caller)
0x000000000287fe20: mov DWORD PTR [rsp-0x6000],eax
; {no_reloc}
0x000000000287fe27: push rbp
0x000000000287fe28: sub rsp,0x10 ;*synchronization entry
; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
0x000000000287fe2c: cmp edx,0x13
0x000000000287fe2f: jae 0x000000000287fe46
0x000000000287fe31: movsxd r10,edx
0x000000000287fe34: shl r10,0x3
0x000000000287fe38: movabs r8,0x287fd70 ; {section_word}
0x000000000287fe42: jmp QWORD PTR [r8+r10*1] ;*tableswitch
; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
0x000000000287fe46: mov ebp,edx
0x000000000287fe48: mov edx,0x31
0x000000000287fe4d: xchg ax,ax
0x000000000287fe4f: call 0x00000000028590a0 ; OopMap{off=52}
;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96)
; {runtime_call}
0x000000000287fe54: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96)
0x000000000287fe55: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe8b] # 0x000000000287fce8
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@194 (line 92)
; {section_word}
0x000000000287fe5d: jmp 0x000000000287ff16
0x000000000287fe62: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe86] # 0x000000000287fcf0
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@188 (line 90)
; {section_word}
0x000000000287fe6a: jmp 0x000000000287ff16
0x000000000287fe6f: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe81] # 0x000000000287fcf8
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@182 (line 88)
; {section_word}
0x000000000287fe77: jmp 0x000000000287ff16
0x000000000287fe7c: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe7c] # 0x000000000287fd00
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@176 (line 86)
; {section_word}
0x000000000287fe84: jmp 0x000000000287ff16
0x000000000287fe89: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe77] # 0x000000000287fd08
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@170 (line 84)
; {section_word}
0x000000000287fe91: jmp 0x000000000287ff16
0x000000000287fe96: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe72] # 0x000000000287fd10
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@164 (line 82)
; {section_word}
0x000000000287fe9e: jmp 0x000000000287ff16
0x000000000287fea0: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe70] # 0x000000000287fd18
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@158 (line 80)
; {section_word}
0x000000000287fea8: jmp 0x000000000287ff16
0x000000000287feaa: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe6e] # 0x000000000287fd20
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@152 (line 78)
; {section_word}
0x000000000287feb2: jmp 0x000000000287ff16
0x000000000287feb4: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe24] # 0x000000000287fce0
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@146 (line 76)
; {section_word}
0x000000000287febc: jmp 0x000000000287ff16
0x000000000287febe: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe6a] # 0x000000000287fd30
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@140 (line 74)
; {section_word}
0x000000000287fec6: jmp 0x000000000287ff16
0x000000000287fec8: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe68] # 0x000000000287fd38
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@134 (line 72)
; {section_word}
0x000000000287fed0: jmp 0x000000000287ff16
0x000000000287fed2: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe66] # 0x000000000287fd40
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@128 (line 70)
; {section_word}
0x000000000287feda: jmp 0x000000000287ff16
0x000000000287fedc: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe64] # 0x000000000287fd48
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@122 (line 68)
; {section_word}
0x000000000287fee4: jmp 0x000000000287ff16
0x000000000287fee6: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe62] # 0x000000000287fd50
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@116 (line 66)
; {section_word}
0x000000000287feee: jmp 0x000000000287ff16
0x000000000287fef0: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe60] # 0x000000000287fd58
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@110 (line 64)
; {section_word}
0x000000000287fef8: jmp 0x000000000287ff16
0x000000000287fefa: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe5e] # 0x000000000287fd60
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@104 (line 62)
; {section_word}
0x000000000287ff02: jmp 0x000000000287ff16
0x000000000287ff04: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe5c] # 0x000000000287fd68
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@98 (line 60)
; {section_word}
0x000000000287ff0c: jmp 0x000000000287ff16
0x000000000287ff0e: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe12] # 0x000000000287fd28
;*tableswitch
; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
; {section_word}
0x000000000287ff16: add rsp,0x10
0x000000000287ff1a: pop rbp
0x000000000287ff1b: test DWORD PTR [rip+0xfffffffffd9b00df],eax # 0x0000000000230000
; {poll_return}
0x000000000287ff21: ret
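And finally, the assembly with 30 cases (below) looks similar to the 18-case version, except for the additional movapd xmm0,xmm1 that appears towards the middle of the code, as spotted by @cHao. However, the likeliest reason for the drop in performance is that the method is too long to be inlined with the default JVM settings: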
[Verified Entry Point]
# {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
# parm0: xmm0:xmm0 = double
# parm1: rdx = int
# [sp+0x20] (sp of caller)
0x0000000002524560: mov DWORD PTR [rsp-0x6000],eax
; {no_reloc}
0x0000000002524567: push rbp
0x0000000002524568: sub rsp,0x10 ;*synchronization entry
; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
0x000000000252456c: movapd xmm1,xmm0
0x0000000002524570: cmp edx,0x1f
0x0000000002524573: jae 0x0000000002524592 ;*tableswitch
; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
0x0000000002524575: movsxd r10,edx
0x0000000002524578: shl r10,0x3
0x000000000252457c: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe3c] # 0x00000000025243c0
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@364 (line 118)
; {section_word}
0x0000000002524584: movabs r8,0x2524450 ; {section_word}
0x000000000252458e: jmp QWORD PTR [r8+r10*1] ;*tableswitch
; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
0x0000000002524592: mov ebp,edx
0x0000000002524594: mov edx,0x31
0x0000000002524599: xchg ax,ax
0x000000000252459b: call 0x00000000024f90a0 ; OopMap{off=64}
;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120)
; {runtime_call}
0x00000000025245a0: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120)
0x00000000025245a1: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe27] # 0x00000000025243d0
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@358 (line 116)
; {section_word}
0x00000000025245a9: jmp 0x0000000002524744
0x00000000025245ae: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe22] # 0x00000000025243d8
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@348 (line 114)
; {section_word}
0x00000000025245b6: jmp 0x0000000002524744
0x00000000025245bb: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe1d] # 0x00000000025243e0
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@338 (line 112)
; {section_word}
0x00000000025245c3: jmp 0x0000000002524744
0x00000000025245c8: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe18] # 0x00000000025243e8
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@328 (line 110)
; {section_word}
0x00000000025245d0: jmp 0x0000000002524744
0x00000000025245d5: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe13] # 0x00000000025243f0
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@318 (line 108)
; {section_word}
0x00000000025245dd: jmp 0x0000000002524744
0x00000000025245e2: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe0e] # 0x00000000025243f8
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@308 (line 106)
; {section_word}
0x00000000025245ea: jmp 0x0000000002524744
0x00000000025245ef: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe09] # 0x0000000002524400
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@298 (line 104)
; {section_word}
0x00000000025245f7: jmp 0x0000000002524744
0x00000000025245fc: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe04] # 0x0000000002524408
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@288 (line 102)
; {section_word}
0x0000000002524604: jmp 0x0000000002524744
0x0000000002524609: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdff] # 0x0000000002524410
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@278 (line 100)
; {section_word}
0x0000000002524611: jmp 0x0000000002524744
0x0000000002524616: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdfa] # 0x0000000002524418
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@268 (line 98)
; {section_word}
0x000000000252461e: jmp 0x0000000002524744
0x0000000002524623: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffd9d] # 0x00000000025243c8
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@258 (line 96)
; {section_word}
0x000000000252462b: jmp 0x0000000002524744
0x0000000002524630: movapd xmm0,xmm1
0x0000000002524634: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe0c] # 0x0000000002524448
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@242 (line 92)
; {section_word}
0x000000000252463c: jmp 0x0000000002524744
0x0000000002524641: movapd xmm0,xmm1
0x0000000002524645: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffddb] # 0x0000000002524428
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@236 (line 90)
; {section_word}
0x000000000252464d: jmp 0x0000000002524744
0x0000000002524652: movapd xmm0,xmm1
0x0000000002524656: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdd2] # 0x0000000002524430
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@230 (line 88)
; {section_word}
0x000000000252465e: jmp 0x0000000002524744
0x0000000002524663: movapd xmm0,xmm1
0x0000000002524667: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdc9] # 0x0000000002524438
;*dmul
; - javaapplication4.Test1::multiplyByPowerOfTen@224 (line 86)
; {section_word}
[etc.]
0x0000000002524744: add rsp,0x10
0x0000000002524748: pop rbp
0x0000000002524749: test DWORD PTR [rip+0xfffffffffde1b8b1],eax # 0x0000000000340000
; {poll_return}
0x000000000252474f: ret
Vis*_*l K 46
Switch-case is faster if the case values are placed in a narrow range, e.g.:
case 1:
case 2:
case 3:
..
..
case n:
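This is because, in this case, the compiler can avoid performing a comparison for every case leg in the switch statement. The compiler creates a jump table that contains the addresses of the actions to be taken on the different legs. The value on which the switch is performed is manipulated to convert it into an index into the jump table. In this implementation, the time taken by the switch statement is much less than the time taken by an equivalent if-else-if cascade. The time taken by the switch statement is also independent of the number of case legs in it.
As stated in the Compilation section of Wikipedia's article on the switch statement:
If the range of input values is identifiably "small" and has only a few gaps, some compilers that incorporate an optimizer may actually implement the switch statement as a branch table or an array of indexed function pointers instead of a lengthy series of conditional instructions. This allows the switch statement to determine instantly which branch to execute without having to go through a list of comparisons.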
Viv*_*ath 30
The answer lies in the bytecode:
SwitchTest10.java
public class SwitchTest10 {
    public static void main(String[] args) {
        int n = 0;
        switcher(n);
    }

    public static void switcher(int n) {
        switch(n) {
            case 0: System.out.println(0);
                    break;
            case 1: System.out.println(1);
                    break;
            case 2: System.out.println(2);
                    break;
            case 3: System.out.println(3);
                    break;
            case 4: System.out.println(4);
                    break;
            case 5: System.out.println(5);
                    break;
            case 6: System.out.println(6);
                    break;
            case 7: System.out.println(7);
                    break;
            case 8: System.out.println(8);
                    break;
            case 9: System.out.println(9);
                    break;
            case 10: System.out.println(10);
                    break;
            default: System.out.println("test");
        }
    }
}
Corresponding bytecode; only the relevant parts are shown:
public static void switcher(int);
Code:
0: iload_0
1: tableswitch{ //0 to 10
0: 60;
1: 70;
2: 80;
3: 90;
4: 100;
5: 110;
6: 120;
7: 131;
8: 142;
9: 153;
10: 164;
default: 175 }
SwitchTest22.java:
public class SwitchTest22 {
    public static void main(String[] args) {
        int n = 0;
        switcher(n);
    }

    public static void switcher(int n) {
        switch(n) {
            case 0: System.out.println(0);
                    break;
            case 1: System.out.println(1);
                    break;
            case 2: System.out.println(2);
                    break;
            case 3: System.out.println(3);
                    break;
            case 4: System.out.println(4);
                    break;
            case 5: System.out.println(5);
                    break;
            case 6: System.out.println(6);
                    break;
            case 7: System.out.println(7);
                    break;
            case 8: System.out.println(8);
                    break;
            case 9: System.out.println(9);
                    break;
            case 100: System.out.println(10);
                    break;
            case 110: System.out.println(10);
                    break;
            case 120: System.out.println(10);
                    break;
            case 130: System.out.println(10);
                    break;
            case 140: System.out.println(10);
                    break;
            case 150: System.out.println(10);
                    break;
            case 160: System.out.println(10);
                    break;
            case 170: System.out.println(10);
                    break;
            case 180: System.out.println(10);
                    break;
            case 190: System.out.println(10);
                    break;
            case 200: System.out.println(10);
                    break;
            case 210: System.out.println(10);
                    break;
            case 220: System.out.println(10);
                    break;
            default: System.out.println("test");
        }
    }
}
Corresponding bytecode; again, only the relevant parts are shown:
public static void switcher(int);
Code:
0: iload_0
1: lookupswitch{ //23
0: 196;
1: 206;
2: 216;
3: 226;
4: 236;
5: 246;
6: 256;
7: 267;
8: 278;
9: 289;
100: 300;
110: 311;
120: 322;
130: 333;
140: 344;
150: 355;
160: 366;
170: 377;
180: 388;
190: 399;
200: 410;
210: 421;
220: 432;
default: 443 }
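In the first case, with a narrow range, the compiled bytecode uses a tableswitch. In the second case, the compiled bytecode uses a lookupswitch.
In tableswitch, the integer value on top of the stack is used to index into the table to find the branch/jump target. This jump/branch is then performed immediately. Hence, this is an O(1) operation.
A lookupswitch is more complicated. In this case, the integer value has to be compared against the keys in the table until the correct key is found. After the key is found, the branch/jump target that this key is mapped to is used for the jump. The table used by lookupswitch is sorted, so a binary search algorithm can be used to find the correct key. The performance of a binary search is O(log n), and the entire process is also O(log n), because the jump is still O(1). So the reason performance is lower for sparse ranges is that the correct key must first be looked up, because you cannot index into the table directly.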
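The difference between the two lookups can be mimicked in plain Java. This is only a model of the idea, not how any particular JVM executes these instructions: the class and method names are hypothetical, and the tables are copied from the two bytecode listings above (default targets 175 and 443).

```java
import java.util.Arrays;

final class SwitchDispatchSketch {
    // Targets copied from the tableswitch listing above (keys 0 through 10).
    static final int[] TABLE_TARGETS = {60, 70, 80, 90, 100, 110, 120, 131, 142, 153, 164};

    // Keys and targets copied from the lookupswitch listing above.
    static final int[] LOOKUP_KEYS = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
            100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220};
    static final int[] LOOKUP_TARGETS = {196, 206, 216, 226, 236, 246, 256, 267, 278, 289,
            300, 311, 322, 333, 344, 355, 366, 377, 388, 399, 410, 421, 432};

    // tableswitch-style: subtract the low bound and index straight into the table, O(1).
    static int tableDispatch(int key, int low, int[] targets, int defaultTarget) {
        long index = (long) key - low; // long arithmetic avoids int overflow
        if (index < 0 || index >= targets.length) {
            return defaultTarget;
        }
        return targets[(int) index];
    }

    // lookupswitch-style: binary search over the sorted keys, O(log n), then one array read.
    static int lookupDispatch(int key, int[] sortedKeys, int[] targets, int defaultTarget) {
        int position = Arrays.binarySearch(sortedKeys, key);
        return position >= 0 ? targets[position] : defaultTarget;
    }

    public static void main(String[] args) {
        System.out.println(tableDispatch(7, 0, TABLE_TARGETS, 175));                   // 131
        System.out.println(lookupDispatch(130, LOOKUP_KEYS, LOOKUP_TARGETS, 443));     // 333
        System.out.println(lookupDispatch(50, LOOKUP_KEYS, LOOKUP_TARGETS, 443));      // 443 (default)
    }
}
```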
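If there are sparse values and you only had a tableswitch to use, the table would essentially contain dummy entries that point to the default option. For example, assuming that the last entry in SwitchTest10.java was 21 instead of 10, you get: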
public static void switcher(int);
Code:
0: iload_0
1: tableswitch{ //0 to 21
0: 104;
1: 114;
2: 124;
3: 134;
4: 144;
5: 154;
6: 164;
7: 175;
8: 186;
9: 197;
10: 219;
11: 219;
12: 219;
13: 219;
14: 219;
15: 219;
16: 219;
17: 219;
18: 219;
19: 219;
20: 219;
21: 208;
default: 219 }
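So the compiler basically creates this huge table containing dummy entries for the gaps, pointing to the branch target of the default instruction. Even if there isn't a default, it will contain entries pointing to the instruction after the switch block. I did some basic tests, and I found that if the gap between the last index and the previous one (9) is greater than 35, it uses a lookupswitch instead of a tableswitch.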
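The threshold of 35 observed above is consistent with a simple space/time cost comparison. The sketch below reproduces that comparison as I recall it from OpenJDK's javac code generator (`com.sun.tools.javac.jvm.Gen`); treat the constants as an assumption that may differ between compiler versions, and the class and method names as hypothetical.

```java
final class SwitchOpcodeChoiceSketch {
    // Returns true when the recalled javac cost model would pick tableswitch over lookupswitch.
    // lo and hi are the smallest and largest case labels; nlabels is the number of case labels.
    static boolean prefersTableSwitch(int lo, int hi, int nlabels) {
        long tableSpaceCost = 4 + ((long) hi - lo + 1); // words: header plus one slot per value
        long tableTimeCost = 3;                          // comparisons
        long lookupSpaceCost = 3 + 2 * (long) nlabels;   // words: header plus key/target pairs
        long lookupTimeCost = nlabels;                   // comparisons
        return nlabels > 0
                && tableSpaceCost + 3 * tableTimeCost <= lookupSpaceCost + 3 * lookupTimeCost;
    }

    public static void main(String[] args) {
        // Cases 0..9 plus one final label: a gap of 35 after 9 still yields a table.
        System.out.println(prefersTableSwitch(0, 44, 11));  // true
        System.out.println(prefersTableSwitch(0, 45, 11));  // false
        // SwitchTest22: 23 labels spread over 0..220.
        System.out.println(prefersTableSwitch(0, 220, 23)); // false
    }
}
```

With 11 labels the lookup side costs 3 + 22 + 33 = 58, and the table side costs 4 + (hi + 1) + 9, so the table wins up to hi = 44, which is a gap of exactly 35 after case 9.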
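The behavior of the switch statement is defined in the Java Virtual Machine Specification (§3.10):
Where the cases of the switch are sparse, the table representation of the tableswitch instruction becomes inefficient in terms of space. The lookupswitch instruction may be used instead. The lookupswitch instruction pairs int keys (the values of the case labels) with target offsets in a table. When a lookupswitch instruction is executed, the value of the expression of the switch is compared against the keys in the table. If one of the keys matches the value of the expression, execution continues at the associated target offset. If no key matches, execution continues at the default target. [...]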
bes*_*sss 19
Since the question is already answered (more or less), here is a tip. Use:
private static final double[] mul={1d, 10d...};
static double multiplyByPowerOfTen(final double d, final int exponent) {
if (exponent<0 || exponent>=mul.length) throw new ParseException();//or just leave the IOOBE be
return mul[exponent]*d;
}
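That code uses significantly less IC (instruction cache) and will always be inlined. The array will be in the L1 data cache if the code is hot. A lookup table is almost always a win (especially in microbenchmarks :D).
Edit: if you want the method to be hot-inlined, keep the non-fast paths, like throw new ParseException(), as short as possible, or move them to a separate static method (which makes them as short as possible in the caller). A statement such as throw new ParseException("Unhandled power of ten " + exponent, 0); is a weak idea because it eats a lot of the inlining budget for code that could just as well be interpreted; string concatenation is quite verbose in bytecode. More info and a real case with ArrayList.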
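Putting both suggestions together, a complete version might look like the sketch below. It is one possible reading of the advice, not the answerer's actual code: the table contents (10^0 through 10^18, matching the range of the switch in the question) and the helper name `unhandledExponent` are my own choices.

```java
import java.text.ParseException;

final class PowerOfTenTable {
    // Powers of ten from 10^0 through 10^18; all of these are exactly representable as double.
    private static final double[] POWERS_OF_TEN = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
        1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18
    };

    // Hot path: a bounds check, one array load and one multiplication.
    static double multiplyByPowerOfTen(final double d, final int exponent) throws ParseException {
        if (exponent < 0 || exponent >= POWERS_OF_TEN.length) {
            throw unhandledExponent(exponent);
        }
        return POWERS_OF_TEN[exponent] * d;
    }

    // Cold path: the string concatenation lives here, keeping the hot method's bytecode short.
    private static ParseException unhandledExponent(int exponent) {
        return new ParseException("Unhandled power of ten " + exponent, 0);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(multiplyByPowerOfTen(2.5, 3)); // 2500.0
    }
}
```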