Currently, based on research and various attempts, I'm fairly sure that the only solution to this problem is to use assembly. I'm posting this question to document an existing problem, perhaps draw the attention of compiler developers, and help people who search for similar problems.
If anything changes in the future, I will accept it as an answer.
This is a closely related question for MSVC.
On x86_64 machines, it is faster to use div/idiv with a 32-bit …
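Since the post concludes that assembly is the only reliable route, here is a minimal sketch of what that looks like as GNU extended inline asm. It assumes GCC or Clang targeting x86-64 with the default AT&T assembler syntax (MSVC does not support this asm form). The name `udiv64_32` is hypothetical. The caller must guarantee that the quotient fits in 32 bits, i.e. `(n >> 32) < d`; otherwise the hardware `div` raises #DE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide a 64-bit value by a 32-bit value using a single
 * 32-bit DIV (EDX:EAX / r32). Precondition: d != 0 and (n >> 32) < d, so the
 * quotient fits in 32 bits; otherwise the hardware DIV raises #DE. */
static inline uint32_t udiv64_32(uint64_t n, uint32_t d)
{
    assert(d != 0 && (n >> 32) < d);
#if defined(__GNUC__) && defined(__x86_64__)
    uint32_t q, r;
    /* AT&T syntax: EAX = low half of n, EDX = high half, %2 = divisor.
     * DIV leaves the quotient in EAX and the remainder in EDX. */
    __asm__("divl %2"
            : "=a"(q), "=d"(r)
            : "rm"(d), "0"((uint32_t)n), "1"((uint32_t)(n >> 32)));
    (void)r; /* remainder is not needed here */
    return q;
#else
    /* Portable fallback for other compilers/targets; a compiler will
     * generally emit a full 64-bit division for this. */
    return (uint32_t)(n / d);
#endif
}
```

For example, `udiv64_32(0x100000000ULL, 3)` yields 1431655765. Inputs with `(n >> 32) >= d` are rejected by the assert, and would fault in the asm path if assertions were disabled.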
A scaled 64-bit/32-bit division performed by the hardware 128-bit/64-bit division instruction, such as:
; Entry arguments: Dividend in EAX, Divisor in EBX
shl rax, 32 ;Scale up the Dividend by 2^32 (this also shifts out any garbage in the upper half of RAX)
xor edx, edx ;RDX = 0, so RDX:RAX = Dividend * 2^32
mov ebx, ebx ;Zero-extend EBX into RBX, clearing any garbage in the upper half of RBX
div rbx ; RAX = RDX:RAX / RBX
...is, in some special cases, faster than a scaled 64-bit/32-bit division performed by the hardware 64-bit/32-bit division instruction, such as:
; Entry arguments: Dividend in EAX, Divisor in EBX (requires Dividend < Divisor, otherwise DIV raises #DE)
mov edx,eax ;Scale up the Dividend by 2^32
xor eax,eax
div ebx ; EAX = EDX:EAX / EBX
“Some special cases” refers to unusual dividends and divisors. I only want to compare …
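To make the two listings easier to compare, here is a portable C model of what they compute, assuming both are meant to produce floor(Dividend * 2^32 / Divisor). The function names are hypothetical. The model also spells out the precondition that separates the two forms. The 128-bit/64-bit form cannot fault for a nonzero divisor, because RDX is zero and the quotient stays below 2^64. The 32-bit form faults whenever the quotient needs more than 32 bits, which for EDX:EAX = Dividend:0 means whenever Dividend >= Divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 128-bit/64-bit listing: (dividend * 2^32) / divisor with a
 * 64-bit quotient. Never overflows for divisor != 0. */
static uint64_t scaled_div_wide(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return ((uint64_t)dividend << 32) / divisor;
}

/* The 32-bit DIV only succeeds when the quotient fits in 32 bits.
 * With EDX:EAX = dividend:0 that holds exactly when dividend < divisor. */
static int scaled_div_fits32(uint32_t dividend, uint32_t divisor)
{
    return divisor != 0 && dividend < divisor;
}

/* Model of the 64-bit/32-bit listing, valid only where it would not fault. */
static uint32_t scaled_div_narrow(uint32_t dividend, uint32_t divisor)
{
    assert(scaled_div_fits32(dividend, divisor));
    return (uint32_t)scaled_div_wide(dividend, divisor);
}
```

Note that GCC or Clang compiling this for x86-64 will typically still emit a 64-bit `div` inside `scaled_div_narrow`, since they do not use the asserted range to select the 32-bit instruction; that is the kind of code generation the post is asking about.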