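Can someone help me understand the performance benefit of using an asm block to multiply unsigned long long values? It relates to competitive programming optimization. I guess it makes the multiplication faster, but I can't actually understand the code.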
const int md = 998244353;

inline int mul(int a, int b)
{
#if !defined(_WIN32) || defined(_WIN64)
    return (int) ((long long) a * b % md);
#endif
    unsigned long long x = (long long) a * b;
    unsigned xh = (unsigned) (x >> 32), xl = (unsigned) x, d, m;
    asm(
        "divl %4; \n\t"
        : "=a" (d), "=d" (m)
        : "d" (xh), "a" (xl), "r" (md)
    );
    return m;
}
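For reference, here is my current reading of the asm branch, written as a sketch without inline assembly so it compiles anywhere. The name `mul_split` is hypothetical and not part of the original code. It assumes both inputs are already reduced to the range [0, md). As far as I can tell, `divl` divides the 64-bit value held in EDX:EAX (here `xh:xl`) by a 32-bit operand, leaving the quotient in EAX (`d`) and the remainder in EDX (`m`), so the function should return the same value as `(long long) a * b % md`.

```cpp
#include <cassert>
#include <cstdint>

const int md = 998244353;

// Portable reference: the expression used on non-Windows and 64-bit Windows builds.
inline int mul_portable(int a, int b) {
    return (int) ((long long) a * b % md);
}

// Hypothetical step-by-step mirror of the asm branch (assumes 0 <= a, b < md).
inline int mul_split(int a, int b) {
    assert(a >= 0 && a < md && b >= 0 && b < md);

    uint64_t x  = (uint64_t) ((long long) a * b);  // full 64-bit product
    uint32_t xh = (uint32_t) (x >> 32);            // high half, goes into EDX
    uint32_t xl = (uint32_t) x;                    // low half, goes into EAX

    // divl requires the quotient to fit in 32 bits, which holds when xh < md.
    assert(xh < (uint32_t) md);

    // What divl computes: (EDX:EAX) / md -> EAX, (EDX:EAX) % md -> EDX.
    uint64_t dividend = ((uint64_t) xh << 32) | xl;
    uint32_t d = (uint32_t) (dividend / (uint32_t) md);  // quotient, unused by mul
    uint32_t m = (uint32_t) (dividend % (uint32_t) md);  // remainder, the result
    (void) d;
    return (int) m;
}
```

If this reading is right, the asm only matters on 32-bit Windows builds. There, a 64-bit `%` is typically compiled into a call to a runtime helper routine rather than a single instruction, so one `divl` is likely cheaper. On 64-bit targets the plain expression is used instead, presumably because the compiler already handles division by a constant well.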