Very fast rounding function (PBC)

Question

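I really need a very fast round() function in C. It is needed for Monte Carlo particle modelling: at every step the coordinates have to be wrapped into the periodic box to compute volume interactions, for example: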

for(int i=0; i < 3; i++)
{
    coor.x[i] = a.XReal.x[i]-b.XReal.x[i];
    coor.x[i] = coor.x[i] - SIZE[i]*round(coor.x[i]/SIZE[i]); //PBC
}

I have come across some asm hacks for this, but I don't understand asm at all :) Something like this:

inline int float2int2(float flt)
{
  int intgr;

  __asm__ __volatile__ ("fld %1; fistp %0;" : "=m" (intgr) : "m" (flt));

  return intgr;
}

With fixed boundaries, without round(), it runs faster. So maybe someone knows a better way?

Answer 1

First, you can gain something just by using the right compiler options. With GCC and a modern Intel CPU, for example, you should try:

-march=nehalem -fno-trapping-math

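The problem with round is that it uses a specific rounding mode, which is slow on most platforms: it rounds halfway cases away from zero regardless of the current rounding mode. nearbyint (or rint) rounds in the current rounding mode instead and should be faster: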

coor.x[i] = coor.x[i] - SIZE[i] * nearbyint(coor.x[i] / SIZE[i]);

You should also consider vectorizing your code.