Can using an accessor/getter be faster?

Hbf*_*Hbf 0 java performance jvm-hotspot

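Background: I am going to port a library I wrote in C++ to Java. The code deals with lists of n points in d dimensions and needs to compute scalar products, among other things. I want to keep my code independent of the storage format of the points, so I introduced an interface,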

public interface PointSetAccessor
{
  float coord(int p, int c);
}

which lets me get the c-th coordinate (0 ≤ c < d) of the p-th point (0 ≤ p < n).

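Question: Since the code has to be very fast, I wondered whether this would slow things down compared to a direct access pattern such as points[p][c], where points is an array of n arrays, each holding the d coordinates of one point.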

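Surprisingly, the opposite is the case: the code (see below) runs 20% faster with the "indirect" access through a PointSetAccessor. (I measured this with time java -server -XX:+AggressiveOpts -cp bin Speedo, which gives 14s for the former and 11s for the latter.)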

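Question: Any idea why this is so? It looks as if HotSpot decides to optimize more aggressively, or has more freedom to do so, in the latter version.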

Code (it computes nonsense):

public class Speedo
{
  public interface PointSetAccessor
  {
    float coord(int p, int c);
  }

  public static final class ArrayPointSetAccessor implements PointSetAccessor
  {
    private final float[][] array;

    public ArrayPointSetAccessor(float[][] array)
    {
      this.array = array;
    }

    public float coord(int point, int dim)
    {
      return array[point][dim];
    }
  }

  public static void main(String[] args)
  {
    final int n = 50000;
    final int d = 10;

    // Generate n points in dimension d
    final java.util.Random r = new java.util.Random(314);
    final float[][] a = new float[n][d];
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < d; ++j)
        a[i][j] = r.nextFloat();

    float result = 0.0f;
    if (true) // change to false to run the accessor-based version
    {
      // Direct version
      for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; ++j)
        {
          float prod = 0.0f;
          for (int k = 0; k < d; ++k)
            prod += a[i][k] * a[j][k];
          result += prod;
        }
    }
    else
    {
      // Accessor-based version
      final PointSetAccessor ac = new ArrayPointSetAccessor(a);
      for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; ++j)
        {
          result += product(ac, d, i, j);
        }
    }
    System.out.println("result = " + result);
  }

  private final static float product(PointSetAccessor ac, int d, int i, int j)
  {
    float prod = 0.0f;
    for (int k = 0; k < d; ++k)
      prod += ac.coord(i, k) * ac.coord(j, k);
    return prod;
  }
}

ass*_*ias 5

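Methods this short, if they are hot (called more than 10,000 times with default settings), will be inlined by HotSpot, so you should not notice a performance difference. The way you measure performance ignores many effects, such as warm-up time, and that can lead to misleading results.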
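To illustrate the warm-up point, here is a minimal sketch of a measurement that runs both variants several times inside one JVM, so the later rounds are timed after compilation has had a chance to happen. The class name WarmupBench, the helper names, the reduced n = 2000, and the round count of 10 are assumptions made for this example and are not taken from the question. A dedicated harness such as JMH handles these effects more rigorously.

```java
import java.util.Random;

// Hypothetical harness (not from the question): times both access patterns
// repeatedly in the same JVM so that later rounds run compiled code.
public class WarmupBench {
    public interface PointSetAccessor {
        float coord(int p, int c);
    }

    public static final class ArrayAccessor implements PointSetAccessor {
        private final float[][] array;

        public ArrayAccessor(float[][] array) {
            this.array = array;
        }

        @Override
        public float coord(int p, int c) {
            return array[p][c];
        }
    }

    // Generates n points in dimension d from a fixed seed.
    public static float[][] randomPoints(int n, int d, long seed) {
        Random r = new Random(seed);
        float[][] a = new float[n][d];
        for (float[] row : a)
            for (int j = 0; j < d; ++j)
                row[j] = r.nextFloat();
        return a;
    }

    // Direct version: sums the scalar products of all pairs i < j.
    public static float direct(float[][] a, int d) {
        float result = 0.0f;
        for (int i = 0; i < a.length; ++i)
            for (int j = i + 1; j < a.length; ++j) {
                float prod = 0.0f;
                for (int k = 0; k < d; ++k)
                    prod += a[i][k] * a[j][k];
                result += prod;
            }
        return result;
    }

    // Accessor-based version: same arithmetic in the same order.
    public static float viaAccessor(PointSetAccessor ac, int n, int d) {
        float result = 0.0f;
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {
                float prod = 0.0f;
                for (int k = 0; k < d; ++k)
                    prod += ac.coord(i, k) * ac.coord(j, k);
                result += prod;
            }
        return result;
    }

    public static void main(String[] args) {
        final int n = 2000; // assumption: smaller than the question's 50000 to keep rounds short
        final int d = 10;
        float[][] a = randomPoints(n, d, 314L);
        PointSetAccessor ac = new ArrayAccessor(a);
        float sink = 0.0f; // accumulated and printed so the work cannot be discarded
        for (int round = 0; round < 10; ++round) {
            long t0 = System.nanoTime();
            sink += direct(a, d);
            long t1 = System.nanoTime();
            sink += viaAccessor(ac, n, d);
            long t2 = System.nanoTime();
            System.out.printf("round %d: direct %d us, accessor %d us%n",
                    round, (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
        System.out.println("sink = " + sink);
    }
}
```

The first rounds typically include interpretation and compilation time, so only the later rounds are worth comparing. Even then, a hand-rolled loop like this remains a rough indication rather than a reliable benchmark.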

When you run your code and ask HotSpot to show what gets inlined (-server -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining), you get the output below, which shows that both coord and product get inlined:

 76    1 %           javaapplication27.Speedo::main @ -2 (163 bytes)   made not entrant
 77    6             javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)
 78    7             javaapplication27.Speedo::product (45 bytes)
                        @ 18   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)
                        @ 27   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)
 80    2 %           javaapplication27.Speedo::main @ 101 (163 bytes)
                        @ 118   javaapplication27.Speedo::product (45 bytes)   inline (hot)
                          @ 18   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)
                          @ 27   javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)   inline (hot)