Hbf*_*Hbf 0 java performance jvm-hotspot
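Background: I am planning to port a library I wrote in C++ to Java. The code handles lists of n points in dimension d and needs to compute scalar products and the like. I want my code to be independent of the storage format of the points, and introduced an interface for this purpose,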
public interface PointSetAccessor
{
    float coord(int p, int c);
}
which lets me obtain the c-th coordinate (0 ≤ c < d) of the p-th point (0 ≤ p < n).
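To illustrate the storage independence this interface is meant to provide, here is a minimal sketch of a second implementation that keeps all coordinates in one flat array. The names FlatAccessorDemo and FlatPointSetAccessor and the row-major layout are assumptions made for illustration; they are not part of the original code.

```java
public class FlatAccessorDemo
{
    public interface PointSetAccessor
    {
        float coord(int p, int c);
    }

    // Hypothetical implementation (not from the original post): all points
    // live in one float[] in row-major order, point p starting at offset p * d.
    public static final class FlatPointSetAccessor implements PointSetAccessor
    {
        private final float[] data;
        private final int d;

        public FlatPointSetAccessor(float[] data, int d)
        {
            this.data = data;
            this.d = d;
        }

        public float coord(int p, int c)
        {
            return data[p * d + c];
        }
    }

    // Scalar product of points i and j, written only against the interface.
    static float product(PointSetAccessor ac, int d, int i, int j)
    {
        float prod = 0.0f;
        for (int k = 0; k < d; ++k)
            prod += ac.coord(i, k) * ac.coord(j, k);
        return prod;
    }

    public static void main(String[] args)
    {
        // Two points in dimension 2: (1, 2) and (3, 4).
        final float[] flat = { 1f, 2f, 3f, 4f };
        final PointSetAccessor ac = new FlatPointSetAccessor(flat, 2);
        System.out.println("product = " + product(ac, 2, 0, 1));
    }
}
```

Code such as product never sees the layout, so switching between a float[][] and a flat array only means passing a different accessor.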
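Question: Since the code has to be very fast, I wondered whether this would slow things down compared with a direct access pattern such as points[p][c], where points is an array of n arrays, each holding the d coordinates of one point.
Surprisingly, the opposite is the case: the code (see below) runs about 20% faster with the "indirect" access through a PointSetAccessor. (I measured this with time java -server -XX:+AggressiveOpts -cp bin Speedo, which gives 14 s for the direct version and 11 s for the accessor version.)
Question: Any idea why this is the case? Does HotSpot decide to optimize more aggressively, or does it have more freedom to do so, in the accessor version?
Code (the computation itself is meaningless):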
public class Speedo
{
    public interface PointSetAccessor
    {
        float coord(int p, int c);
    }

    public static final class ArrayPointSetAccessor implements PointSetAccessor
    {
        private final float[][] array;

        public ArrayPointSetAccessor(float[][] array)
        {
            this.array = array;
        }

        public float coord(int point, int dim)
        {
            return array[point][dim];
        }
    }

    public static void main(String[] args)
    {
        final int n = 50000;
        final int d = 10;

        // Generate n points in dimension d
        final java.util.Random r = new java.util.Random(314);
        final float[][] a = new float[n][d];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < d; ++j)
                a[i][j] = r.nextFloat();

        float result = 0.0f;
        if (true)
        {
            // Direct version
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; ++j)
                {
                    float prod = 0.0f;
                    for (int k = 0; k < d; ++k)
                        prod += a[i][k] * a[j][k];
                    result += prod;
                }
        }
        else
        {
            // Accessor-based version
            final PointSetAccessor ac = new ArrayPointSetAccessor(a);
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; ++j)
                {
                    result += product(ac, d, i, j);
                }
        }
        System.out.println("result = " + result);
    }

    private final static float product(PointSetAccessor ac, int d, int i, int j)
    {
        float prod = 0.0f;
        for (int k = 0; k < d; ++k)
            prod += ac.coord(i, k) * ac.coord(j, k);
        return prod;
    }
}
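Such a short method, if it is hot (called more than 10,000 times with the default settings), will be inlined by HotSpot, so you should not see a performance difference. The way you measure performance ignores many effects, such as warm-up time, and that can lead to wrong results.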
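As a rough illustration of taking warm-up into account, the sketch below runs both variants a few times before timing them inside one JVM. The class name WarmupTimingSketch, the reduced n = 2000, and the five warm-up rounds are arbitrary assumptions, not values from the question; a dedicated harness such as JMH would give more trustworthy numbers than hand-rolled timing like this.

```java
import java.util.Random;

public class WarmupTimingSketch
{
    interface PointSetAccessor
    {
        float coord(int p, int c);
    }

    static final class ArrayPointSetAccessor implements PointSetAccessor
    {
        private final float[][] array;
        ArrayPointSetAccessor(float[][] array) { this.array = array; }
        public float coord(int point, int dim) { return array[point][dim]; }
    }

    // Direct version: sums the scalar products of all pairs i < j.
    static float direct(float[][] a, int d)
    {
        float result = 0.0f;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; ++j)
            {
                float prod = 0.0f;
                for (int k = 0; k < d; ++k)
                    prod += a[i][k] * a[j][k];
                result += prod;
            }
        return result;
    }

    // Accessor-based version of the same computation.
    static float viaAccessor(PointSetAccessor ac, int n, int d)
    {
        float result = 0.0f;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; ++j)
            {
                float prod = 0.0f;
                for (int k = 0; k < d; ++k)
                    prod += ac.coord(i, k) * ac.coord(j, k);
                result += prod;
            }
        return result;
    }

    public static void main(String[] args)
    {
        final int n = 2000; // assumption: smaller than the question's 50000
        final int d = 10;
        final Random r = new Random(314);
        final float[][] a = new float[n][d];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < d; ++j)
                a[i][j] = r.nextFloat();
        final PointSetAccessor ac = new ArrayPointSetAccessor(a);

        // Warm-up rounds: give the JIT a chance to compile both variants.
        float sink = 0.0f;
        for (int w = 0; w < 5; ++w)
        {
            sink += direct(a, d);
            sink += viaAccessor(ac, n, d);
        }

        // Timed rounds, measured only after warm-up.
        final long t0 = System.nanoTime();
        final float r1 = direct(a, d);
        final long t1 = System.nanoTime();
        final float r2 = viaAccessor(ac, n, d);
        final long t2 = System.nanoTime();

        System.out.println("direct:   " + (t1 - t0) / 1000000L + " ms, result = " + r1);
        System.out.println("accessor: " + (t2 - t1) / 1000000L + " ms, result = " + r2);
        // Print the warm-up sum so those calls cannot be optimized away.
        System.out.println("warm-up sink = " + sink);
    }
}
```

Both methods perform the same floating-point operations in the same order, so their results should match; only the timings are of interest.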
When you run the code and ask HotSpot to show what it inlines (-server -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining), you get the output below, which shows that both coord and product get inlined:
76 1 % javaapplication27.Speedo::main @ -2 (163 bytes) made not entrant
77 6 javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes)
78 7 javaapplication27.Speedo::product (45 bytes)
@ 18 javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes) inline (hot)
@ 27 javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes) inline (hot)
80 2 % javaapplication27.Speedo::main @ 101 (163 bytes)
@ 118 javaapplication27.Speedo::product (45 bytes) inline (hot)
@ 18 javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes) inline (hot)
@ 27 javaapplication27.Speedo$ArrayPointSetAccessor::coord (9 bytes) inline (hot)