ami*_*mit 6 java memory performance caching garbage-collection
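Suppose I have a large array of relatively small objects that I need to iterate over frequently.
I want to optimize the iteration by improving cache performance, so I would like to allocate the objects [and not the references] contiguously in memory. That way I get fewer cache misses, and overall performance may be better.
In C++, I can just allocate an array of objects and they are laid out as I want. In Java, when I allocate an array I only allocate the references, and the objects themselves are allocated one at a time.
I am aware that if I allocate the objects "at once" [one after the other], the JVM will most likely place them as contiguously as it can, but that might not be enough if the memory is fragmented.
My question: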
Pet*_*rey 12
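New objects are created in the Eden space. The Eden space is never fragmented; it is always empty after a GC.
The problem you have is that when a GC is performed, objects can end up arranged randomly in memory, or even in the reverse of their original order.
A workaround is to store the fields as a series of arrays. I call this a column-based table instead of a row-based table.
Instead of writing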
class PointCount {
    double x, y;
    int count;
}
// n references, each pointing to its own separately allocated small object
PointCount[] pc = new PointCount[n];
use a column-based data type.
class PointCounts {
    double[] xs, ys;
    int[] counts;
}
or
class PointCounts {
    // resizable primitive lists from the Trove collections library
    TDoubleArrayList xs, ys;
    TIntArrayList counts;
}
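The arrays themselves may be in up to three different places, but the data within each array is always contiguous. This can even be slightly more efficient if you perform operations on a subset of the fields.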
public int totalCount() {
    int sum = 0;
    // counts are contiguous, with nothing between the values.
    for (int i : counts) sum += i;
    return sum;
}
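To make the column-based idea concrete, here is a minimal sketch of a growable store with one primitive array per field. The class name `PointCountColumns`, its methods, and the initial capacity of 16 are hypothetical and not part of the answer above; they only illustrate how rows can be appended while each column stays contiguous.

```java
import java.util.Arrays;

/** Hypothetical column-based store: one primitive array per field. */
final class PointCountColumns {
    private double[] xs = new double[16];
    private double[] ys = new double[16];
    private int[] counts = new int[16];
    private int size;

    /** Appends one logical row, growing all three columns together. */
    void add(double x, double y, int count) {
        if (size == xs.length) {
            int newCapacity = size * 2;
            xs = Arrays.copyOf(xs, newCapacity);
            ys = Arrays.copyOf(ys, newCapacity);
            counts = Arrays.copyOf(counts, newCapacity);
        }
        xs[size] = x;
        ys[size] = y;
        counts[size] = count;
        size++;
    }

    int size() { return size; }
    double x(int i) { return xs[i]; }
    double y(int i) { return ys[i]; }
    int count(int i) { return counts[i]; }

    /** Reads only the counts column, so the scan walks a single int[]. */
    long totalCount() {
        long sum = 0;
        for (int i = 0; i < size; i++) sum += counts[i];
        return sum;
    }
}
```

Because each column is a single array, a GC may move an array as a whole but cannot reorder the values inside it.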
The solution I use to avoid GC overhead for large amounts of data is an interface that accesses a direct or memory-mapped ByteBuffer.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
public class MyCounters {
    public static void main(String... args) {
        Runtime rt = Runtime.getRuntime();
        long used1 = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        int length = 100 * 1000 * 1000;
        PointCount pc = new PointCountImpl(length);
        // write every record through the single flyweight instance
        for (int i = 0; i < length; i++) {
            pc.index(i);
            pc.setX(i);
            pc.setY(-i);
            pc.setCount(1);
        }
        // read every record back and verify it
        for (int i = 0; i < length; i++) {
            pc.index(i);
            if (pc.getX() != i) throw new AssertionError();
            if (pc.getY() != -i) throw new AssertionError();
            if (pc.getCount() != 1) throw new AssertionError();
        }
        long time = System.nanoTime() - start;
        long used2 = rt.totalMemory() - rt.freeMemory();
        System.out.printf("Creating an array of %,d used %,d bytes of heap and took %.1f seconds to set and get%n",
                length, (used2 - used1), time / 1e9);
    }
}
interface PointCount {
    // set the index of the element referred to.
    public void index(int index);
    public double getX();
    public void setX(double x);
    public double getY();
    public void setY(double y);
    public int getCount();
    public void setCount(int count);
    public void incrementCount();
}
class PointCountImpl implements PointCount {
    // record layout: x (8 bytes), y (8 bytes), count (4 bytes)
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = X_OFFSET + 8;
    static final int COUNT_OFFSET = Y_OFFSET + 8;
    static final int LENGTH = COUNT_OFFSET + 4;
    final ByteBuffer buffer;
    // byte position of the record currently selected by index()
    int start = 0;
    PointCountImpl(int count) {
        this(ByteBuffer.allocateDirect(count * LENGTH).order(ByteOrder.nativeOrder()));
    }
    PointCountImpl(ByteBuffer buffer) {
        this.buffer = buffer;
    }
    @Override
    public void index(int index) {
        start = index * LENGTH;
    }
    @Override
    public double getX() {
        return buffer.getDouble(start + X_OFFSET);
    }
    @Override
    public void setX(double x) {
        buffer.putDouble(start + X_OFFSET, x);
    }
    @Override
    public double getY() {
        return buffer.getDouble(start + Y_OFFSET);
    }
    @Override
    public void setY(double y) {
        buffer.putDouble(start + Y_OFFSET, y);
    }
    @Override
    public int getCount() {
        return buffer.getInt(start + COUNT_OFFSET);
    }
    @Override
    public void setCount(int count) {
        buffer.putInt(start + COUNT_OFFSET, count);
    }
    @Override
    public void incrementCount() {
        setCount(getCount() + 1);
    }
}
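Running with the -XX:-UseTLAB option (to get accurate memory allocation sizes) prints
Creating an array of 100,000,000 used 12,512 bytes of heap and took 1.8 seconds to set and get
As the data is off heap, it has almost no GC impact.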
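The answer mentions memory-mapped buffers but only demonstrates a direct one. The following is a rough sketch of the memory-mapped variant, assuming the same 20-byte record layout (x, y, count). The class `MappedPointCounts`, its `map` and `demo` methods, and the temporary file name are made up for illustration. Since `PointCountImpl` has a constructor taking any `ByteBuffer`, a buffer mapped this way could presumably be passed to it as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: stores fixed-size records in a memory-mapped file. */
final class MappedPointCounts {
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = 8;
    static final int COUNT_OFFSET = 16;
    static final int RECORD_LENGTH = 20;

    /** Maps the file read/write, sized for the given number of records. */
    static MappedByteBuffer map(Path file, int records) throws IOException {
        long bytes = (long) records * RECORD_LENGTH;
        if (bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many records for one mapping");
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // the mapping stays valid after the channel is closed
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            buf.order(ByteOrder.nativeOrder());
            return buf;
        }
    }

    /** Writes the records to a temporary file, then sums the count field. */
    static long demo(int records) {
        try {
            Path file = Files.createTempFile("pointcounts", ".dat");
            file.toFile().deleteOnExit();
            MappedByteBuffer buf = map(file, records);
            for (int i = 0; i < records; i++) {
                int start = i * RECORD_LENGTH;
                buf.putDouble(start + X_OFFSET, i);
                buf.putDouble(start + Y_OFFSET, -i);
                buf.putInt(start + COUNT_OFFSET, 1);
            }
            long total = 0;
            for (int i = 0; i < records; i++) {
                total += buf.getInt(i * RECORD_LENGTH + COUNT_OFFSET);
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A single mapping is limited to `Integer.MAX_VALUE` bytes, so larger data sets would need several mappings.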