Comparing get/put on direct and non-direct ByteBuffers

use*_*659 10 java memory nio bytebuffer

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update (write to) the direct ByteBuffer with that byte array?

Pet*_*rey 23

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If you are comparing a heap buffer with a direct buffer that does not use the native byte order (most systems are little endian, while the default for a direct ByteBuffer is big endian), the performance is very similar.

If you use natively ordered byte buffers, performance can be noticeably better for multi-byte values. For byte, it makes little difference whatever you do.
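
For reference, here is a minimal sketch (the class name OrderCheck is just for illustration) showing that a freshly allocated buffer defaults to BIG_ENDIAN and has to be switched to the native order explicitly:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OrderCheck {
    public static void main(String[] args) {
        // every new ByteBuffer starts out BIG_ENDIAN, regardless of the platform
        ByteBuffer heap = ByteBuffer.allocate(1024);
        // ask for the native order explicitly to get the faster multi-byte path
        ByteBuffer direct = ByteBuffer.allocateDirect(1024).order(ByteOrder.nativeOrder());

        System.out.println("platform order: " + ByteOrder.nativeOrder());
        System.out.println("heap buffer:    " + heap.order());
        System.out.println("direct buffer:  " + direct.order());
    }
}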

In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM dependent; AFAIK the Android VM treats them as intrinsics in recent versions.

If you dump the generated assembly, you can see that the intrinsics in Unsafe turn into a single machine-code instruction, i.e. they do not have the overhead of a JNI call.
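
As a rough sketch of how you could check this yourself (the class name AsmDump, buffer size and loop counts are arbitrary), run a small hot loop under HotSpot's diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which requires the hsdis disassembler library) and look at the compiled code for the inner loop:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AsmDump {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
        long sum = 0;
        // run the loop enough times for the JIT to compile sum()
        for (int i = 0; i < 100_000; i++)
            sum += sum(bb);
        System.out.println(sum);
    }

    private static long sum(ByteBuffer bb) {
        long total = 0;
        // once compiled, each getInt should appear as a plain load, not a JNI call
        for (int i = 0; i < bb.capacity(); i += 4)
            total += bb.getInt(i);
        return total;
    }
}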

In fact, if you are into micro-tuning, you may find that most of the time of a ByteBuffer getXxxx or setXxxx is spent on the bounds checking rather than the actual memory access. For this reason I still resort to using Unsafe directly when I need maximum performance (note: Oracle discourages this).

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update (write to) the direct ByteBuffer with that byte array?

I can't see how that would be better. ;) It sounds complicated.

Usually the simplest solution is better and faster.


You can test this for yourself with the following code.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public static void main(String... args) {
    ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    for (int i = 0; i < 10; i++)
        runTest(bb1, bb2);
}

private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
    bb1.clear();
    bb2.clear();
    long start = System.nanoTime();
    // copy bb1 into bb2, one int at a time, until bb2 is full
    while (bb2.remaining() > 0)
        bb2.putInt(bb1.getInt());
    long time = System.nanoTime() - start;
    // each loop iteration performs one getInt and one putInt
    int operations = bb1.capacity() / 4 * 2;
    System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
}

prints

Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

I am fairly sure a JNI call takes longer than 1.2 ns.


To demonstrate that it is not the "JNI" call itself, but the guff around it, that causes the delay, you can write the same loop using Unsafe directly.

import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import sun.misc.Unsafe;
import sun.nio.ch.DirectBuffer;

public static void main(String... args) {
    ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
    for (int i = 0; i < 10; i++)
        runTest(bb1, bb2);
}

private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
    Unsafe unsafe = getTheUnsafe();
    long start = System.nanoTime();
    // work on the raw addresses of the direct buffers, bypassing bounds checks
    long addr1 = ((DirectBuffer) bb1).address();
    long addr2 = ((DirectBuffer) bb2).address();
    for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
        unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
    long time = System.nanoTime() - start;
    int operations = bb1.capacity() / 4 * 2;
    System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
}

public static Unsafe getTheUnsafe() {
    try {
        // grab the singleton Unsafe instance via reflection
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        return (Unsafe) theUnsafe.get(null);
    } catch (Exception e) {
        throw new AssertionError(e);
    }
}

prints

Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

So you can see that this native call is much faster than you would expect a JNI call to be. The main cause of the remaining delay is likely the speed of the L2 cache. ;)

All tests were run on a 3.3 GHz i3.

  • In fact, I have deliberately crashed a system using Unsafe, e.g. when I wanted to test what happens if the application crashes ;) (3 upvotes)