Oll*_*ass 13 java performance file-io
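As the title says, I'm looking for the fastest way to write an int array to a file. The arrays vary in size and can realistically contain anywhere from 2,500 to 25,000,000 ints.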
Here's the code I'm currently using:
try (DataOutputStream writer = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(filename)))) {
    for (int d : data)
        writer.writeInt(d);
}
Given that DataOutputStream has a method for writing byte arrays, I tried converting the int array to a byte array, like this:
private static byte[] integersToBytes(int[] values) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(baos);
    for (int i = 0; i < values.length; ++i) {
        dos.writeInt(values[i]);
    }
    return baos.toByteArray();
}
and like this:
private static byte[] integersToBytes2(int[] src) {
    int srcLength = src.length;
    byte[] dst = new byte[srcLength << 2];
    for (int i = 0; i < srcLength; i++) {
        int x = src[i];
        int j = i << 2;
        // Least significant byte first (little-endian); writeInt is big-endian
        dst[j++] = (byte) ((x >>> 0) & 0xff);
        dst[j++] = (byte) ((x >>> 8) & 0xff);
        dst[j++] = (byte) ((x >>> 16) & 0xff);
        dst[j++] = (byte) ((x >>> 24) & 0xff);
    }
    return dst;
}
Both seem to improve speed by about 5%, but I haven't tested them rigorously enough to confirm this.
Are there any techniques for speeding up this file write, or any relevant guides on best practices for Java IO write performance?
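For the byte-array route tried in the question, java.nio.ByteBuffer can do the int-to-byte conversion in bulk instead of a hand-written loop. This is an illustrative sketch, not code from the question or the answers; IntArrayBytes and its method names are hypothetical. It also makes byte order explicit: DataOutputStream.writeInt writes big-endian, while integersToBytes2 emits the least significant byte first.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class illustrating bulk int[] -> byte[] conversion.
class IntArrayBytes {
    // Big-endian, the same layout DataOutputStream.writeInt produces.
    static byte[] toBytesBigEndian(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES); // heap buffers default to big-endian
        buf.asIntBuffer().put(values); // bulk copy through an IntBuffer view
        return buf.array();            // the backing array now holds the encoded ints
    }

    // Little-endian, the same layout integersToBytes2 produces.
    static byte[] toBytesLittleEndian(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.asIntBuffer().put(values); // the view takes the buffer's byte order at creation time
        return buf.array();
    }
}
```

toBytesBigEndian should produce the same bytes as integersToBytes, and toBytesLittleEndian the same bytes as integersToBytes2; whether it is measurably faster on a given JVM would need its own benchmark.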
cle*_*tus 25
I looked at three options: DataOutputStream; ObjectOutputStream (for Serializable objects, which int[] is); and FileChannel. The results were:
DataOutputStream wrote 1,000,000 ints in 3,159.716 ms
ObjectOutputStream wrote 1,000,000 ints in 295.602 ms
FileChannel wrote 1,000,000 ints in 110.094 ms
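So the NIO version is the fastest. It also has the advantage of allowing edits: you can easily change a single int, whereas with ObjectOutputStream you would have to read the whole array, modify it and write it back to the file.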
The code is as follows:
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;

public class IntWriteBenchmark { // class name added so the code compiles on its own
    private static final int NUM_INTS = 1000000;

    interface IntWriter {
        void write(int[] ints);
    }

    public static void main(String[] args) {
        int[] ints = new int[NUM_INTS];
        Random r = new Random();
        for (int i = 0; i < NUM_INTS; i++) {
            ints[i] = r.nextInt();
        }
        time("DataOutputStream", new IntWriter() {
            public void write(int[] ints) {
                storeDO(ints);
            }
        }, ints);
        time("ObjectOutputStream", new IntWriter() {
            public void write(int[] ints) {
                storeOO(ints);
            }
        }, ints);
        time("FileChannel", new IntWriter() {
            public void write(int[] ints) {
                storeFC(ints);
            }
        }, ints);
    }

    private static void time(String name, IntWriter writer, int[] ints) {
        long start = System.nanoTime();
        writer.write(ints);
        long end = System.nanoTime();
        double ms = (end - start) / 1000000d;
        System.out.printf("%s wrote %,d ints in %,.3f ms%n", name, ints.length, ms);
    }

    private static void storeOO(int[] ints) {
        ObjectOutputStream out = null;
        try {
            out = new ObjectOutputStream(new FileOutputStream("object.out"));
            out.writeObject(ints);
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            safeClose(out);
        }
    }

    private static void storeDO(int[] ints) {
        DataOutputStream out = null;
        try {
            // Unbuffered: every int turns into small writes straight to the file
            out = new DataOutputStream(new FileOutputStream("data.out"));
            for (int anInt : ints) {
                out.writeInt(anInt); // write(int) would store only the lowest byte
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            safeClose(out);
        }
    }

    private static void storeFC(int[] ints) {
        // A FileOutputStream channel is write-only and cannot be mapped READ_WRITE,
        // so the file is opened through RandomAccessFile in "rw" mode instead.
        RandomAccessFile out = null;
        try {
            out = new RandomAccessFile("fc.out", "rw");
            FileChannel file = out.getChannel();
            ByteBuffer buf = file.map(FileChannel.MapMode.READ_WRITE, 0, 4L * ints.length);
            for (int i : ints) {
                buf.putInt(i);
            }
            file.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            safeClose(out);
        }
    }

    private static void safeClose(Closeable out) {
        try {
            if (out != null) {
                out.close();
            }
        } catch (IOException e) {
            // do nothing
        }
    }
}
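The editing advantage comes from the fixed layout: every int takes exactly 4 bytes, so the int at index i starts at byte offset 4 * i and can be overwritten with a positional FileChannel write. A minimal sketch, assuming the file holds big-endian ints as written by DataOutputStream.writeInt or a default-order ByteBuffer; IntFileEditor, setInt and getInt are hypothetical names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: random access to one int in a file of 4-byte big-endian ints.
class IntFileEditor {
    // Overwrites the int at the given index in place; the rest of the file is untouched.
    static void setInt(Path file, long index, int value) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES).putInt(0, value); // absolute put keeps position at 0
            long pos = index * Integer.BYTES;
            while (buf.hasRemaining()) {
                fc.write(buf, pos + buf.position()); // positional write, loops on a partial write
            }
        }
    }

    // Reads back the int at the given index.
    static int getInt(Path file, long index) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
            long pos = index * Integer.BYTES;
            while (buf.hasRemaining()) {
                if (fc.read(buf, pos + buf.position()) < 0) throw new EOFException();
            }
            return buf.getInt(0);
        }
    }
}
```

With ObjectOutputStream the array is wrapped in serialization headers, so there is no equally simple offset formula for a single element.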
I would use FileChannel from the NIO package together with ByteBuffer. On my computer, this approach seems to improve write performance by a factor of 2 to 4:
Program output:
normal time: 2555
faster time: 765
Here's the program:
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;

public class Test {
    public static void main(String[] args) throws IOException {
        // create a test buffer
        ByteBuffer buffer = createBuffer();
        long start = System.currentTimeMillis();
        {
            // do the first test (the normal way of writing files)
            normalToFile(new File("first"), buffer.asIntBuffer());
        }
        long middle = System.currentTimeMillis();
        {
            // use the faster nio stuff
            fasterToFile(new File("second"), buffer);
        }
        long done = System.currentTimeMillis();
        // print the result
        System.out.println("normal time: " + (middle - start));
        System.out.println("faster time: " + (done - middle));
    }

    private static void fasterToFile(File file, ByteBuffer buffer)
            throws IOException {
        FileChannel fc = null;
        try {
            fc = new FileOutputStream(file).getChannel();
            // loop in case a single write() does not drain the whole buffer
            while (buffer.hasRemaining())
                fc.write(buffer);
        } finally {
            if (fc != null)
                fc.close(); // also closes the underlying FileOutputStream
            buffer.rewind();
        }
    }

    private static void normalToFile(File file, IntBuffer buffer)
            throws IOException {
        DataOutputStream writer = null;
        try {
            writer =
                new DataOutputStream(new BufferedOutputStream(
                    new FileOutputStream(file)));
            while (buffer.hasRemaining())
                writer.writeInt(buffer.get());
        } finally {
            if (writer != null)
                writer.close();
            buffer.rewind();
        }
    }

    private static ByteBuffer createBuffer() {
        ByteBuffer buffer = ByteBuffer.allocate(4 * 25000000);
        Random r = new Random(1);
        while (buffer.hasRemaining())
            buffer.putInt(r.nextInt());
        buffer.rewind();
        return buffer;
    }
}
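The program above fills one 100 MB heap buffer for 25,000,000 ints before writing. If memory use matters, a bounded variant (not part of the answer) reuses one small direct buffer and writes the array chunk by chunk; the file gets the same big-endian layout. ChunkedIntWriter and the 64K-int chunk size are hypothetical choices, and I have no timings for this variant.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: writes an int[] through a small, reused direct buffer.
class ChunkedIntWriter {
    // Assumed chunk size: 65,536 ints = 256 KiB per write; tune for the target machine.
    private static final int CHUNK_INTS = 1 << 16;

    static void write(Path file, int[] ints) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(CHUNK_INTS * Integer.BYTES); // big-endian by default
        IntBuffer view = buf.asIntBuffer(); // shares content with buf, has its own position
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            int off = 0;
            while (off < ints.length) {
                int n = Math.min(CHUNK_INTS, ints.length - off);
                view.clear();
                view.put(ints, off, n);                 // bulk copy of this chunk
                buf.clear().limit(n * Integer.BYTES);   // the view does not advance buf's position
                while (buf.hasRemaining()) {
                    fc.write(buf);                      // loop in case of a partial write
                }
                off += n;
            }
        }
    }
}
```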
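Benchmarks should be rerun every once in a while, shouldn't they? :) After fixing a few bugs and adding my own write variants, here are the results I got running the benchmark on an ASUS ZenBook UX305 under Windows 10 (times in seconds):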
Method                      Run 0   Run 1   Run 2
Buffered DataOutputStream    8.14    8.46    8.30
FileChannel alt2             1.55    1.18    1.12
ObjectOutputStream           9.60   10.41   11.68
FileChannel                  1.49    1.20    1.21
FileChannel alt              5.49    4.58    4.66
Here are the results on the same computer, but under Arch Linux and with the write methods run in a different order:
Method                      Run 0   Run 1   Run 2
Buffered DataOutputStream   31.16    6.29    7.26
FileChannel                  1.07    0.83    0.82
FileChannel alt2             1.25    1.71    1.42
ObjectOutputStream           3.47    5.39    4.40
FileChannel alt              2.70    3.27    3.46
Each test writes an 800 MB file. The unbuffered DataOutputStream took far too long, so I left it out of the benchmark.
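As you can see, writing through a file channel still beats every other method, but it matters a great deal whether the byte buffer is memory-mapped. Without memory mapping, the file-channel write takes 3-5 seconds: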
var bb = ByteBuffer.allocate(4 * ints.length);
for (int i : ints)
    bb.putInt(i);
bb.flip();
try (var fc = new FileOutputStream("fcalt.out").getChannel()) {
    while (bb.hasRemaining()) // guard against a partial write
        fc.write(bb);
}
With memory mapping, the time drops to between 0.8 and 1.5 seconds:
try (var fc = new RandomAccessFile("fcalt2.out", "rw").getChannel()) {
    var bb = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4L * ints.length);
    bb.asIntBuffer().put(ints);
}
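Note, however, that the results depend on the order in which the methods run, especially on Linux. The memory-mapped method does not seem to write the data out completely; instead it appears to hand the job off to the OS and return before the write has finished. Whether that behavior is desirable depends on the situation.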
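If the data must actually reach the storage device before the method returns (for example, to time the full write), MappedByteBuffer.force() can be called after filling the mapping. A sketch under that assumption; MappedIntWriter is a hypothetical name. Including force() in the timed section would likely make the mapped and unmapped variants more comparable, though the numbers above were not produced this way.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: memory-mapped write that waits for the write-back.
class MappedIntWriter {
    static void write(String fileName, int[] ints) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "rw");
             FileChannel fc = raf.getChannel()) {
            raf.setLength(4L * ints.length); // size the file exactly, truncating any older, longer file
            MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4L * ints.length);
            mbb.asIntBuffer().put(ints);     // big-endian, like DataOutputStream.writeInt
            mbb.force();                     // write the mapped changes to the storage device before returning
        }
    }
}
```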
Memory mapping can also cause out-of-memory problems, so it is not always the right tool. See "Prevent OutOfMemory when using java.nio.MappedByteBuffer".
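One way to keep any single mapping small is to map the file in fixed-size windows and fill them one after another. This is a hedged sketch rather than a known fix: WindowedMappedIntWriter and the window size are hypothetical, and because the public API offers no explicit unmap, each window is only released when its buffer is garbage-collected, so whether this avoids a particular OutOfMemoryError depends on its cause.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: writes ints through a sequence of bounded mappings.
class WindowedMappedIntWriter {
    // windowInts is an assumed tuning knob, e.g. 1 << 24 (16M ints = 64 MiB per mapping).
    static void write(String fileName, int[] ints, int windowInts) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "rw");
             FileChannel fc = raf.getChannel()) {
            raf.setLength(4L * ints.length); // size (and truncate) the file up front
            int off = 0;
            while (off < ints.length) {
                int n = Math.min(windowInts, ints.length - off);
                MappedByteBuffer window =
                    fc.map(FileChannel.MapMode.READ_WRITE, 4L * off, 4L * n);
                window.asIntBuffer().put(ints, off, n); // big-endian, like writeInt
                off += n; // the old window becomes unreachable; unmapping happens on GC
            }
        }
    }
}
```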
Here is my version of the benchmark code: https://gist.github.com/bjourne/53b7eabc6edea27ffb042e7816b7830b
Views: 21,470