Fastest way to read and write large files line by line in Java

use*_*771 19 java performance file-io bufferedreader

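I have been looking for the fastest way to read and write large files (0.5 - 1 GB) in Java with limited memory (about 64 MB). Each line in the file represents one record, so I need to get them line by line. The file is a plain text file.

I tried BufferedReader and BufferedWriter, but they do not seem to be the best option. Reading and writing a 0.5 GB file takes about 35 seconds, and that is reading and writing only, with no processing. I think the bottleneck here is the writing, because reading alone takes about 10 seconds.

I also tried reading into byte arrays, but searching for the line breaks in each array that was read takes even more time.

Any suggestions? Thanks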

Pet*_*rey 16

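I suspect your real problem is that you have limited hardware, and what you do in software won't make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are just waiting on your hard drive because the file is not cached, it won't make much difference.

BTW: 500 MB in 10 seconds, or 50 MB/s, is a typical read speed for an HDD.

Try running the following to see at what file size your system can no longer cache the file efficiently.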

public static void main(String... args) throws IOException {
    for (int mb : new int[]{50, 100, 250, 500, 1000, 2000})
        testFileSize(mb);
}

private static void testFileSize(int mb) throws IOException {
    File file = File.createTempFile("test", ".txt");
    file.deleteOnExit();
    char[] chars = new char[1024];
    Arrays.fill(chars, 'A');
    String longLine = new String(chars);
    long start1 = System.nanoTime();
    PrintWriter pw = new PrintWriter(new FileWriter(file));
    for (int i = 0; i < mb * 1024; i++)
        pw.println(longLine);
    pw.close();
    long time1 = System.nanoTime() - start1;
    System.out.printf("Took %.3f seconds to write to a %d MB, file rate: %.1f MB/s%n",
            time1 / 1e9, file.length() >> 20, file.length() * 1000.0 / time1);

    long start2 = System.nanoTime();
    BufferedReader br = new BufferedReader(new FileReader(file));
    for (String line; (line = br.readLine()) != null; ) {
    }
    br.close();
    long time2 = System.nanoTime() - start2;
    System.out.printf("Took %.3f seconds to read to a %d MB file, rate: %.1f MB/s%n",
            time2 / 1e9, file.length() >> 20, file.length() * 1000.0 / time2);
    file.delete();
}

On a Linux machine with lots of memory:

Took 0.395 seconds to write to a 50 MB, file rate: 133.0 MB/s
Took 0.375 seconds to read to a 50 MB file, rate: 140.0 MB/s
Took 0.669 seconds to write to a 100 MB, file rate: 156.9 MB/s
Took 0.569 seconds to read to a 100 MB file, rate: 184.6 MB/s
Took 1.585 seconds to write to a 250 MB, file rate: 165.5 MB/s
Took 1.274 seconds to read to a 250 MB file, rate: 206.0 MB/s
Took 2.513 seconds to write to a 500 MB, file rate: 208.8 MB/s
Took 2.332 seconds to read to a 500 MB file, rate: 225.1 MB/s
Took 5.094 seconds to write to a 1000 MB, file rate: 206.0 MB/s
Took 5.041 seconds to read to a 1000 MB file, rate: 208.2 MB/s
Took 11.509 seconds to write to a 2001 MB, file rate: 182.4 MB/s
Took 9.681 seconds to read to a 2001 MB file, rate: 216.8 MB/s

On a Windows machine with lots of memory:

Took 0.376 seconds to write to a 50 MB, file rate: 139.7 MB/s
Took 0.401 seconds to read to a 50 MB file, rate: 131.1 MB/s
Took 0.517 seconds to write to a 100 MB, file rate: 203.1 MB/s
Took 0.520 seconds to read to a 100 MB file, rate: 201.9 MB/s
Took 1.344 seconds to write to a 250 MB, file rate: 195.4 MB/s
Took 1.387 seconds to read to a 250 MB file, rate: 189.4 MB/s
Took 2.368 seconds to write to a 500 MB, file rate: 221.8 MB/s
Took 2.454 seconds to read to a 500 MB file, rate: 214.1 MB/s
Took 4.985 seconds to write to a 1001 MB, file rate: 210.7 MB/s
Took 5.132 seconds to read to a 1001 MB file, rate: 204.7 MB/s
Took 10.276 seconds to write to a 2003 MB, file rate: 204.5 MB/s
Took 9.964 seconds to read to a 2003 MB file, rate: 210.9 MB/s

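  • The results of this kind of benchmark are more or less useless. First of all, when writing a file, closing the output stream does not ensure that all data has been physically written to disk. It may still be lurking in memory buffers at the OS level or on the hard disk. If you read the exact same file directly after writing it, chances are the data is read from those memory buffers rather than from the disk. According to this benchmark, my laptop's hard disk has read and write speeds of nearly 500 MB/s, which is probably about 10 times its real performance. (4 upvotes)
  • Your comment is great, jambjo. It would be nice to see an answer from you, as you are clearly knowledgeable. (3 upvotes)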
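
To illustrate the first comment's point about close() not guaranteeing a physical write, here is a minimal sketch of a timed write that also asks the OS to flush its buffers before the clock stops. The class and method names (SyncedWriteTimer, timedWriteNanos) are made up for this example and are not part of the answer above. FileDescriptor.sync() only requests that the OS push its buffers to the device; the drive's own cache may still hold data, and this does nothing about the read side being served from the OS cache.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class SyncedWriteTimer {
    // Hypothetical helper: writes `count` copies of `line` and returns the elapsed
    // nanoseconds, including the time spent forcing OS buffers to the device.
    static long timedWriteNanos(File file, String line, int count) throws IOException {
        long start = System.nanoTime();
        try (FileOutputStream fos = new FileOutputStream(file);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(fos, StandardCharsets.UTF_8))) {
            for (int i = 0; i < count; i++) {
                writer.write(line);
                writer.write('\n');
            }
            writer.flush();      // push Java-side buffers down to the OS
            fos.getFD().sync();  // ask the OS to write its buffers to the device
        }
        return System.nanoTime() - start;
    }
}
```

Timing the write this way should give lower, more realistic write rates than timing up to close() alone, although how much lower depends on the OS and the drive.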

jar*_*bjo 9

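The first thing I would try is to increase the buffer size of the BufferedReader and BufferedWriter. The default buffer sizes are not documented, but at least in the Oracle VM they are 8192 characters, which won't bring much of a performance advantage.

If you only need to make a copy of the file (and don't need actual access to the data), I would drop the Reader/Writer approach and work directly with InputStream and OutputStream, using a byte array as the buffer: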
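
As a rough sketch of the larger-buffer suggestion: both BufferedReader and BufferedWriter accept a buffer size (in characters) as a second constructor argument. The class name, the copyLines helper, and the 1 Mi-character buffer size below are assumptions chosen for illustration; the right size has to be found by measuring on the target machine. Two such buffers cost about 4 MB of heap, which fits within the 64 MB limit from the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BufferedLineCopy {
    // Assumed buffer size: 1 Mi chars (about 2 MB of heap) per stream.
    static final int BUFFER_CHARS = 1 << 20;

    // Copies source to target line by line and returns the number of lines copied.
    static long copyLines(Path source, Path target, Charset charset) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(source), charset), BUFFER_CHARS);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(target), charset), BUFFER_CHARS)) {
            for (String line; (line = reader.readLine()) != null; ) {
                // Per-record processing would go here.
                writer.write(line);
                // readLine() strips the terminator, so every line is written back with '\n'.
                writer.write('\n');
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        long n = copyLines(Paths.get(args[0]), Paths.get(args[1]), StandardCharsets.UTF_8);
        System.out.println("Copied " + n + " lines");
    }
}
```

Note that this normalizes line endings to '\n' and adds a terminator to a final line that lacked one, so the copy is not guaranteed to be byte-identical to the input.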

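int bufferSize = 1 << 20; // assumed value (1 MiB); the original snippet left bufferSize undefined
// try-with-resources closes both streams even if an exception is thrown
try (FileInputStream fis = new FileInputStream("d:/test.txt");
     FileOutputStream fos = new FileOutputStream("d:/test2.txt")) {
    byte[] b = new byte[bufferSize];
    int r;
    // read() returns -1 at end of stream
    while ((r = fis.read(b)) >= 0) {
        fos.write(b, 0, r);
    }
}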

Or actually use NIO:

FileChannel in = new RandomAccessFile("d:/test.txt", "r").getChannel();
FileChannel out = new RandomAccessFile("d:/test2.txt", "rw").getChannel();
out.transferFrom(in, 0, Long.MAX_VALUE);
in.close();
out.close();
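
One caveat with the NIO approach: the FileChannel transfer methods are allowed to move fewer bytes than requested in a single call. A slightly more defensive variant, sketched here under that assumption, loops until the whole file has been transferred. The class and method names (ChannelCopy, copy) are invented for this example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelCopy {
    // Copies source to target via FileChannel.transferTo and returns the bytes copied.
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo may move fewer bytes than requested, so keep going.
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // no progress (e.g. the source shrank); avoid spinning forever
                }
                position += moved;
            }
            return position;
        }
    }
}
```

Like the stream version, this only copies bytes and gives no access to individual lines, so it applies to the pure copy case only.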

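When benchmarking the different copy methods, I saw larger differences (in duration) between individual runs of the benchmark than between the different implementations. I/O caching (both at the OS level and in the hard disk cache) plays a big role here, and it is very hard to say what is faster. On my hardware, copying a 1 GB text file line by line using BufferedReader and BufferedWriter takes less than 5 seconds in some runs and more than 30 seconds in others.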