Fastest way to write a huge amount of data to a text file in Java

Rak*_*yal 64 java file resultset

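I have to write a huge amount of data to a text [csv] file. I used BufferedWriter to write the data, and it took around 40 seconds to write 174 MB. Is this the fastest speed Java can offer?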

bufferedWriter = new BufferedWriter ( new FileWriter ( "fileName.csv" ) );

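Note: these 40 seconds also include the time spent iterating over the result set and fetching the records from it. :) The 174 MB corresponds to 400000 rows in the result set.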

Dav*_*les 96

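You might try removing the BufferedWriter and using the FileWriter directly. On a modern system there is a good chance you are only writing to the drive's cache memory anyway.

It takes me in the range of 4-5 seconds to write 175 MB (4 million strings). This is on a dual-core 2.4 GHz Dell running Windows XP with an 80 GB, 7200-RPM Hitachi disk.

Can you isolate how much of the time is record retrieval and how much is file writing?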

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class FileWritingPerfTest {


private static final int ITERATIONS = 5;
private static final double MEG = (Math.pow(1024, 2));
private static final int RECORD_COUNT = 4000000;
private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
private static final int RECSIZE = RECORD.getBytes().length;

public static void main(String[] args) throws Exception {
    List<String> records = new ArrayList<String>(RECORD_COUNT);
    int size = 0;
    for (int i = 0; i < RECORD_COUNT; i++) {
        records.add(RECORD);
        size += RECSIZE;
    }
    System.out.println(records.size() + " 'records'");
    System.out.println(size / MEG + " MB");

    for (int i = 0; i < ITERATIONS; i++) {
        System.out.println("\nIteration " + i);

        writeRaw(records);
        writeBuffered(records, 8192);
        writeBuffered(records, (int) MEG);
        writeBuffered(records, 4 * (int) MEG);
    }
}

private static void writeRaw(List<String> records) throws IOException {
    File file = File.createTempFile("foo", ".txt");
    try {
        FileWriter writer = new FileWriter(file);
        System.out.print("Writing raw... ");
        write(records, writer);
    } finally {
        // comment this out if you want to inspect the files afterward
        file.delete();
    }
}

private static void writeBuffered(List<String> records, int bufSize) throws IOException {
    File file = File.createTempFile("foo", ".txt");
    try {
        FileWriter writer = new FileWriter(file);
        BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);

        System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
        write(records, bufferedWriter);
    } finally {
        // comment this out if you want to inspect the files afterward
        file.delete();
    }
}

private static void write(List<String> records, Writer writer) throws IOException {
    long start = System.currentTimeMillis();
    for (String record: records) {
        writer.write(record);
    }
    writer.flush();
    writer.close();
    long end = System.currentTimeMillis();
    System.out.println((end - start) / 1000f + " seconds");
}
}

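  • FWIW, this was written for Java 5, which at least was not documented to flush on close and did not have try-with-resources. It could probably use an update. (3 upvotes)
  • @rozario Each write call should produce only about 175 MB and then delete its file. If it did not, you would end up with 175 MB x 4 different write calls x 5 iterations = 3.5 GB of data. You could check the return value of file.delete() and throw an exception if it is false. (2 upvotes)
  • I just looked up the Java 1.1 documentation for `Writer.close()`, and it says "*Close the stream, flushing it first.*" So there is never a need to call `flush()` before `close()`. By the way, one reason `BufferedWriter` may be useless is that `FileWriter` (a specialization of `OutputStreamWriter`) has to do its own buffering anyway when it converts the char sequence into a byte sequence in the target encoding. Having more buffering at the front end does not help when the charset encoder has to flush its smaller byte buffer at a higher rate anyway. (2 upvotes)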
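
Following up on the comments above, here is a minimal sketch of how the `write` helper might look on Java 7 or later, using try-with-resources and no explicit `flush()`. The class name `TryWithResourcesWrite` and the method `writeAll` are made up for illustration, and it uses `Files.newBufferedWriter` rather than the `FileWriter` from the answer, so treat it as one possible update rather than the answer's own code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical name; not part of the original benchmark.
public class TryWithResourcesWrite {

    // Writes every record to the given file and returns the elapsed time in seconds.
    public static double writeAll(Path file, List<String> records) throws IOException {
        long start = System.nanoTime();
        // try-with-resources closes the writer even if write() throws;
        // close() flushes first, so no explicit flush() is needed.
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String record : records) {
                writer.write(record);
            }
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws IOException {
        List<String> records = Collections.nCopies(
                400_000, "Help I am trapped in a fortune cookie factory\n");
        Path file = Files.createTempFile("foo", ".txt");
        try {
            double seconds = writeAll(file, records);
            System.out.println(Files.size(file) + " bytes in " + seconds + " seconds");
        } finally {
            // Unlike File.delete(), Files.delete() throws if the file cannot be removed.
            Files.delete(file);
        }
    }
}
```

Using `Files.delete` in the `finally` block also covers the point about checking whether the temporary file was really removed, since a failed delete surfaces as an exception instead of a silently ignored `false`.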

Dee*_*wal 36

Try memory-mapped files (it takes about 300 ms to write 174 MB on my machine, a Core 2 Duo with 2.5 GB RAM):

byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
int number_of_lines = 400000;

FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines);
for (int i = 0; i < number_of_lines; i++)
{
    wrBuf.put(buffer);
}
rwChannel.close();

  • Just FYI, running this on a MacBook Pro (late 2013), 2.6 GHz Core i7, with an Apple 1 TB SSD takes about 140 ms for 185 MB (lines = 4 million). (2 upvotes)
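
The snippet in this answer is a fragment without imports or cleanup. A self-contained sketch of the same memory-mapping idea could look like the following; the class name `MappedWrite`, the method `writeMapped`, and the use of `FileChannel.open` in place of `RandomAccessFile` are my own choices for illustration, not the answerer's code.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical name; a sketch of the memory-mapped approach.
public class MappedWrite {

    // Writes `lines` copies of `record` to `file` through a memory-mapped region.
    public static void writeMapped(Path file, byte[] record, int lines) throws IOException {
        // Compute the size in long arithmetic so large inputs do not overflow int.
        // A single mapping is still limited to Integer.MAX_VALUE bytes.
        long size = (long) record.length * lines;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // In practice the JDK extends the file to `size` when mapping READ_WRITE.
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int i = 0; i < lines; i++) {
                buf.put(record);
            }
            // Ask the OS to write the mapped pages out to the storage device.
            buf.force();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = "Help I am trapped in a fortune cookie factory\n"
                .getBytes(StandardCharsets.UTF_8);
        Path file = Files.createTempFile("textfile", ".txt");
        file.toFile().deleteOnExit();

        long start = System.nanoTime();
        writeMapped(file, record, 400_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(Files.size(file) + " bytes in " + millis + " ms");
    }
}
```

Note that this only works when the total size is known before writing, which is usually not the case when rows come from a result set. Timings without `force()` mostly measure copying into the page cache rather than the disk itself.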

Dam*_*ash 16

Just for statistics:

The machine is an old Dell with a new SSD

CPU: Intel Pentium D 2.8 GHz

SSD: Patriot Inferno 120GB SSD

4000000 'records'
175.47607421875 MB

Iteration 0
Writing raw... 3.547 seconds
Writing buffered (buffer size: 8192)... 2.625 seconds
Writing buffered (buffer size: 1048576)... 2.203 seconds
Writing buffered (buffer size: 4194304)... 2.312 seconds

Iteration 1
Writing raw... 2.922 seconds
Writing buffered (buffer size: 8192)... 2.406 seconds
Writing buffered (buffer size: 1048576)... 2.015 seconds
Writing buffered (buffer size: 4194304)... 2.282 seconds

Iteration 2
Writing raw... 2.828 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.078 seconds
Writing buffered (buffer size: 4194304)... 2.015 seconds

Iteration 3
Writing raw... 3.187 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.094 seconds
Writing buffered (buffer size: 4194304)... 2.031 seconds

Iteration 4
Writing raw... 3.093 seconds
Writing buffered (buffer size: 8192)... 2.141 seconds
Writing buffered (buffer size: 1048576)... 2.063 seconds
Writing buffered (buffer size: 4194304)... 2.016 seconds

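As we can see, the raw method is slower than the buffered ones.

  • However, the buffered method becomes slower whenever the size of the text gets larger. (2 upvotes)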

Bri*_*new 5

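Java is probably not what is limiting your transfer speed. Instead, I would suspect (in no particular order)

  1. the speed of transfer from the database
  2. the speed of transfer to the disk

If you read the complete dataset and then write it out to disk, that will take longer, since the JVM has to allocate memory for all of it and the db read and the disk write will happen sequentially. Instead, I would write out to the buffered writer for every read you make from the database, so that the operation is closer to a concurrent one (I don't know whether you are already doing that or not).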
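
As a rough sketch of that row-by-row approach, the following writes each row as soon as it is fetched and keeps separate timers for fetching and writing, so you can see which side dominates. To stay runnable without a database, it takes an `Iterator<String[]>` as a stand-in for the `ResultSet`; the class `StreamingCsvExport`, the method `export`, and the two counters are hypothetical names, and the CSV output does no quoting or escaping.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical name; the Iterator stands in for a JDBC ResultSet.
public class StreamingCsvExport {

    // Accumulated nanoseconds spent fetching rows and writing them.
    public static long fetchNanos = 0;
    public static long writeNanos = 0;

    // Writes each row as one comma-separated line and returns the row count.
    // The caller owns `out`, so it is flushed here but not closed.
    public static int export(Iterator<String[]> rows, Writer out) throws IOException {
        BufferedWriter writer = new BufferedWriter(out);
        int count = 0;
        while (true) {
            long t0 = System.nanoTime();
            boolean more = rows.hasNext();           // with JDBC: resultSet.next()
            String[] row = more ? rows.next() : null;
            long t1 = System.nanoTime();
            fetchNanos += t1 - t0;
            if (!more) {
                break;
            }

            // Write the row immediately instead of collecting all rows first.
            writer.write(String.join(",", row));
            writer.write('\n');
            writeNanos += System.nanoTime() - t1;
            count++;
        }
        long t2 = System.nanoTime();
        writer.flush();
        writeNanos += System.nanoTime() - t2;
        return count;
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "alice"},
                new String[] {"2", "bob"});
        StringWriter out = new StringWriter();
        int count = export(rows.iterator(), out);
        System.out.print(out);
        System.out.println(count + " rows, fetch " + fetchNanos + " ns, write " + writeNanos + " ns");
    }
}
```

Calling `System.nanoTime()` for every row adds some overhead of its own, so the two totals are only a rough guide. If the fetch total is much larger than the write total, tuning the writer will not help much.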