Writing files using multiple threads

dan*_*onc 3 java concurrency file-io multithreading java-io

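I need to write a lot of files in a VM: 300,000 of them. The job works today and generates the files correctly, but it takes 3 to 4 hours to finish.

How can I do this with parallel threads?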

Pet*_*rey 6

I have worked out a way to benefit from multiple threads with only minimal changes to your code.

import java.io.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Created by peter.lawrey on 30/01/15.
 */
public class ConcurrentFileWriter {
    private final ThreadPoolExecutor es;
    private final int maxQueueSize;

    public ConcurrentFileWriter() {
        this(4, 10000);
    }

    public ConcurrentFileWriter(int concurrency, int maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
        es = (ThreadPoolExecutor) Executors.newFixedThreadPool(concurrency);
    }

    public OutputStream newFileOutputStream(final String filename) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                final ByteArrayOutputStream baos = this;
                // Throttle the caller until the background queue drains below the limit.
                while (es.getQueue().size() > maxQueueSize)
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        throw new AssertionError(e);
                    }
                es.submit(new Runnable() {
                    public void run() {
                        // try-with-resources closes the stream even if write() throws
                        try (FileOutputStream fos = new FileOutputStream(filename)) {
                            fos.write(baos.toByteArray());
                        } catch (IOException ioe) {
                            System.err.println("Unable to write to " + filename);
                            ioe.printStackTrace();
                        }
                    }
                });
            }
        };
    }

    public PrintWriter newPrintWriter(String filename) {
        try {
            return new PrintWriter(new OutputStreamWriter(newFileOutputStream(filename), "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e);
        }
    }

    public void close() {
        es.shutdown();
        try {
            es.awaitTermination(2, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            e.printStackTrace();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String... args) {
        long start = System.nanoTime();
        ConcurrentFileWriter cfw = new ConcurrentFileWriter();
        int files = 10000;
        for (int i = 0; i < files; i++) {
            PrintWriter pw = cfw.newPrintWriter("file-" + i);
            pw.println("Hello World");
            pw.close();
        }
        long mid = System.nanoTime();
        System.out.println("Waiting for files to be written");
        cfw.close();
        long end = System.nanoTime();
        System.out.printf("Took %.3f seconds to generate %,d files and %.3f seconds to write them to disk%n",
                (mid - start) / 1e9, files, (end - mid) / 1e9);
    }
}

On an SSD, this prints

Waiting for files to be written
Took 0.075 seconds to generate 10,000 files and 0.058 seconds to write them to disk

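This lets you keep writing single-threaded code just as you do now, while the actual writes to disk are done as background tasks.

Note: you must call close() to wait for the files to actually be written to disk.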
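The class above throttles the caller by sleeping when the queue grows past maxQueueSize. One possible variant, which is not part of the code above, is to let the executor apply the back-pressure itself: give it a bounded queue and a caller-runs policy, so that when the queue is full the submitting thread writes the file itself. The sketch below assumes that behaviour is acceptable; the class name `BoundedWriterPool` and its methods are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical variant: a fixed pool with a bounded queue instead of a sleep-based throttle. */
final class BoundedWriterPool {
    private final ThreadPoolExecutor es;

    BoundedWriterPool(int concurrency, int maxQueueSize) {
        es = new ThreadPoolExecutor(concurrency, concurrency, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(maxQueueSize),
                // When the queue is full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Queues one file write; the bytes are written by a pool thread or, under load, by the caller. */
    void write(final Path file, final byte[] bytes) {
        es.execute(new Runnable() {
            public void run() {
                try {
                    Files.write(file, bytes);
                } catch (IOException ioe) {
                    System.err.println("Unable to write to " + file);
                    ioe.printStackTrace();
                }
            }
        });
    }

    /** Stops accepting work and waits for queued writes; returns false on timeout or interrupt. */
    boolean close(long timeout, TimeUnit unit) {
        es.shutdown();
        try {
            return es.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bounded-writer");
        BoundedWriterPool pool = new BoundedWriterPool(4, 10000);
        for (int i = 0; i < 1000; i++) {
            pool.write(dir.resolve("file-" + i), "Hello World\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("All files written: " + pool.close(1, TimeUnit.MINUTES));
    }
}
```

As with the original class, the files are only guaranteed to be on disk after close() has returned successfully.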


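The problem with writing a large number of files is that it is a lot of work for an HDD. Using multiple threads won't make your drive spin any faster. Each time you open and close a file, it uses about 2 IOs (IO operations). If you have an HDD that supports 80 IOPS (IOs per second), you can open and close 40 files per second, so 300,000 files take about 2 hours.

By comparison, if you use an SSD you can get 80,000 IOPS, which is 1000x faster, and you might spend only about 8 seconds opening and closing files.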
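The arithmetic behind those two estimates can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: it uses the figures quoted above (about 2 IOs per file, 80 and 80,000 IOPS), and the class and method names are made up for illustration.

```java
import java.util.Locale;

/** Rough estimate of the time spent just opening and closing files. */
final class IopsEstimate {
    /** Figure taken from the text: one open-and-close costs about 2 IO operations. */
    static final int IOS_PER_FILE = 2;

    /** Seconds needed to open and close the given number of files at the given IOPS. */
    static double secondsFor(long files, double iops) {
        return files * IOS_PER_FILE / iops;
    }

    public static void main(String[] args) {
        long files = 300_000;
        double hdd = secondsFor(files, 80);      // 600,000 IOs at 80 per second
        double ssd = secondsFor(files, 80_000);  // 600,000 IOs at 80,000 per second
        System.out.printf(Locale.ROOT, "HDD: %.0f seconds (about %.1f hours)%n", hdd, hdd / 3600);
        System.out.printf(Locale.ROOT, "SSD: %.1f seconds%n", ssd);
    }
}
```

The HDD case comes to 7,500 seconds, a little over 2 hours, and the SSD case to 7.5 seconds.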

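Once you have switched to an SSD, it may be worth using multiple threads. A simple way to do this is with the Stream API in Java 8.

You can do something like this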

IntStream.range(0, 300000).parallel()
         .forEach(i -> createFile(i));
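createFile is not defined in the snippet, so here is a minimal runnable sketch of what it might look like. It assumes each file is small and independent of the others; the target directory parameter, the file names and the `createAll` helper are placeholders rather than anything from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.IntStream;

final class ParallelFileCreator {
    /** Hypothetical stand-in for createFile(i): writes one small file named file-i. */
    static void createFile(Path dir, int i) {
        try {
            Files.write(dir.resolve("file-" + i), "Hello World\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // A lambda passed to forEach cannot throw a checked exception, so wrap it.
            throw new UncheckedIOException(e);
        }
    }

    /** Creates files 0..count-1 in dir, spreading the work over the common fork-join pool. */
    static void createAll(Path dir, int count) {
        IntStream.range(0, count).parallel()
                 .forEach(i -> createFile(dir, i));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("parallel-files");
        createAll(dir, 1_000);
        System.out.println("Wrote files to " + dir);
    }
}
```

forEach on a parallel stream returns only after every element has been processed, so no separate close() step is needed here. The work does share the JVM-wide common fork-join pool with any other parallel streams in the same process.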