Node.js: How to optimize writing many files?

ACP*_*ice 2 optimization file-io writefile node.js kinect

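I'm working in a Node environment on Windows. My code receives 30 Buffer objects per second (each about 500-900 KB), and I need to save this data to the file system as quickly as possible, without doing any work that blocks receiving the next Buffer (the goal is to save the data from every buffer, for about 30-45 minutes). For what it's worth, the data consists of consecutive depth frames from a Kinect sensor.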
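For scale, the figures above imply roughly the following sustained load on the disk. This is a back-of-the-envelope estimate that takes 1 MB = 1000 KB and 1 GB = 1000 MB and assumes one file per frame, as in the pseudocode below:

$$
\begin{aligned}
\text{write rate} &\approx 30\ \tfrac{\text{frames}}{\text{s}} \times (500 \text{ to } 900)\ \tfrac{\text{KB}}{\text{frame}} \approx 15 \text{ to } 27\ \text{MB/s} \\
\text{total data} &\approx 15\ \text{MB/s} \times 1800\ \text{s} \ \text{ to } \ 27\ \text{MB/s} \times 2700\ \text{s} \approx 27 \text{ to } 73\ \text{GB} \\
\text{file count} &= 30\ \tfrac{\text{files}}{\text{s}} \times (1800 \text{ to } 2700)\ \text{s} = 54{,}000 \text{ to } 81{,}000
\end{aligned}
$$

Whatever write strategy is used, the disk has to sustain roughly this rate for the whole session.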

My question is: what is the best way to write files in Node?

Here is the pseudocode:

const fs = require('fs')

let num = 0

async function writeFile(filename, data) {
  fs.writeFileSync(filename, data)
}

// This fires 30 times/sec and runs for 30-45 min
dataSender.on('gotData', function(data){

  let filename = 'file-' + num++

  // Do anything with data here to optimize write?
  writeFile(filename, data)
})

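fs.writeFileSync seems to be much faster than fs.writeFile, which is why I used it above. But is there any other way to process the data, or to write the files, that would speed up each save?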

jfr*_*d00 5

First off, you never want to use fs.writeFileSync() when handling real-time requests, because it blocks the entire node.js event loop until the file write is done.
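A small sketch makes the blocking visible by timing how long each call keeps the main thread busy. The temporary directory and the 800 KB dummy buffer are assumptions standing in for the real Kinect frames; while writeFileSync() runs, no 'gotData' event or timer can be serviced, whereas writeFile() hands the work to libuv's thread pool and returns right away.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { performance } = require('perf_hooks');

// Assumption: a temp directory and an 800 KB dummy frame stand in for real data.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'frames-'));
const frame = Buffer.alloc(800 * 1024, 7);

// Synchronous write: the event loop is held for the entire write.
let start = performance.now();
fs.writeFileSync(path.join(dir, 'sync.bin'), frame);
const syncMs = performance.now() - start;

// Asynchronous write: the call returns immediately; the write happens on the
// thread pool and the callback runs later.
start = performance.now();
const done = new Promise((resolve, reject) => {
    fs.writeFile(path.join(dir, 'async.bin'), frame, err => (err ? reject(err) : resolve()));
});
const asyncMs = performance.now() - start;

console.log(`writeFileSync held the event loop for ${syncMs.toFixed(2)} ms`);
console.log(`writeFile held the event loop for ${asyncMs.toFixed(2)} ms`);
done.then(() => console.log('async write completed later, off the main thread'));
```

The absolute numbers depend on the disk and OS cache; the point is where the time is spent, not which call finishes first.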

OK, since each block of data is written to a different file, you want to allow multiple disk writes to be in progress at the same time, but not an unlimited number of them. So a queue is still appropriate, but this time the queue doesn't have just one write in progress at a time; it has up to some fixed number of writes in progress at once:

const fs = require('fs');
const EventEmitter = require('events');

class Queue extends EventEmitter {
    constructor(basePath, baseIndex, concurrent = 5) {
        super();                         // required before using `this` in a subclass
        this.q = [];
        this.paused = false;
        this.inFlightCntr = 0;
        this.basePath = basePath;
        this.fileCntr = baseIndex;
        this.maxConcurrent = concurrent;
    }

    // add item to the queue and start writing (if a write slot is free)
    add(data) {
        this.q.push(data);
        this.write();
    }

    // start as many queued writes as the concurrency limit allows
    write() {
        while (!this.paused && this.q.length && this.inFlightCntr < this.maxConcurrent) {
            this.inFlightCntr++;
            let buf = this.q.shift();
            try {
                fs.writeFile(this.basePath + this.fileCntr++, buf, err => {
                    this.inFlightCntr--;
                    if (err) {
                        this.err(err);
                    } else {
                        // a slot just freed up, so write more data
                        this.write();
                    }
                });
            } catch(e) {
                // writeFile threw synchronously, so no callback will release the slot
                this.inFlightCntr--;
                this.err(e);
            }
        }
    }

    err(e) {
        this.pause();
        this.emit('error', e);
    }

    pause() {
        this.paused = true;
    }

    resume() {
        this.paused = false;
        this.write();
    }
}

let q = new Queue("file-", 0, 5);

// This fires 30 times/sec and runs for 30-45 min
dataSender.on('gotData', function(data){
    q.add(data);
});

q.on('error', function(e) {
    // got some sort of write error here
    console.log(e);
});
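For comparison, here is a minimal sketch of the same bounded-concurrency idea built on fs.promises writeFile() instead of an EventEmitter subclass. createFrameWriter, idle() and the onError callback are hypothetical names chosen for this example. Unlike the Queue class above, a failed write is only reported and does not pause the queue.

```javascript
const fsp = require('fs/promises');

// Hypothetical helper: at most `maxConcurrent` writes in flight, one file per frame.
function createFrameWriter(basePath, maxConcurrent = 5, onError = console.error) {
    const pending = [];   // frames waiting for a free write slot
    let inFlight = 0;     // writes currently handed to the thread pool
    let fileIndex = 0;
    let waiters = [];     // resolve functions from callers of idle()

    function startWrites() {
        while (pending.length && inFlight < maxConcurrent) {
            const { name, buf } = pending.shift();
            inFlight++;
            fsp.writeFile(name, buf)
                .catch(onError)
                .finally(() => {
                    inFlight--;
                    startWrites(); // a slot freed up: start the next queued write
                    if (inFlight === 0 && pending.length === 0) {
                        waiters.forEach(resolve => resolve());
                        waiters = [];
                    }
                });
        }
    }

    return {
        // The file name is fixed when the frame arrives, so numbering follows
        // arrival order even though writes may complete out of order.
        add(buf) {
            pending.push({ name: basePath + fileIndex++, buf });
            startWrites();
        },
        // Resolves once every queued frame has been written (or has failed).
        idle() {
            if (inFlight === 0 && pending.length === 0) return Promise.resolve();
            return new Promise(resolve => waiters.push(resolve));
        },
    };
}

// Usage with the event source from the question:
// const writer = createFrameWriter('file-', 5);
// dataSender.on('gotData', data => writer.add(data));
```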

Things to consider:

  1. Experiment with the concurrent value you pass to the Queue constructor. Start with a value of 5, then see whether raising it gives you better or worse performance. The node.js file I/O subsystem uses a thread pool (libuv's, 4 threads by default) to implement asynchronous disk writes, so only a limited number of writes can actually run at the same time; cranking the concurrent number up really high probably won't make things go faster.

  2. You can experiment with increasing the size of the disk I/O thread pool by setting the UV_THREADPOOL_SIZE environment variable before you start your node.js app.

  3. Your biggest friend here is disk write speed. So, make sure you have a fast disk with a good disk controller. A fast SSD on a fast bus would be best.

  4. If you can spread the writes out across multiple actual physical disks, that will likely also increase write throughput (more disk heads at work).
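Items 1 and 2 call for measurement, so here is a rough benchmark sketch: it writes the same number of dummy frames at several concurrency limits and reports throughput for each. The frame size, frame count, directory and limits are assumptions; pass a baseDir on the disk you will actually record to. Because results also depend on UV_THREADPOOL_SIZE, it may be worth repeating the run with that variable set to different values in the environment before node starts.

```javascript
const fs = require('fs');
const fsp = require('fs/promises');
const os = require('os');
const path = require('path');

// Write `count` copies of `frame` into `dir`, keeping at most `limit` writes in flight.
async function writeFrames(dir, frame, count, limit) {
    let next = 0;
    async function worker() {
        while (next < count) {
            const i = next++; // claimed synchronously, so no two workers share an index
            await fsp.writeFile(path.join(dir, `file-${i}`), frame);
        }
    }
    await Promise.all(Array.from({ length: limit }, () => worker()));
}

// Time writeFrames() once per concurrency limit and report throughput.
async function benchmark(limits, { count = 300, frameBytes = 700 * 1024, baseDir = os.tmpdir() } = {}) {
    const frame = Buffer.alloc(frameBytes, 1); // dummy frame, not real depth data
    const results = [];
    for (const limit of limits) {
        const dir = fs.mkdtempSync(path.join(baseDir, `bench-${limit}-`));
        const start = process.hrtime.bigint();
        await writeFrames(dir, frame, count, limit);
        const seconds = Number(process.hrtime.bigint() - start) / 1e9;
        results.push({
            limit,
            seconds,
            mbPerSec: (count * frameBytes) / 1e6 / seconds,
            framesPerSec: count / seconds, // needs to stay comfortably above 30
        });
        fs.rmSync(dir, { recursive: true, force: true }); // reclaim the disk space
    }
    return results;
}

// Example: benchmark([1, 2, 5, 10, 20]).then(rows => console.table(rows));
```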


The following is my prior answer, based on the initial interpretation of the question (before an edit changed it).

Since it appears you need to do your disk writes in order (all to the same file), I'd suggest that you either use a write stream and let the stream object serialize and cache the data for you, or create a queue yourself like this:

const fs = require('fs');
const EventEmitter = require('events');

class Queue extends EventEmitter {
    // takes an already opened file handle
    constructor(fileHandle) {
        super();                         // required before using `this` in a subclass
        this.f = fileHandle;
        this.q = [];
        this.nowWriting = false;
        this.paused = false;
    }

    // add item to the queue and write (if not already writing)
    add(data) {
        this.q.push(data);
        this.write();
    }

    // write next block from the queue (if not already writing)
    write() {
        if (!this.nowWriting && !this.paused && this.q.length) {
            this.nowWriting = true;
            let buf = this.q.shift();
            fs.write(this.f, buf, (err, bytesWritten) => {
                this.nowWriting = false;
                if (err) {
                    this.pause();
                    this.emit('error', err);
                } else {
                    // a partial write is possible: put the unwritten tail back first
                    if (bytesWritten < buf.length) {
                        this.q.unshift(buf.subarray(bytesWritten));
                    }
                    // write next block
                    this.write();
                }
            });
        }
    }

    pause() {
        this.paused = true;
    }

    resume() {
        this.paused = false;
        this.write();
    }
}

// pass an already opened file handle ('frames.bin' is only an example name)
let fileHandle = fs.openSync('frames.bin', 'a');
let q = new Queue(fileHandle);

// This fires 30 times/sec and runs for 30-45 min
dataSender.on('gotData', function(data){
    q.add(data);
});

q.on('error', function(err) {
    // got disk write error here
});

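You could use a writeStream instead of this custom Queue class, but the problem with that is that the writeStream may fill up: once its internal buffer passes its highWaterMark, write() returns false to signal that you should wait for the 'drain' event, and if you honor that signal you need a separate buffer as a place to put the incoming data anyway. Using your own custom queue like above takes care of both issues at once.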
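For reference, a minimal sketch of the writeStream alternative, appending every frame to one file through fs.createWriteStream(). Note that write() still queues the chunk even when it returns false; the return value is only a backpressure signal. createFrameStream and bufferedBytes() are hypothetical names for this example.

```javascript
const fs = require('fs');

// Hypothetical wrapper: all frames appended, in order, to a single file.
function createFrameStream(filename) {
    const out = fs.createWriteStream(filename, { flags: 'a' });
    out.on('error', err => console.error('frame stream error:', err));
    return {
        // Returns false when the stream's buffer is above highWaterMark, i.e.
        // the disk is falling behind; the chunk has still been queued.
        add(buf) {
            return out.write(buf);
        },
        // Bytes accepted by write() but not yet written to the file.
        bufferedBytes() {
            return out.writableLength;
        },
        // Flush everything queued so far, then resolve.
        close() {
            return new Promise((resolve, reject) => {
                out.once('finish', resolve);
                out.once('error', reject);
                out.end();
            });
        },
    };
}
```

In the question's handler this would be used as `dataSender.on('gotData', data => stream.add(data))`. If add() keeps returning false and bufferedBytes() keeps growing, the disk is not keeping up with the frame rate and memory use will grow for the rest of the session.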

Other Scalability/Performance Comments

  1. Because you appear to be writing the data serially to the same file, your disk writing won't benefit from clustering or from running multiple operations in parallel; the writes basically have to be serialized.

  2. If your node.js server has other things to do besides just doing these writes, there might be a slight advantage (would have to be verified with testing) to creating a second node.js process and doing all the disk writing in that other process. Your main node.js process would receive the data and then pass it to the child process that would maintain the queue and do the writing.

  3. Another thing you could experiment with is coalescing writes. When you have more than one item in the queue, you could combine them together into a single write. If the writes are already sizable, this probably doesn't make much difference, but if the writes were small this could make a big difference (combining lots of small disk writes into one larger write is usually more efficient).

  4. Your biggest friend here is disk write speed. So, make sure you have a fast disk with a good disk controller. A fast SSD would be best.
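A minimal sketch of the coalescing idea from item 3, for the single-file case and assuming an already opened file descriptor as in the Queue above: while one fs.write() is in flight, new frames accumulate, and the next write sends all of them as one Buffer.concat() block. CoalescingWriter, writeCount and the 'idle' event are hypothetical names. Buffer.concat copies the data, which for 500-900 KB frames may cost as much as it saves, so it is worth measuring both ways.

```javascript
const fs = require('fs');
const EventEmitter = require('events');

class CoalescingWriter extends EventEmitter {
    constructor(fd) {
        super();
        this.fd = fd;          // an already opened file descriptor
        this.q = [];
        this.writing = false;
        this.writeCount = 0;   // number of fs.write() calls actually issued
    }

    add(buf) {
        this.q.push(buf);
        this.flush();
    }

    flush() {
        if (this.writing || this.q.length === 0) return;
        // Merge everything queued so far into one block (no copy for a single item).
        const block = this.q.length === 1 ? this.q[0] : Buffer.concat(this.q);
        this.q = [];
        this.writing = true;
        this.writeCount++;
        fs.write(this.fd, block, (err, bytesWritten) => {
            this.writing = false;
            if (err) return this.emit('error', err);
            // Put any unwritten tail back at the front so ordering is preserved.
            if (bytesWritten < block.length) this.q.unshift(block.subarray(bytesWritten));
            if (this.q.length === 0) this.emit('idle');
            else this.flush();
        });
    }
}
```

With frames arriving faster than the disk completes writes, several frames end up in each fs.write() call instead of one call per frame.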