What is the fastest way to write a lot of documents to Firestore?

Question

Fra*_*len 7 node.js firebase google-cloud-firestore

I need to write a large number of documents to Firestore.

What is the fastest way to do this in Node.js?

Answer 1

Fra*_*len 12

TL;DR: The fastest way to perform bulk data creation on Firestore is to perform parallel individual write operations.

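Writing 1,000 documents to Firestore takes:

~105.4s when using sequential individual write operations
~2.8s when using batched write operations (2 batches)
~1.5s when using parallel individual write operations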

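There are three common ways of performing a large number of write operations on Firestore:

Perform each individual write operation sequentially.
Use batched write operations.
Perform individual write operations in parallel.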

We'll investigate each in turn below, using an array of randomized document data.
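The original test data isn't shown here, so the following is only a minimal sketch of what "an array of randomized document data" could look like. The function name makeRandomDocs and the fields index, label and score are hypothetical, not taken from the original test. Note that the sequential and batched examples below consume their input with shift(), so each approach needs its own fresh array.

```javascript
// Hypothetical generator for benchmark input: `n` plain objects with random
// field values. The field names are illustrative, not the original test's.
function makeRandomDocs(n) {
  return Array.from({ length: n }, (_, i) => ({
    index: i,                                           // position in the array
    label: Math.random().toString(36).substring(2, 10), // short random string
    score: Math.floor(Math.random() * 1000),            // integer in [0, 999]
  }));
}

// A fresh array per benchmark, since shift() empties it.
const datas = makeRandomDocs(1000);
console.log(datas.length); // 1000
```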

Sequential individual write operations

This is the simplest possible solution:

async function testSequentialIndividualWrites(datas) {
  while (datas.length) {
    await collection.add(datas.shift());
  }
}

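We write each document in turn until every document has been written, and we wait for each write operation to complete before starting the next one.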
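To make the sequencing visible without a Firestore project, here is a sketch that runs the same loop against a fake collection whose add() resolves after a short delay and records how many calls are pending at once. makeFakeCollection is a hypothetical stand-in for a Firestore CollectionReference; with the real SDK each await is a full network round trip. This version iterates with for...of instead of shift(), so the caller's array is left intact.

```javascript
// Hypothetical stand-in for a Firestore CollectionReference: add() resolves
// after `delayMs` and tracks the peak number of concurrent add() calls.
function makeFakeCollection(delayMs = 5) {
  const stats = { inFlight: 0, maxInFlight: 0, written: 0 };
  return {
    stats,
    add(data) {
      stats.inFlight++;
      stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
      return new Promise((resolve) =>
        setTimeout(() => {
          stats.inFlight--;
          stats.written++;
          resolve({ id: `doc${stats.written}`, data });
        }, delayMs)
      );
    },
  };
}

// Sequential individual writes: each add() must finish before the next starts.
async function sequentialWrites(collection, datas) {
  for (const data of datas) {
    await collection.add(data);
  }
}

const fake = makeFakeCollection();
sequentialWrites(fake, [{ a: 1 }, { a: 2 }, { a: 3 }]).then(() => {
  console.log(fake.stats.maxInFlight, fake.stats.written); // 1 3
});
```

Because only one write is ever pending, total time is roughly the number of documents times the per-write latency, which is why this approach is so slow.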

Writing 1,000 documents takes about 105 seconds with this approach, so the throughput is roughly 10 document writes per second.

Batched write operations

This is the most complex solution.

async function testBatchedWrites(datas) {
  let batch = admin.firestore().batch();
  let count = 0;
  while (datas.length) {
    batch.set(collection.doc(Math.random().toString(36).substring(2, 15)), datas.shift());
    if (++count >= 500 || !datas.length) {
      await batch.commit();
      batch = admin.firestore().batch();
      count = 0;
    }
  }
}

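You can see that we create a WriteBatch object by calling batch(), fill it up to its maximum capacity of 500 documents, and then write it to Firestore. We give each document a generated name that is reasonably likely to be unique (good enough for this test).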
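The random name used above, Math.random().toString(36).substring(2, 15), is at most 13 base-36 characters and can come out shorter. A sketch of a longer ID, modeled on the 20-character alphanumeric auto-IDs that Firestore generates; the helper name randomDocId is hypothetical. With the Admin SDK you can also simply call collection.doc() with no arguments, which returns a reference with an auto-generated ID.

```javascript
const { randomInt } = require('crypto');

const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Hypothetical helper: `length` characters drawn uniformly from 62 symbols,
// using a cryptographically secure source instead of Math.random().
function randomDocId(length = 20) {
  let id = '';
  for (let i = 0; i < length; i++) {
    id += ALPHABET[randomInt(ALPHABET.length)];
  }
  return id;
}

console.log(randomDocId().length); // 20
```

In the batch loop this would replace the Math.random() expression, e.g. batch.set(collection.doc(randomDocId()), datas.shift()).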

Writing 1,000 documents takes about 2.8 seconds with this approach, so the throughput is roughly 357 document writes per second.

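That's a lot faster than sequential individual writes. In fact, many developers use this approach because they believe it is the fastest, but as the results above show, that isn't true. And because of the size limit on batches, the code is by far the most complex of the three.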
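Most of that complexity is bookkeeping around the 500-document limit. One way to isolate it is to split the input into chunks first and then fill one batch per chunk. A sketch, assuming a batch object with set(ref, data) and commit() like the Admin SDK's WriteBatch; chunk, writeInBatches, newBatch and docRefFor are hypothetical names, and the fake batch in the usage lines only records how many writes each commit contains.

```javascript
// Split `items` into consecutive arrays of at most `size` elements.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Commit one batch per chunk, one after another, as in the test above.
// newBatch() should return a fresh batch (e.g. () => db.batch()) and
// docRefFor(data) a document reference (e.g. () => collection.doc()).
async function writeInBatches(datas, newBatch, docRefFor, size = 500) {
  for (const group of chunk(datas, size)) {
    const batch = newBatch();
    for (const data of group) {
      batch.set(docRefFor(data), data);
    }
    await batch.commit();
  }
}

// Usage with a fake batch that records the size of each commit.
const commitSizes = [];
const fakeNewBatch = () => {
  const ops = [];
  return {
    set: (ref, data) => ops.push([ref, data]),
    commit: async () => { commitSizes.push(ops.length); },
  };
};
writeInBatches(Array.from({ length: 1001 }, (_, i) => ({ i })), fakeNewBatch, () => null)
  .then(() => console.log(commitSizes)); // [ 500, 500, 1 ]
```

Separating chunking from committing also makes it easy to test the boundary cases (an exact multiple of 500, or a trailing partial batch) without touching Firestore.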

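Parallel individual write operations

The Firestore documentation says this about the performance of adding large amounts of data:

For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes.

We can test that with this code: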

async function testParallelIndividualWrites(datas) {
  await Promise.all(datas.map((data) => collection.add(data)));
}

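This code starts the add operations as fast as it can, and then uses Promise.all() to wait until they have all completed. With this approach, the operations can run in parallel.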
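The same kind of fake collection shows the difference from the sequential loop: with Promise.all() every add() is started before any of them finishes. makeFakeCollection is again a hypothetical stand-in for a CollectionReference; how many requests are actually in flight on the wire also depends on the SDK and the network.

```javascript
// Hypothetical stand-in for a CollectionReference: add() resolves after
// `delayMs` and records the peak number of pending calls.
function makeFakeCollection(delayMs = 5) {
  const stats = { inFlight: 0, maxInFlight: 0 };
  return {
    stats,
    add(data) {
      stats.inFlight++;
      stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
      return new Promise((resolve) =>
        setTimeout(() => { stats.inFlight--; resolve(data); }, delayMs)
      );
    },
  };
}

// Parallel individual writes: start every add(), then wait for all of them.
async function parallelWrites(collection, datas) {
  return Promise.all(datas.map((data) => collection.add(data)));
}

const fake = makeFakeCollection();
parallelWrites(fake, [{ a: 1 }, { a: 2 }, { a: 3 }]).then((results) => {
  console.log(fake.stats.maxInFlight, results.length); // 3 3
});
```

Promise.all() also returns the results in input order, regardless of the order in which the writes complete.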

Writing 1,000 documents takes about 1.5 seconds with this approach, so the throughput is roughly 667 document writes per second.

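The difference isn't as large as between the first two approaches, but parallel individual writes are still over 1.8 times faster than batched writes.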
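The throughput and speed-up figures follow directly from the measured times; a quick check of the arithmetic:

```javascript
// Measured times (seconds) for writing 1,000 documents, from the results above.
const timings = { sequential: 105.4, batched: 2.8, parallel: 1.5 };
const docs = 1000;

// Throughput in document writes per second.
for (const [method, seconds] of Object.entries(timings)) {
  console.log(method, (docs / seconds).toFixed(1));
}
// sequential 9.5
// batched 357.1
// parallel 666.7

// Speed-up of parallel individual writes over batched writes.
console.log((timings.batched / timings.parallel).toFixed(2)); // 1.87
```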

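A few notes:

You can find the full code of this test on GitHub.
While the test was done with Node.js, you're likely to get similar results on all platforms that the Admin SDK supports.
Don't perform bulk inserts with the client SDKs though, as the results may be very different and much less predictable.
As usual, the actual performance depends on your machine, the bandwidth and latency of your internet connection, and many other factors. Depending on those, you may also see different gaps between the approaches, although I expect the ordering to stay the same.
If you get outliers in your own tests, or find completely different results, leave a comment below.
Batched writes are atomic. So if you have dependencies between documents, and either all documents must be written or none of them, you should use a batched write.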

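@robsiemb I just tested parallel batched writes too. The performance is very similar to parallel individual writes, so I'd say they're tied for first place in my tests. I do expect that batched writes may degrade faster, due to the way they're processed on the back-end. Combined with the much more complex code, I'd still recommend using them only for their atomicity, not for a perceived but non-existent performance advantage. (3 upvotes)
Note that Promise.all rejects as soon as any of the elements is rejected. You can use Promise.allSettled instead. (3 upvotes)
This is really interesting, thanks for doing the work! Out of curiosity, did you test running batched writes in parallel? Obviously in that case you'd need to be even more careful that no document ends up in two batches at the same time. (2 upvotes)
Calling add() does nothing more than generate a unique ID (purely client-side) and then perform a set() operation, so the results should be the same. If that's not what you observe, post a new question with the minimal case that reproduces what you've tried. (2 upvotes)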
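The comments above suggest two variations: committing batches in parallel, and using Promise.allSettled so that one failure doesn't hide the outcome of the others. A sketch combining both, assuming the same set(ref, data)/commit() batch shape as the Admin SDK's WriteBatch; writeBatchesInParallel, newBatch and docRefFor are hypothetical names. Since a document landing in two concurrent batches is the risk raised above, the sketch rejects duplicate IDs up front. Each batch is atomic on its own, but the set of batches as a whole is not.

```javascript
// Commit all chunks concurrently and report which batches failed.
// items: array of { id, data }; newBatch() returns a fresh batch;
// docRefFor(id) returns a document reference (e.g. id => collection.doc(id)).
async function writeBatchesInParallel(items, newBatch, docRefFor, size = 500) {
  const ids = new Set(items.map((item) => item.id));
  if (ids.size !== items.length) {
    throw new Error('duplicate document IDs could race across concurrent batches');
  }
  const commits = [];
  for (let i = 0; i < items.length; i += size) {
    const batch = newBatch();
    for (const { id, data } of items.slice(i, i + size)) {
      batch.set(docRefFor(id), data);
    }
    commits.push(batch.commit());
  }
  // allSettled waits for every commit, even if some of them reject.
  const outcomes = await Promise.allSettled(commits);
  return outcomes
    .map((outcome, index) => ({ index, outcome }))
    .filter(({ outcome }) => outcome.status === 'rejected')
    .map(({ index, outcome }) => ({ batch: index, reason: outcome.reason }));
}

// Usage with a fake batch whose commit fails if it holds a document named 'bad'.
const fakeNewBatch = () => {
  const refs = [];
  return {
    set: (ref) => refs.push(ref),
    commit: () => (refs.includes('bad') ? Promise.reject(new Error('boom')) : Promise.resolve()),
  };
};
const items = Array.from({ length: 1200 }, (_, i) => ({ id: i === 700 ? 'bad' : `doc${i}`, data: { i } }));
writeBatchesInParallel(items, fakeNewBatch, (id) => id).then((failed) => {
  console.log(failed.map((f) => f.batch)); // [ 1 ]
});
```

The caller can then retry only the failed batches instead of the whole data set.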

Answer 2

D G*_*D G 6

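As mentioned in the comments on the question, I had the opposite experience when writing documents to Firestore from within a Cloud Function.

TL;DR: When writing 1,200 documents to Firestore, parallel individual writes were more than 5 times slower than parallel batched writes.

The only explanation I can think of is some kind of bottleneck or request rate limiting between Google Cloud Functions and Firestore. It's a bit of a mystery.

Here is the code for the two approaches I benchmarked: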

const functions = require('firebase-functions');
const admin = require('firebase-admin');


admin.initializeApp();
const db = admin.firestore();


// Parallel Batch Writes
exports.cloneAppBatch = functions.https.onCall((data, context) => {

    return new Promise((resolve, reject) => {

        let fromAppKey = data.appKey;
        let toAppKey = db.collection('/app').doc().id;


        // Clone/copy data from one app subcollection to another
        let startTimeMs = Date.now();
        let docs = 0;

        // Write the app document (and ensure cold start doesn't affect timings below)
        db.collection('/app').doc(toAppKey).set({ desc: 'New App' }).then(() => {

            // Log Benchmark
            functions.logger.info(`[BATCH] 'Write App Config Doc' took ${Date.now() - startTimeMs}ms`);


            // Get all documents in app subcollection
            startTimeMs = Date.now();

            return db.collection(`/app/${fromAppKey}/data`).get();

        }).then(appDataQS => {

            // Log Benchmark
            functions.logger.info(`[BATCH] 'Read App Data' took ${Date.now() - startTimeMs}ms`);


            // Batch up documents and write to new app subcollection
            startTimeMs = Date.now();

            let commits = [];
            let bDocCtr = 0;
            let batch = db.batch();

            appDataQS.forEach(docSnap => {

                let doc = docSnap.data();
                let docKey = docSnap.id;
                docs++;

                let docRef = db.collection(`/app/${toAppKey}/data`).doc(docKey);

                batch.set(docRef, doc);
                bDocCtr++

                if (bDocCtr >= 500) {
                    commits.push(batch.commit());
                    batch = db.batch();
                    bDocCtr = 0;
                }

            });

            if (bDocCtr > 0) commits.push(batch.commit());

            // Return this promise so that a failed commit reaches the .catch() below
            return Promise.all(commits).then(results => {
                // Log Benchmark
                functions.logger.info(`[BATCH] 'Write App Data - ${docs} docs / ${commits.length} batches' took ${Date.now() - startTimeMs}ms`);
                resolve(results);
            });
         
        }).catch(err => {
            reject(err);
        });

    });

});


// Parallel Individual Writes
exports.cloneAppNoBatch = functions.https.onCall((data, context) => {

    return new Promise((resolve, reject) => {

        let fromAppKey = data.appKey;
        let toAppKey = db.collection('/app').doc().id;


        // Clone/copy data from one app subcollection to another
        let startTimeMs = Date.now();
        let docs = 0;

        // Write the app document (and ensure cold start doesn't affect timings below)
        db.collection('/app').doc(toAppKey).set({ desc: 'New App' }).then(() => {

            // Log Benchmark
            functions.logger.info(`[INDIVIDUAL] 'Write App Config Doc' took ${Date.now() - startTimeMs}ms`);


            // Get all documents in app subcollection
            startTimeMs = Date.now();

            return db.collection(`/app/${fromAppKey}/data`).get();

        }).then(appDataQS => {

            // Log Benchmark
            functions.logger.info(`[INDIVIDUAL] 'Read App Data' took ${Date.now() - startTimeMs}ms`);


            // Gather up documents and write to new app subcollection
            startTimeMs = Date.now();

            let commits = [];

            appDataQS.forEach(docSnap => {

                let doc = docSnap.data();
                let docKey = docSnap.id;
                docs++;
                    
                // Parallel individual writes
                commits.push(db.collection(`/app/${toAppKey}/data`).doc(docKey).set(doc));
        
            });

            // Return this promise so that a failed write reaches the .catch() below
            return Promise.all(commits).then(results => {
                // Log Benchmark
                functions.logger.info(`[INDIVIDUAL] 'Write App Data - ${docs} docs' took ${Date.now() - startTimeMs}ms`);
                resolve(results);
            });
         
        }).catch(err => {
            reject(err);
        });

    });

});
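The benchmark above times each phase by resetting startTimeMs by hand. A sketch of the same measurement as a small async/await helper; timed is a hypothetical name, and in a Cloud Function the log parameter could be functions.logger.info (the default here is console.log). Because the helper returns the awaited result and rethrows errors, a failed step reaches the caller instead of leaving a hand-built Promise unsettled.

```javascript
const { performance } = require('perf_hooks');

// Run an async step, log how long it took, and pass its result (or error) through.
async function timed(label, step, log = console.log) {
  const start = performance.now();
  try {
    return await step();
  } finally {
    log(`${label} took ${Math.round(performance.now() - start)}ms`);
  }
}

// Usage with a placeholder step; in the function above this could wrap e.g.
// () => db.collection(`/app/${fromAppKey}/data`).get().
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
timed("[BATCH] 'Read App Data'", () => sleep(20).then(() => 'done')).then((result) => {
  console.log(result); // done
});
```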

Results (average of 3 runs each):

Batched writes:

Read 1,200 docs: 2.4s / Write 1,200 docs: 1.8s

Individual writes:

Read 1,200 docs: 2.4s / Write 1,200 docs: 10.5s
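Converting those write times into throughput makes the gap concrete (1,200 documents in each run):

```javascript
// Average write times (seconds) for 1,200 documents, from the results above.
const docs = 1200;
const batchedSeconds = 1.8;
const individualSeconds = 10.5;

console.log((docs / batchedSeconds).toFixed(1));              // 666.7 (docs/s, batched)
console.log((docs / individualSeconds).toFixed(1));           // 114.3 (docs/s, individual)
console.log((individualSeconds / batchedSeconds).toFixed(1)); // 5.8 (times slower)
```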

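Note: these results are much better than what I got a few days ago (maybe Google was having a bad day), but the relative performance of batched and individual writes stayed the same. I'd be interested to hear whether others have had a similar experience.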

Archived: 6 years, 6 months ago
Views: 301
Last activity: 6 years, 6 months ago