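I often find myself reading a large JSON file (usually an array of objects), manipulating each object, and writing the result back to a new file.
To do this in Node (at least the part that reads the data), I usually use the stream-json module, like this: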
const fs = require('fs');
const StreamArray = require('stream-json/streamers/StreamArray');

const pipeline = fs.createReadStream('sample.json')
    .pipe(StreamArray.withParser());

pipeline.on('data', data => {
    // do something with each object in file
});
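I recently discovered Deno and would like to be able to do this workflow with Deno.
It looks like the readJSON method in the standard library reads the entire contents of the file into memory, so I don't know whether it is a good fit for large files.
Is there a way to do this by streaming the data from the file using some of the lower-level methods built into Deno?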
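Now that Deno 1.0 is out, here is an answer in case anyone else is interested in doing something like this. I was able to piece together a small class that works for my use case. It is not nearly as robust as something like stream-json, but it handles large JSON arrays just fine.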
import { EventEmitter } from "https://deno.land/std/node/events.ts";

export class JSONStream extends EventEmitter {

    private openBraceCount = 0;
    private tempUint8Array: number[] = [];
    private decoder = new TextDecoder();

    constructor (private filepath: string) {
        super();
        this.stream();
    }

    async stream() {
        console.time("Run Time");
        let file = await Deno.open(this.filepath);
        // creates iterator from reader, default buffer size is 32kb
        for await (const buffer of Deno.iter(file)) {

            for (let i = 0, len = buffer.length; i < len; i++) {
                const uint8 = buffer[i];

                // skip whitespace between objects; whitespace inside an object
                // is kept so that spaces within string values are preserved
                // (JSON.parse ignores insignificant whitespace anyway)
                if (this.openBraceCount === 0 &&
                    (uint8 === 10 || uint8 === 13 || uint8 === 32)) continue;

                // open brace: start collecting a new top-level object
                // (note: braces inside string values are counted as well)
                if (uint8 === 123) {
                    if (this.openBraceCount === 0) this.tempUint8Array = [];
                    this.openBraceCount++;
                }

                this.tempUint8Array.push(uint8);

                // close brace: emit the object once its outermost brace closes
                if (uint8 === 125) {
                    this.openBraceCount--;
                    if (this.openBraceCount === 0) {
                        const uint8Ary = new Uint8Array(this.tempUint8Array);
                        const jsonString = this.decoder.decode(uint8Ary);
                        const object = JSON.parse(jsonString);
                        this.emit('object', object);
                    }
                }
            }
        }
        file.close();
        console.timeEnd("Run Time");
    }
}

Example usage:

const stream = new JSONStream('test.json');

stream.on('object', (object: any) => {
    // do something with each object
});

Test input: a ~4.8 MB JSON file containing ~20,000 small objects:

[
  {
    "id": 1,
    "title": "in voluptate sit officia non nesciunt quis",
    "urls": {
      "main": "https://www.placeholder.com/600/1b9d08",
      "thumbnail": "https://www.placeholder.com/150/1b9d08"
    }
  },
  {
    "id": 2,
    "title": "error quasi sunt cupiditate voluptate ea odit beatae",
    "urls": {
      "main": "https://www.placeholder.com/600/1b9d08",
      "thumbnail": "https://www.placeholder.com/150/1b9d08"
    }
  }
  ...
]

Processing it took 127 ms.
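
One limitation of plain brace counting is that a `{` or `}` inside a string value (for example a title containing a brace) throws the count off. The following is a minimal sketch of how the same idea could be made string-aware. It is not part of the class above: the name `JSONObjectSplitter` and its `push` method are hypothetical, and it assumes the input is UTF-8 and that the top level is an array whose elements are all objects. It takes raw chunks (such as the buffers a file reader yields) and returns the objects completed by each chunk, so it has no dependency on any particular runtime API.

```typescript
// Hypothetical helper: splits a top-level JSON array of objects into parsed
// objects, one chunk at a time. State is kept between calls, so an object
// may span several chunks.
class JSONObjectSplitter {
    private depth = 0;         // current nesting depth of { }
    private inString = false;  // true while inside a "..." string literal
    private escaped = false;   // true right after a backslash inside a string
    private bytes: number[] = [];
    private decoder = new TextDecoder();

    push(chunk: Uint8Array): unknown[] {
        const completed: unknown[] = [];
        for (let i = 0; i < chunk.length; i++) {
            const byte = chunk[i];

            // Outside any object, ignore everything ("[", ",", "]",
            // whitespace) until the next opening brace.
            if (this.depth === 0 && byte !== 0x7b) continue;

            this.bytes.push(byte);

            if (this.inString) {
                // Inside a string only escapes and the closing quote matter;
                // braces are plain characters here.
                if (this.escaped) this.escaped = false;
                else if (byte === 0x5c) this.escaped = true;    // backslash
                else if (byte === 0x22) this.inString = false;  // closing "
                continue;
            }

            if (byte === 0x22) {
                this.inString = true;                           // opening "
            } else if (byte === 0x7b) {
                this.depth++;                                   // {
            } else if (byte === 0x7d && --this.depth === 0) {   // outermost }
                const text = this.decoder.decode(new Uint8Array(this.bytes));
                completed.push(JSON.parse(text));
                this.bytes = [];
            }
        }
        return completed;
    }
}

// Usage: feed chunks as they arrive and handle each completed object.
const splitter = new JSONObjectSplitter();
const sample = new TextEncoder().encode('[{"id": 1, "title": "a {b} c"}, {"id": 2}]');
for (const object of splitter.push(sample)) {
    console.log(object);
}
```

Because multi-byte UTF-8 sequences never contain bytes below 0x80, scanning byte by byte for quotes, backslashes, and braces is safe even when a chunk boundary falls in the middle of a character; decoding only happens once a whole object has been collected.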