use*_*428 7 javascript parallel-processing node-worker-threads
I have a TypeScript module (used by a VSCode extension) which accepts a directory and parses the content of the files in it. For directories containing a large number of files this parsing takes a while, so I would like some advice on how to optimize it.
I don't want to copy/paste the entire class files, so I will use mock pseudocode containing the parts that I think are relevant.
class Parser {
  private dir: string;

  constructor(_dir: string) {
    this.dir = _dir;
  }

  async parse() {
    let tree: any = getFileTree(this.dir);
    try {
      let parsedObjects: MyDTO[] = await this.iterate(tree.children);
    } catch (err) {
      console.error(err);
    }
  }

  async iterate(children: any[]): Promise<MyDTO[]> {
    let objs: MyDTO[] = [];
    for (let i = 0; i < children.length; i++) {
      let child: any = children[i];
      if (child.type === Constants.FILE) {
        // child.path is an assumed field holding the file's path
        let dto: FileDTO = await this.heavyFileProcessingMethod(child.path); // this takes time
        objs.push(dto);
      } else {
        // child is a folder
        let dtos: MyDTO[] = await this.iterate(child.children);
        let dto: FolderDTO = new FolderDTO();
        dto.files = dtos.filter(item => item instanceof FileDTO);
        dto.folders = dtos.filter(item => item instanceof FolderDTO);
        objs.push(dto);
      }
    }
    return objs;
  }

  async heavyFileProcessingMethod(file: string): Promise<FileDTO> {
    let content: string = readFile(file); // util method to synchronously read file content using fs
    return new FileDTO(await this.parseFileContent(content));
  }

  async parseFileContent(content: string): Promise<any[]> {
    // parsing happens here and the file content is parsed into separate blocks
    let ast: any = await convertToAST(content); // uses an asynchronous method of an external dependency to convert content to AST
    let blocks = parseToBlocks(ast); // synchronous method called to convert AST to blocks
    return await this.processBlocks(blocks);
  }

  async processBlocks(blocks: any[]): Promise<any[]> {
    for (let i = 0; i < blocks.length; i++) {
      let block: Block = blocks[i];
      if (block.condition === true) {
        // this can take some time because if this condition is true, some external assets will be downloaded (via internet)
        // on to the caller's machine + some additional processing takes place
        await processBlock(block);
      }
    }
    return blocks;
  }
}
I'm still something of a beginner with TypeScript/NodeJS. I am looking for a multithreading/Java-esque solution here if possible. In the context of Java, this.heavyFileProcessingMethod would be an instance of a Callable object, and this object would be pushed into a List<Callable> which would then be executed in parallel by an ExecutorService returning List<Future<Object>>.
Basically I want all files to be processed in parallel, but the function must wait for all the files to be processed before returning from the method (so the entire iterate method will only take as long as the time taken to parse the largest file).
I have been reading about running tasks in worker threads in NodeJS. Can something like this be used in TypeScript as well? If so, can it be used in this situation? If my Parser class needs to be refactored to accommodate this change (or any other suggested change), that's no issue.
EDIT: Using Promise.all
// not async: this returns an array of promises, not a promise of an array
iterate(children: any[]): Promise<MyDTO>[] {
  let promises: Promise<MyDTO>[] = [];
  for (let i = 0; i < children.length; i++) {
    let child: any = children[i];
    if (child.type === Constants.FILE) {
      // child.path is an assumed field holding the file's path
      let promise: Promise<FileDTO> = this.heavyFileProcessingMethod(child.path); // this takes time
      promises.push(promise);
    } else {
      // child is a folder
      let dtos: Promise<MyDTO>[] = this.iterate(child.children);
      let promise: Promise<FolderDTO> = this.getFolderPromise(dtos);
      promises.push(promise);
    }
  }
  return promises;
}

async getFolderPromise(promises: Promise<MyDTO>[]): Promise<FolderDTO> {
  return Promise.all(promises).then(dtos => {
    let dto: FolderDTO = new FolderDTO();
    dto.files = dtos.filter(item => item instanceof FileDTO);
    dto.folders = dtos.filter(item => item instanceof FolderDTO);
    return dto;
  });
}
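TypeScript is really just JavaScript with static type checking, and those static types are erased when the TS is transpiled to JS. Since your question is about algorithms and runtime language features, TypeScript has no bearing on it; your question is a JavaScript one. So right off the bat, that tells us the answer to
"Been reading on running tasks in worker threads in NodeJS, can something like this be used in TypeScript as well?"
is YES.
As for the second part of your question,
"If so, can it be used in this situation?"
the answer is YES, but...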
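"Can" doesn't necessarily mean you should. It depends on whether your processes are IO bound or CPU bound. If they are IO bound, you are most likely better off relying on JavaScript's long-standing asynchronous programming model (callbacks, Promises). But if they are CPU bound, then using Node's relatively new support for thread-based parallelism is more likely to result in throughput gains. See Node.js Multithreading!, though I think this one is better: Understanding Worker Threads in Node.js.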
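For the IO-bound case, here is a minimal sketch of what "relying on Promises" could look like. It is not your Parser: fakeIo and processAll are hypothetical names, and the timer only stands in for a real read or download. The point is that every task is started before any of them is awaited, so the waits overlap.

```typescript
// Hypothetical stand-in for an I/O-bound task such as reading or downloading a file.
function fakeIo(name: string, ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

// Start every task first, then wait for all of them together.
async function processAll(tasks: Array<[string, number]>): Promise<string[]> {
  const pending = tasks.map(([name, ms]) => fakeIo(name, ms));
  // Promise.all resolves once every task has finished; results keep the input order.
  return Promise.all(pending);
}
```

Called with three tasks of about 100 ms each, processAll should take roughly as long as the slowest task rather than the sum of all three. Note that this only overlaps waiting; any synchronous work inside a task still runs on the single main thread.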
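While worker threads are lighter weight than the previous Node option for parallelism (spawning child processes), they are still relatively heavyweight compared to threads in Java. Each worker runs in its own Node VM, and regular variables are not shared (you have to use special data types and/or message passing to share data). It had to be done this way because JavaScript is designed around a single-threaded programming model. It is extremely efficient within that model, but that design makes support for multithreading harder. Here is a good answer with useful info for you: /sf/answers/4425755141/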
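To make the message-passing point concrete, here is a small sketch using Node's worker_threads module. The helper name sumInWorker and the summing loop are made up for illustration (the loop just stands in for CPU-heavy work), and the worker source is kept inline with eval: true only so the example fits in one file; a real project would more likely point the Worker at a separate compiled file.

```typescript
import { Worker } from "worker_threads";

// Plain JavaScript source for the worker. It gets its input through workerData
// and sends its result back with postMessage; no variables are shared.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i; // stand-in for CPU-heavy work
  parentPort.postMessage(sum);
`;

// Hypothetical helper: run one job in its own worker thread and
// expose the result as a Promise.
function sumInWorker(n: number): Promise<number> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Because each call returns a Promise, several workers can be awaited together, for example with Promise.all([sumInWorker(100), sumInWorker(10)]). Each call here spawns a fresh worker, which is exactly the startup overhead described above.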
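My guess is that your parsing is more IO bound, and the overhead of spawning worker threads will outweigh any gains. But give it a go; it will be a learning experience. :)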