How to cancel a wasm process from within a web worker

Dav*_*542 7 javascript c++ web-worker emscripten webassembly

I have a wasm process (compiled from C++) that processes data inside a web application. Let's say the necessary code looks like this:

std::vector<JSONObject> data;
for (size_t i = 0; i < data.size(); i++)
{
    process_data(data[i]);

    if (i % 1000 == 0) {
        bool is_cancelled = check_if_cancelled();
        if (is_cancelled) {
            break;
        }
    }

}

This code basically "runs/processes a query", similar to a SQL query interface.

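However, a query may take several minutes to run/process, and the user may cancel it at any time. The cancellation would happen in the regular JavaScript/web application, outside the Web Worker that runs the wasm.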

My question then is: what would be an example of how we could know that the user has clicked the 'cancel' button, and communicate that to the wasm process so that it knows it has been cancelled and can exit? Using worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill it (it needs to stay alive with its stored data, so that another query can be run).

What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?

Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.


Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:

  1. Use two workers, with one worker storing the data and another worker processing the data. In this way the processing-worker can be terminated, and the data will always remain. Feasible? Not really, as it would take way too much time to copy over ~ 500MB of data to the webworker whenever it starts. This could have been done (previously) using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...

  2. Use a single worker using Emterpreter and using emscripten_sleep_with_yield. Feasible? No, destroys performance when using Emterpreter (mentioned in the docs above), and slows down all queries by about 4-6x.

  3. Always run a second worker and in the UI just display the most recent. Feasible? No, would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).

  4. Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.

  5. Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but it would also require a tremendous amount of rewriting so that every parsing function supports this (the actual data parsing is handled in several functions: filter, search, calculate, group by, sort, etc.).

  6. Use IndexedDB and store the state in javascript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read database there and fill the pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.

  7. [Anything else?]

In the bounty then I was wondering three things:

  1. If the above six analyses seem generally valid?
  2. Are there other (perhaps better) approaches I'm missing?
  3. Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.

Tom*_*mer 5

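For Chrome (only) you can use shared memory (a shared buffer as the memory) and raise a flag in that memory when you want to halt. I'm not a big fan of this solution (it is complex and is supported only in Chrome). It also depends on how your query works, and on whether there are places where the lengthy query can check the flag.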
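A minimal sketch of the shared-flag idea, in plain JavaScript so it can be tried without a wasm build. The helper names (createCancelFlag, requestCancel, resetCancel, runQuery) are made up for illustration, and runQuery only stands in for the C++ loop from the question. In a real setup the check would live inside the C++ code reading the same shared memory, and how that memory gets shared with the module depends on the build, which is not shown here. Browser support and isolation requirements for SharedArrayBuffer should also be checked.

```javascript
// Sketch only: createCancelFlag, requestCancel, resetCancel and runQuery are
// hypothetical names, not part of any library.

// Main thread: allocate one shared 32-bit slot. The underlying buffer
// (flag.buffer) would be sent to the worker with postMessage, after which
// both sides see the same memory.
function createCancelFlag() {
  const sab = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  return new Int32Array(sab);
}

// Main thread: called from the cancel button handler.
function requestCancel(flag) {
  Atomics.store(flag, 0, 1);
}

// Either side: clear the flag before starting the next query.
function resetCancel(flag) {
  Atomics.store(flag, 0, 0);
}

// Worker side: stand-in for the long-running loop. It checks the flag every
// `checkEvery` rows, mirroring the `i % 1000 == 0` check in the question.
function runQuery(rows, flag, processRow, checkEvery = 1000) {
  let processed = 0;
  for (let i = 0; i < rows.length; i++) {
    processRow(rows[i], i);
    processed++;
    if (i % checkEvery === 0 && Atomics.load(flag, 0) === 1) {
      return { cancelled: true, processed };
    }
  }
  return { cancelled: false, processed };
}
```

The worker stays alive and keeps its data; only the loop returns early, and the flag is reset before the next query starts.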

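Instead, you should probably call the C++ function multiple times (e.g. once per query) and check after each call whether you should halt (just send a message to the worker to halt).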

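What I mean by multiple times is to run the query in stages (multiple function calls for a single query). It may not be applicable in your case.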
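A rough sketch of what running in stages could look like on the worker side. The names stagedQuery and driveStages are hypothetical, and processChunk stands in for one call into the wasm module over a slice of the data. The point is that control returns to the worker's event loop between stages, so a "halt" message sent with postMessage can actually be delivered and flip whatever isCancelled reads.

```javascript
// Sketch only: stagedQuery and driveStages are hypothetical names.

// Splits one query into stages. Each next() call processes one chunk and
// then hands control back to the caller.
function* stagedQuery(totalRows, chunkSize, processChunk) {
  for (let start = 0; start < totalRows; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalRows);
    processChunk(start, end); // e.g. one call into the wasm module
    yield end; // number of rows processed so far
  }
}

// Worker-side driver: runs one stage per timer tick, so that a pending
// message (such as "halt") can be handled between stages.
function driveStages(stages, isCancelled, onDone) {
  const step = stages.next();
  if (step.done) return onDone({ cancelled: false });
  if (isCancelled()) return onDone({ cancelled: true, processed: step.value });
  setTimeout(() => driveStages(stages, isCancelled, onDone), 0);
}
```

The chunk size is a trade-off: smaller chunks make cancellation more responsive, larger chunks mean fewer round trips between JavaScript and wasm.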

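Regardless, AFAIK there is no way to send a signal to a running WebAssembly execution (like kill on Linux). Therefore, you'll have to wait for the current operation to finish before the cancellation can complete.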

I'm attaching a code snippet that may explain this idea.

worker.js:

... init webassembly

onmessage = function(q) {
	// query received from main thread.
	const result = ... call webassembly(q);
	postMessage(result);
}

main.js:

const worker = new Worker("worker.js");
// Both flags are reassigned below, so they must be declared with let.
let cancel = false;
let processing = false;

worker.onmessage = function(r) {
	// Fires when the worker has finished processing the query.
	// r is a MessageEvent; r.data holds the results of the processing.
	processing = false;

	if (cancel === true) {
		// processing is done, but result is not required.
		// instead of showing the results, update that the query was canceled.
		cancel = false;
		... update UI "canceled".
		return;
	}
	
	... update UI "results r".
};

function onCancel() {
	// Occurs when user clicks on the cancel button.
	if (cancel) {
		// sanity test - prevent this in UI.
		throw "already cancelling";
	}
	
	cancel = true;
	
	... update UI "canceling". 
}

function onQuery(q) {
	if (processing === true) {
		// sanity test - prevent this in UI.
		throw "already processing";
	}
	
	processing = true;
	// Send the query to the worker.
	// When the worker receives the message it will process the query via webassembly.
	worker.postMessage(q);
}

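One idea from a user-experience perspective: you could create ~two workers. This takes twice the memory, but lets you "cancel" "immediately" once. (It just means that behind the scenes the second worker runs the next query, and once the first worker finishes its cancelled query, cancellation becomes immediate again.)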
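A small sketch of how the two-worker idea might be wired up on the main thread. TwoWorkerScheduler is a hypothetical helper, not an existing API. It assumes each worker holds its own copy of the data and posts exactly one message back per query, as in the worker.js above.

```javascript
// Sketch only: TwoWorkerScheduler is a hypothetical helper. Each worker is
// assumed to post exactly one message back per query.
class TwoWorkerScheduler {
  constructor(workers, onResult) {
    this.slots = workers.map((worker) => ({ worker, busy: false, discard: false }));
    this.onResult = onResult;
    this.current = null; // slot whose result the UI is waiting for
    for (const slot of this.slots) {
      slot.worker.onmessage = (event) => {
        slot.busy = false;
        if (slot.discard) {
          // The query was cancelled: drop its result.
          slot.discard = false;
          return;
        }
        if (this.current === slot) this.current = null;
        this.onResult(event.data);
      };
    }
  }

  // Returns false when every worker is still busy, for example while
  // both are finishing cancelled queries.
  query(q) {
    const slot = this.slots.find((s) => !s.busy);
    if (!slot) return false;
    slot.busy = true;
    this.current = slot;
    slot.worker.postMessage(q);
    return true;
  }

  // Marks the running query's result as unwanted. That worker keeps running
  // until it finishes, but the other worker can take the next query now.
  cancel() {
    if (this.current) {
      this.current.discard = true;
      this.current = null;
    }
  }
}
```

In a page this might be created as new TwoWorkerScheduler([new Worker("worker.js"), new Worker("worker.js")], showResults), where showResults is whatever updates the UI.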