Do asynchronous regular expressions exist in C#, and would they help in my situation?

Jos*_*nox 4 c# regex async-await

My application searches many files in parallel using regular expressions: await Task.WhenAll(filePaths.Select(FindThings));

Inside FindThings, most of the time is spent performing the regex search, since these files can be hundreds of MB in size.

static async Task FindThings(string path) {
    string fileContent = null;
    try
    {
        using (var reader = File.OpenText(path))
            fileContent = await reader.ReadToEndAsync();
    }
    catch (Exception e)
    {
        WriteLine(lineIndex, "{0}: Error {1}", filename, e);
        return;
    }

    var exitMatches = _exitExp.Matches(fileContent);

    foreach (Match exit in exitMatches)
    {
        if (_taskDelay > 0)
            await Task.Delay(_taskDelay);

    // [...]
  • Is there an async version of Regex, or any way to make this cooperate properly with Tasks?

Why this matters

I got a lot of responses suggesting that I had not explained why this matters. Take this sample program (which uses the Nito.AsyncEx library):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Nito.AsyncEx;

namespace Scrap
{
    class Program
    {
        static void Main(string[] args)
        {
            AsyncContext.Run(() => MainAsync(args));
        }

        static async void MainAsync(string[] args)
        {
            var tasks = new List<Task>();

            var asyncStart = DateTime.Now;
            tasks.Add(Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
                ShowIndexAsync(i, asyncStart))));

            var start = DateTime.Now;
            tasks.Add(Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
                ShowIndex(i, start))));

            await Task.WhenAll(tasks);

            Console.ReadLine();
        }


        static async Task ShowIndexAsync(int index, DateTime start)
        {
            Console.WriteLine("ShowIndexAsync: {0} ({1})",
                index, DateTime.Now - start);
            await Task.Delay(index * 100);
            Console.WriteLine("!ShowIndexAsync: {0} ({1})",
                index, DateTime.Now - start);
        }

        static Task ShowIndex(int index, DateTime start)
        {
            return Task.Factory.StartNew(() => {
                Console.WriteLine("ShowIndex: {0} ({1})",
                    index, DateTime.Now - start);
                Task.Delay(index * 100).Wait();
                Console.WriteLine("!ShowIndex: {0} ({1})",
                    index, DateTime.Now - start);
            });
        }
    }
}

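So this calls ShowIndexAsync 10 times, then ShowIndex 10 times, and waits for all of them to finish. ShowIndexAsync is "async to the core" while ShowIndex is not, but both run on Tasks. The blocking operation here is Task.Delay; the difference is that one awaits that task, while the other calls .Wait() on it inside a task.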

You would expect the ones queued first (ShowIndexAsync) to finish first, but you would be wrong.

ShowIndexAsync: 0 (00:00:00.0060000)
!ShowIndexAsync: 0 (00:00:00.0070000)
ShowIndexAsync: 1 (00:00:00.0080000)
ShowIndexAsync: 2 (00:00:00.0110000)
ShowIndexAsync: 3 (00:00:00.0110000)
ShowIndexAsync: 4 (00:00:00.0120000)
ShowIndexAsync: 5 (00:00:00.0130000)
ShowIndexAsync: 6 (00:00:00.0130000)
ShowIndexAsync: 7 (00:00:00.0140000)
ShowIndexAsync: 8 (00:00:00.0150000)
ShowIndexAsync: 9 (00:00:00.0150000)
ShowIndex: 0 (00:00:00.0020000)
!ShowIndex: 0 (00:00:00.0020000)
ShowIndex: 1 (00:00:00.0030000)
!ShowIndex: 1 (00:00:00.1100000)
ShowIndex: 2 (00:00:00.1100000)
!ShowIndex: 2 (00:00:00.3200000)
ShowIndex: 3 (00:00:00.3200000)
!ShowIndex: 3 (00:00:00.6220000)
ShowIndex: 4 (00:00:00.6220000)
!ShowIndex: 4 (00:00:01.0280000)
ShowIndex: 5 (00:00:01.0280000)
!ShowIndex: 5 (00:00:01.5420000)
ShowIndex: 6 (00:00:01.5420000)
!ShowIndex: 6 (00:00:02.1500000)
ShowIndex: 7 (00:00:02.1510000)
!ShowIndex: 7 (00:00:02.8650000)
ShowIndex: 8 (00:00:02.8650000)
!ShowIndex: 8 (00:00:03.6660000)
ShowIndex: 9 (00:00:03.6660000)
!ShowIndex: 9 (00:00:04.5780000)
!ShowIndexAsync: 1 (00:00:04.5950000)
!ShowIndexAsync: 2 (00:00:04.5960000)
!ShowIndexAsync: 3 (00:00:04.5970000)
!ShowIndexAsync: 4 (00:00:04.5970000)
!ShowIndexAsync: 5 (00:00:04.5980000)
!ShowIndexAsync: 6 (00:00:04.5990000)
!ShowIndexAsync: 7 (00:00:04.5990000)
!ShowIndexAsync: 8 (00:00:04.6000000)
!ShowIndexAsync: 9 (00:00:04.6010000)

Why does this happen?

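The task scheduler will only use so many real threads. "await" compiles into a cooperative multitasking state machine. If you have a blocking operation that is not awaited (Task.Delay(...).Wait() in this example, the regex matching in my question), it will not cooperate and let the task scheduler manage the tasks properly.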

If we change the sample program to:

    static async void MainAsync(string[] args)
    {
        var asyncStart = DateTime.Now;
        await Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
            ShowIndexAsync(i, asyncStart)));

        var start = DateTime.Now;
        await Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
            ShowIndex(i, start)));

        Console.ReadLine();
    }

Then our output becomes:

ShowIndexAsync: 0 (00:00:00.0050000)
!ShowIndexAsync: 0 (00:00:00.0050000)
ShowIndexAsync: 1 (00:00:00.0060000)
ShowIndexAsync: 2 (00:00:00.0080000)
ShowIndexAsync: 3 (00:00:00.0090000)
ShowIndexAsync: 4 (00:00:00.0090000)
ShowIndexAsync: 5 (00:00:00.0100000)
ShowIndexAsync: 6 (00:00:00.0110000)
ShowIndexAsync: 7 (00:00:00.0110000)
ShowIndexAsync: 8 (00:00:00.0120000)
ShowIndexAsync: 9 (00:00:00.0120000)
!ShowIndexAsync: 1 (00:00:00.1150000)
!ShowIndexAsync: 2 (00:00:00.2180000)
!ShowIndexAsync: 3 (00:00:00.3160000)
!ShowIndexAsync: 4 (00:00:00.4140000)
!ShowIndexAsync: 5 (00:00:00.5190000)
!ShowIndexAsync: 6 (00:00:00.6130000)
!ShowIndexAsync: 7 (00:00:00.7190000)
!ShowIndexAsync: 8 (00:00:00.8170000)
!ShowIndexAsync: 9 (00:00:00.9170000)
ShowIndex: 0 (00:00:00.0030000)
!ShowIndex: 0 (00:00:00.0040000)
ShowIndex: 3 (00:00:00.0060000)
ShowIndex: 4 (00:00:00.0090000)
ShowIndex: 2 (00:00:00.0100000)
ShowIndex: 1 (00:00:00.0100000)
ShowIndex: 5 (00:00:00.0130000)
ShowIndex: 6 (00:00:00.0130000)
ShowIndex: 7 (00:00:00.0150000)
ShowIndex: 8 (00:00:00.0180000)
!ShowIndex: 7 (00:00:00.7660000)
!ShowIndex: 6 (00:00:00.7660000)
ShowIndex: 9 (00:00:00.7660000)
!ShowIndex: 2 (00:00:00.7660000)
!ShowIndex: 5 (00:00:00.7660000)
!ShowIndex: 4 (00:00:00.7660000)
!ShowIndex: 3 (00:00:00.7660000)
!ShowIndex: 1 (00:00:00.7660000)
!ShowIndex: 8 (00:00:00.8210000)
!ShowIndex: 9 (00:00:01.6700000)

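Notice how the async calls have a nice, even distribution of end times, but the non-async code does not. The task scheduler is being blocked because it will not create additional real threads, since it expects cooperation.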

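I do not expect it to take less CPU time or anything like that; my goal is to make FindThings multitask in a cooperative manner, i.e. to make it "async to the core".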

Ste*_*ary 9

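Regex searches are CPU-bound operations, so they are going to take time. You can use Task.Run to push the work to a background thread and thus keep your UI responsive, but that will not make them run any faster.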
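As a minimal sketch of the Task.Run approach: the pattern, the ExitExp field, and the CountMatchesAsync helper below are hypothetical stand-ins for the question's _exitExp and FindThings, not the asker's actual code. The match still costs the same CPU time; it just runs on a thread-pool thread so the caller can await it.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static class RegexOffload
{
    // Hypothetical pattern standing in for the question's _exitExp.
    static readonly Regex ExitExp = new Regex(@"exit\s+\d+", RegexOptions.Compiled);

    // Runs the CPU-bound search on a thread-pool thread.
    // Matches() is lazy, so reading Count inside the delegate forces the
    // whole search to happen on the background thread, not on the caller.
    public static Task<int> CountMatchesAsync(string content)
    {
        return Task.Run(() => ExitExp.Matches(content).Count);
    }

    static async Task Main()
    {
        int count = await CountMatchesAsync("exit 1\nok\nexit 22\n");
        Console.WriteLine(count); // prints 2
    }
}
```

Note that this does not make Regex itself asynchronous; it only moves the blocking work off the calling thread.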

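Since your searches are already running in parallel, there is probably not much more you can do. You could try asynchronous file reads to reduce the number of blocked threads in your thread pool, but it probably will not have a huge effect.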

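Your current code calls ReadToEndAsync, but the file needs to be opened for asynchronous access (i.e., use the FileStream constructor and explicitly ask for an asynchronous file handle by passing true for the isAsync parameter or FileOptions.Asynchronous for the options parameter).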
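A rough sketch of what that could look like, using the FileOptions.Asynchronous route (on the path-based FileStream overloads the Boolean alternative is named useAsync). The ReadAllTextAsync helper name, the 4096-byte buffer, and the extra SequentialScan hint are assumptions made for illustration:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class AsyncFileRead
{
    // Opens the file with an asynchronous handle so that ReadToEndAsync
    // performs real asynchronous I/O on the underlying stream.
    public static async Task<string> ReadAllTextAsync(string path)
    {
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, // assumed buffer size; tune for large files
            options: FileOptions.Asynchronous | FileOptions.SequentialScan))
        using (var reader = new StreamReader(stream))
        {
            return await reader.ReadToEndAsync();
        }
    }

    static async Task Main()
    {
        // Round-trip a small temporary file to show the helper in use.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "hello");
        try
        {
            Console.WriteLine(await ReadAllTextAsync(path));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

In FindThings, a helper like this would replace the File.OpenText call; the regex search that follows is still CPU-bound either way.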