How to process items in parallel and then merge the results?

Mat*_*olf 5 c# concurrency merge task-parallel-library tpl-dataflow

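I am facing the following problem:

I have a stream of Foo objects that I send to several concurrent in-process tasks/threads, which in turn process the objects and output FooResult objects. Each FooResult contains, among other members, the same Foo that was used to create it. However, not every Foo necessarily produces a FooResult.

My problem is that I want to pass on from this whole process a wrapper object that contains the original Foo and all FooResult objects, if any, that were created from that Foo in the concurrent tasks.

Note: I currently use TPL Dataflow, where each concurrent process runs inside an ActionBlock<Foo> that is linked from a BroadcastBlock<Foo>. Each block uses SendAsync() on a target dataflow block to send any FooResult it creates. Obviously, the concurrent dataflow blocks produce FooResult objects at unpredictable times, which is what I am struggling with. I can't figure out how many FooResult objects were created across all ActionBlock<Foo> instances in total, so that I can bundle them with the originating Foo and pass that on as a wrapper object.

In pseudocode, it currently looks like this: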

BroadcastBlock<Foo> broadcastBlock;
ActionBlock<Foo> aBlock1;
ActionBlock<Foo> aBlock2;
ActionBlock<FooResult> targetBlock;
broadcastBlock.LinkTo(aBlock1); broadcastBlock.LinkTo(aBlock2);

aBlock1 = new ActionBlock<Foo>(foo =>
{
    // Do something here. Sometimes this creates a FooResult; if so:
    targetBlock.SendAsync(fooResult);
});

// Similar for aBlock2

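The problem with the current code, however, is that targetBlock may receive nothing at all if a Foo did not produce a single FooResult in any of the action blocks. It may also receive two FooResult objects for the same Foo, because each action block produced one.

What I want is for targetBlock to receive one wrapper object per Foo that contains the Foo and, if any FooResult objects were created, a collection of those FooResult objects.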
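To make the desired output concrete, here is a minimal sketch of what such a wrapper could look like. The name FooWrapper and its members are assumptions made for illustration, and Foo and FooResult are stand-in types, not the real ones.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types; the real Foo and FooResult have more members.
public sealed class Foo { }

public sealed class FooResult
{
    public FooResult(Foo source) { Source = source; }
    public Foo Source { get; }   // the Foo this result was created from
}

// Hypothetical wrapper: exactly one per Foo, carrying zero or more results.
public sealed class FooWrapper
{
    public FooWrapper(Foo foo, IReadOnlyList<FooResult> results)
    {
        Foo = foo ?? throw new ArgumentNullException(nameof(foo));
        Results = results ?? Array.Empty<FooResult>();
    }

    public Foo Foo { get; }
    public IReadOnlyList<FooResult> Results { get; }  // empty when no block produced a result
}
```

With a shape like this, targetBlock would be an ActionBlock<FooWrapper> and would see every Foo exactly once, whether or not any results exist for it.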

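Any ideas on how I could make the solution work as described? It does not have to use TPL Dataflow, but it would be neat if it did.

Update: The following is what I got by implementing a JoinBlock as suggested by svick. I won't use it (unless its performance can be tuned), because it runs extremely slowly: I can process roughly 89,000 items per second (and that is with plain int value types).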

using System;
using System.Diagnostics;
using System.Threading.Tasks.Dataflow;

public class Test
{
    private BroadcastBlock<int> broadCastBlock;
    private TransformBlock<int, int> transformBlock1;
    private TransformBlock<int, int> transformBlock2;
    private JoinBlock<int, int, int> joinBlock;
    private ActionBlock<Tuple<int, int, int>> processorBlock;

    public Test()
    {
        broadCastBlock = new BroadcastBlock<int>(i =>
            {
                return i;
            });

        transformBlock1 = new TransformBlock<int, int>(i =>
            {
                return i;
            }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        transformBlock2 = new TransformBlock<int, int>(i =>
            {
                return i;
            }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        joinBlock = new JoinBlock<int, int, int>();

        processorBlock = new ActionBlock<Tuple<int, int, int>>(tuple =>
            {
                //Console.WriteLine("original value: " + tuple.Item1 + "tfb1: " + tuple.Item2 + "tfb2: " + tuple.Item3);
            }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        //Linking
        broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
        broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });

        broadCastBlock.LinkTo(joinBlock.Target1, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock1.LinkTo(joinBlock.Target2, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock2.LinkTo(joinBlock.Target3, new DataflowLinkOptions { PropagateCompletion = true });

        joinBlock.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
    }

    public void Start()
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();

        const int numElements = 1000000;

        for (int i = 1; i <= numElements; i++)
        {
            broadCastBlock.Post(i);
        }

        ////mark completion
        broadCastBlock.Complete();

        processorBlock.Completion.Wait();

        watch.Stop();

        // Multiply before dividing so integer division does not truncate the rate to a multiple of 1000.
        Console.WriteLine("Time it took: " + watch.ElapsedMilliseconds + " - items processed per second: " + numElements * 1000L / watch.ElapsedMilliseconds);
        Console.ReadLine();
    }
}

Updated code to reflect the suggestions:

public Test()
    {
        broadCastBlock = new BroadcastBlock<int>(i =>
            {
                return i;
            });

        transformBlock1 = new TransformBlock<int, int>(i =>
            {
                return i;
            });

        transformBlock2 = new TransformBlock<int, int>(i =>
            {
                return i;
            });

        joinBlock = new JoinBlock<int, int>();

        processorBlock = new ActionBlock<Tuple<int, int>>(tuple =>
            {
                //Console.WriteLine("tfb1: " + tuple.Item1 + "tfb2: " + tuple.Item2);
            });

        //Linking
        broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
        broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock1.LinkTo(joinBlock.Target1);
        transformBlock2.LinkTo(joinBlock.Target2);
        joinBlock.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
    }

    public void Start()
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();

        const int numElements = 1000000;

        for (int i = 1; i <= numElements; i++)
        {
            broadCastBlock.Post(i);
        }

        ////mark completion
        broadCastBlock.Complete();
        Task.WhenAll(transformBlock1.Completion, transformBlock2.Completion).ContinueWith(_ => joinBlock.Complete());


        processorBlock.Completion.Wait();

        watch.Stop();

        // Multiply before dividing so integer division does not truncate the rate to a multiple of 1000.
        Console.WriteLine("Time it took: " + watch.ElapsedMilliseconds + " - items processed per second: " + numElements * 1000L / watch.ElapsedMilliseconds);
        Console.ReadLine();
    }
}

svi*_*ick 3

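I can see two ways to solve this:

  1. Use JoinBlock. Your broadcast block and both worker blocks would each send to one target of the join block. If a worker block has no result for an item, it would send null (or some other special value) instead. Your worker blocks would need to change to TransformBlock<Foo, FooResult>, because using ActionBlock the way you do doesn't guarantee ordering (at least not when you set MaxDegreeOfParallelism), whereas TransformBlock does.

    The result of the JoinBlock would be a Tuple<Foo, FooResult, FooResult>, where either or both of the FooResults could be null.

    That said, I'm not sure I like this solution: it relies heavily on the items being ordered correctly, which seems fragile to me.

  2. Use some other object for synchronization. That object would then be responsible for sending the result forward once all blocks are done with a given item. This is similar to the NotificationWrapper that Mario suggested in his answer.

    In this case, you could use TaskCompletionSource and Task.WhenAll() to take care of the synchronization.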
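The second approach could be sketched roughly as follows. This is only one possible reading of it, not a tested design: FooWork, FooBundle, BundlingPipeline and the "even id" / "multiple of 3" worker rules are hypothetical names and placeholders, and Foo and FooResult are minimal stand-ins. Each item carries one TaskCompletionSource per worker, every worker always completes its own slot (with null when it has no result), and the target awaits Task.WhenAll() over the slots before building the bundle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public sealed class Foo
{
    public Foo(int id) { Id = id; }
    public int Id { get; }
}

public sealed class FooResult
{
    public FooResult(Foo source, string producer) { Source = source; Producer = producer; }
    public Foo Source { get; }
    public string Producer { get; }
}

// Hypothetical envelope: the original Foo plus one completion source per worker.
public sealed class FooWork
{
    public FooWork(Foo foo, int workerCount)
    {
        Foo = foo;
        Slots = Enumerable.Range(0, workerCount)
            .Select(_ => new TaskCompletionSource<FooResult>(
                TaskCreationOptions.RunContinuationsAsynchronously))
            .ToArray();
    }
    public Foo Foo { get; }
    public TaskCompletionSource<FooResult>[] Slots { get; }
}

// Hypothetical wrapper passed downstream: one per Foo, with zero or more results.
public sealed class FooBundle
{
    public FooBundle(Foo foo, IReadOnlyList<FooResult> results) { Foo = foo; Results = results; }
    public Foo Foo { get; }
    public IReadOnlyList<FooResult> Results { get; }
}

public static class BundlingPipeline
{
    public static async Task<List<FooBundle>> RunAsync(IEnumerable<Foo> foos)
    {
        var bundles = new List<FooBundle>();
        var broadcast = new BroadcastBlock<FooWork>(work => work); // same reference goes to every target

        // Placeholder logic: each worker must always complete its slot; null means "no result".
        // If a worker threw before doing so, the target would wait forever for that item.
        var worker1 = new ActionBlock<FooWork>(work =>
            work.Slots[0].SetResult(work.Foo.Id % 2 == 0 ? new FooResult(work.Foo, "worker1") : null));
        var worker2 = new ActionBlock<FooWork>(work =>
            work.Slots[1].SetResult(work.Foo.Id % 3 == 0 ? new FooResult(work.Foo, "worker2") : null));

        // The target handles one item at a time (default parallelism), so the list needs no lock.
        var target = new ActionBlock<FooWork>(async work =>
        {
            FooResult[] results = await Task.WhenAll(work.Slots.Select(slot => slot.Task));
            bundles.Add(new FooBundle(work.Foo, results.Where(r => r != null).ToList()));
        });

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        broadcast.LinkTo(worker1, link);
        broadcast.LinkTo(worker2, link);
        broadcast.LinkTo(target, link);

        foreach (var foo in foos)
            broadcast.Post(new FooWork(foo, workerCount: 2));

        broadcast.Complete();
        await Task.WhenAll(worker1.Completion, worker2.Completion, target.Completion);
        return bundles;
    }
}
```

Unlike the JoinBlock variant, this does not depend on the workers emitting results in input order, because each result is tied to its item through the slot rather than through its position in a stream. The cost is one extra allocation per item and per worker.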