Asked by Mat*_*olf. Tags: c#, concurrency, merge, task-parallel-library, tpl-dataflow
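I am facing the following problem:
I have a data stream of Foo objects, and I stream those objects to several concurrent in-process tasks/threads, which in turn process the objects and output FooResult objects. Each FooResult contains, among other members, the same Foo that was used to create it. However, not every Foo necessarily produces a FooResult.
My problem is that I want this entire process to pass on a wrapper object that contains the original Foo and all FooResult objects (if any) that the concurrent tasks created from that Foo.
Note: I currently use TPL Dataflow, and each concurrent process runs in an ActionBlock<Foo> that receives its items from a BroadcastBlock<Foo>. Each ActionBlock uses SendAsync() to send any FooResult it creates to a target dataflow block. Obviously the concurrent dataflow blocks produce FooResults at unpredictable times, which is exactly the problem I am facing. I can't figure out how many FooResults were created in total across all the ActionBlock<Foo>s, so I can't bundle them with the original Foo and pass them on as a wrapper object.
In pseudocode, it currently looks like this: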
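// Assumes: using System.Threading.Tasks.Dataflow; Foo and FooResult are my own types.
BroadcastBlock<Foo> broadCastBlock = new BroadcastBlock<Foo>(foo => foo);
ActionBlock<FooResult> targetBlock = new ActionBlock<FooResult>(fooResult => { /* consume the result */ });

// The blocks have to exist before they are linked.
ActionBlock<Foo> aBlock1 = new ActionBlock<Foo>(async foo =>
{
    // Do something here. Sometimes this creates a FooResult; if it does, send it on.
    FooResult fooResult = Process1(foo); // Process1 is a placeholder that returns null when there is no result
    if (fooResult != null)
        await targetBlock.SendAsync(fooResult);
});
// aBlock2 is created the same way.

broadCastBlock.LinkTo(aBlock1);
broadCastBlock.LinkTo(aBlock2);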
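However, the problem with the current code is that targetBlock may receive nothing for a given Foo if none of the action blocks produced a FooResult. It may also receive two FooResult objects for the same Foo, because each action block produced one.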
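To make the problem concrete, here is a minimal runnable sketch of the current design. It assumes a Foo is an int and a FooResult is a string of the form "worker:foo"; both are stand-ins, not the real types. The worker factory MakeWorker and the rules "w1 produces a result for multiples of 2, w2 for multiples of 3" are hypothetical, chosen only so that some Foos get no result and some get two.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Stand-ins (assumptions): a Foo is an int, a FooResult is a string "worker:foo".
var resultsPerFoo = new ConcurrentDictionary<int, int>();

var broadCastBlock = new BroadcastBlock<int>(foo => foo);
var targetBlock = new ActionBlock<string>(fooResult =>
{
    // Each FooResult carries its Foo, so we can count results per Foo.
    int foo = int.Parse(fooResult.Split(':')[1]);
    resultsPerFoo.AddOrUpdate(foo, 1, (_, count) => count + 1);
});

// Hypothetical worker: forwards a FooResult only when producesResult(foo) is true.
ActionBlock<int> MakeWorker(string name, Func<int, bool> producesResult) =>
    new ActionBlock<int>(async foo =>
    {
        if (producesResult(foo))
            await targetBlock.SendAsync($"{name}:{foo}");
    });

var aBlock1 = MakeWorker("w1", foo => foo % 2 == 0);
var aBlock2 = MakeWorker("w2", foo => foo % 3 == 0);

var propagate = new DataflowLinkOptions { PropagateCompletion = true };
broadCastBlock.LinkTo(aBlock1, propagate);
broadCastBlock.LinkTo(aBlock2, propagate);

for (int foo = 1; foo <= 6; foo++)
    broadCastBlock.Post(foo);
broadCastBlock.Complete();

// targetBlock has two producers, so it can only be completed after both are done.
await Task.WhenAll(aBlock1.Completion, aBlock2.Completion);
targetBlock.Complete();
await targetBlock.Completion;

for (int foo = 1; foo <= 6; foo++)
    Console.WriteLine($"Foo {foo}: {(resultsPerFoo.TryGetValue(foo, out int n) ? n : 0)} result(s)");
```

With the inputs 1 to 6, Foo 1 and Foo 5 produce no FooResult, Foo 2, 3 and 4 produce one each, and Foo 6 produces two. The count per Foo is only known once every block has completed, so targetBlock alone cannot tell when a given Foo is finished.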
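What I want is for targetBlock to receive exactly one wrapper object per Foo, containing the Foo and, if any FooResult objects were created, those FooResults as well.
Any ideas how to make this work as described? It doesn't have to be TPL Dataflow, but it would be neat if it were.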
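UPDATE: Below is what I got by implementing the JoinBlock as svick suggested. I am not going to use it (unless it can be tuned for performance) because it runs very slowly: I get about 89,000 items per second (and that is only with int value types).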
using System;
using System.Diagnostics;
using System.Threading.Tasks.Dataflow;

public class Test
{
    private BroadcastBlock<int> broadCastBlock;
    private TransformBlock<int, int> transformBlock1;
    private TransformBlock<int, int> transformBlock2;
    private JoinBlock<int, int, int> joinBlock;
    private ActionBlock<Tuple<int, int, int>> processorBlock;

    public Test()
    {
        broadCastBlock = new BroadcastBlock<int>(i =>
        {
            return i;
        });
        transformBlock1 = new TransformBlock<int, int>(i =>
        {
            return i;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
        transformBlock2 = new TransformBlock<int, int>(i =>
        {
            return i;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
        joinBlock = new JoinBlock<int, int, int>();
        processorBlock = new ActionBlock<Tuple<int, int, int>>(tuple =>
        {
            //Console.WriteLine("original value: " + tuple.Item1 + "tfb1: " + tuple.Item2 + "tfb2: " + tuple.Item3);
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        // Linking
        broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
        broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
        broadCastBlock.LinkTo(joinBlock.Target1, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock1.LinkTo(joinBlock.Target2, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock2.LinkTo(joinBlock.Target3, new DataflowLinkOptions { PropagateCompletion = true });
        joinBlock.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
    }

    public void Start()
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        const int numElements = 1000000;
        for (int i = 1; i <= numElements; i++)
        {
            broadCastBlock.Post(i);
        }
        // Mark completion
        broadCastBlock.Complete();
        processorBlock.Completion.Wait();
        watch.Stop();
        // Multiply before dividing so that integer division does not truncate the rate.
        Console.WriteLine("Time it took: " + watch.ElapsedMilliseconds + " ms - items processed per second: " + numElements * 1000L / watch.ElapsedMilliseconds);
        Console.ReadLine();
    }
}
Updated code to reflect the suggestions:
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public class Test
{
    private BroadcastBlock<int> broadCastBlock;
    private TransformBlock<int, int> transformBlock1;
    private TransformBlock<int, int> transformBlock2;
    private JoinBlock<int, int> joinBlock;
    private ActionBlock<Tuple<int, int>> processorBlock;

    public Test()
    {
        broadCastBlock = new BroadcastBlock<int>(i =>
        {
            return i;
        });
        transformBlock1 = new TransformBlock<int, int>(i =>
        {
            return i;
        });
        transformBlock2 = new TransformBlock<int, int>(i =>
        {
            return i;
        });
        joinBlock = new JoinBlock<int, int>();
        processorBlock = new ActionBlock<Tuple<int, int>>(tuple =>
        {
            //Console.WriteLine("tfb1: " + tuple.Item1 + "tfb2: " + tuple.Item2);
        });

        // Linking. The transform blocks do not propagate completion to the join;
        // the join is completed manually once both of them have finished (see Start).
        broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
        broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
        transformBlock1.LinkTo(joinBlock.Target1);
        transformBlock2.LinkTo(joinBlock.Target2);
        joinBlock.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
    }

    public void Start()
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        const int numElements = 1000000;
        for (int i = 1; i <= numElements; i++)
        {
            broadCastBlock.Post(i);
        }
        // Mark completion
        broadCastBlock.Complete();
        Task.WhenAll(transformBlock1.Completion, transformBlock2.Completion).ContinueWith(_ => joinBlock.Complete());
        processorBlock.Completion.Wait();
        watch.Stop();
        // Multiply before dividing so that integer division does not truncate the rate.
        Console.WriteLine("Time it took: " + watch.ElapsedMilliseconds + " ms - items processed per second: " + numElements * 1000L / watch.ElapsedMilliseconds);
        Console.ReadLine();
    }
}
I can see two ways to solve this:
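Option 1: Use a JoinBlock. Your broadcast block and both worker blocks would each send to one target of the join block. If a worker block doesn't have a result for an item, it would send null (or some other special value) instead. Your worker blocks will need to become TransformBlock<Foo, FooResult>, because using ActionBlock the way you do doesn't guarantee ordering (at least not when you set MaxDegreeOfParallelism), while TransformBlock does.
The output of the JoinBlock would be a Tuple<Foo, FooResult, FooResult>, where either or both of the FooResults can be null.
I'm not sure I like this solution, though: it relies heavily on the items staying in the correct order, which seems fragile to me.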
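A minimal sketch of option 1, assuming a Foo is an int, a FooResult is a string, and null means "no result"; the worker rules (w1 for multiples of 2, w2 for multiples of 3) are hypothetical. Each worker is a TransformBlock that emits exactly one value per input, and the JoinBlock pairs the n-th value arriving at each of its targets, so a tuple lines up with its Foo only because all three inputs arrive in the same order. As in the updated code in the question, the join is completed manually once both workers have finished.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Stand-ins (assumptions): a Foo is an int, a FooResult is a string, null means "no result".
var parallel = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 };
var propagate = new DataflowLinkOptions { PropagateCompletion = true };

var broadCastBlock = new BroadcastBlock<int>(foo => foo);

// TransformBlock emits its outputs in input order even with MaxDegreeOfParallelism > 1,
// and it emits exactly one output per input (here: a result or null).
var worker1 = new TransformBlock<int, string?>(foo => foo % 2 == 0 ? $"w1:{foo}" : null, parallel);
var worker2 = new TransformBlock<int, string?>(foo => foo % 3 == 0 ? $"w2:{foo}" : null, parallel);

var joinBlock = new JoinBlock<int, string?, string?>();
var wrappers = new List<Tuple<int, string?, string?>>();
var targetBlock = new ActionBlock<Tuple<int, string?, string?>>(t => wrappers.Add(t));

broadCastBlock.LinkTo(joinBlock.Target1);
broadCastBlock.LinkTo(worker1, propagate);
broadCastBlock.LinkTo(worker2, propagate);
worker1.LinkTo(joinBlock.Target2);
worker2.LinkTo(joinBlock.Target3);
joinBlock.LinkTo(targetBlock, propagate);

for (int foo = 1; foo <= 6; foo++)
    broadCastBlock.Post(foo);
broadCastBlock.Complete();

// The join has three producers, so complete it only after both workers are done.
await Task.WhenAll(worker1.Completion, worker2.Completion);
joinBlock.Complete();
await targetBlock.Completion;

foreach (var t in wrappers)
    Console.WriteLine($"Foo {t.Item1}: {t.Item2 ?? "-"}, {t.Item3 ?? "-"}");
```

For the inputs 1 to 6 this yields one tuple per Foo: Foo 6 gets both results and Foo 1 gets neither. If a worker ever skipped an item instead of emitting null, every later tuple would be misaligned, which is the fragility described above.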
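Option 2: Use some other object for synchronization. That object would be responsible for sending the result onward once all blocks are done with a given item. This is similar to the NotificationWrapper that Mario suggested in his answer.
In this case, you could use TaskCompletionSource and Task.WhenAll() to handle the synchronization.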
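A sketch of option 2, again assuming a Foo is an int, a FooResult is a string, and null means "no result". The slot array and the helpers MakeWorker and ForwardWhenDone are hypothetical names introduced for illustration. Each broadcast item carries one TaskCompletionSource per worker; every worker completes its own slot (with null when it has nothing), and Task.WhenAll over the slots decides when the wrapper for that Foo is complete. Unlike the JoinBlock version, this does not depend on the order in which items leave the workers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Stand-ins (assumptions): a Foo is an int, a FooResult is a string, null means "no result".
var wrappers = new ConcurrentQueue<(int Foo, string[] Results)>();
var targetBlock = new ActionBlock<(int Foo, string[] Results)>(w => wrappers.Enqueue(w));

// Each broadcast item carries one TaskCompletionSource ("slot") per worker.
// The cloning function returns the same tuple, so all workers share the slot array.
var broadCastBlock = new BroadcastBlock<(int Foo, TaskCompletionSource<string?>[] Slots)>(item => item);

// A worker ALWAYS completes its own slot, with a result or with null.
ActionBlock<(int Foo, TaskCompletionSource<string?>[] Slots)> MakeWorker(int slot, Func<int, string?> process) =>
    new ActionBlock<(int Foo, TaskCompletionSource<string?>[] Slots)>(
        item => item.Slots[slot].SetResult(process(item.Foo)),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

Func<int, string?>[] processors =
{
    foo => foo % 2 == 0 ? $"w1:{foo}" : null,
    foo => foo % 3 == 0 ? $"w2:{foo}" : null,
};
for (int i = 0; i < processors.Length; i++)
    broadCastBlock.LinkTo(MakeWorker(i, processors[i]));

// Waits for every worker's slot, then sends exactly one wrapper for this Foo.
async Task ForwardWhenDone(int foo, TaskCompletionSource<string?>[] slots)
{
    string?[] results = await Task.WhenAll(slots.Select(s => s.Task));
    await targetBlock.SendAsync((Foo: foo, Results: results.OfType<string>().ToArray()));
}

var pending = new List<Task>();
for (int foo = 1; foo <= 6; foo++)
{
    var slots = processors
        .Select(_ => new TaskCompletionSource<string?>(TaskCreationOptions.RunContinuationsAsynchronously))
        .ToArray();
    pending.Add(ForwardWhenDone(foo, slots));
    broadCastBlock.Post((Foo: foo, Slots: slots));
}
broadCastBlock.Complete();

await Task.WhenAll(pending);   // every Foo has been forwarded exactly once
targetBlock.Complete();
await targetBlock.Completion;

foreach (var (foo, results) in wrappers.OrderBy(w => w.Foo))
    Console.WriteLine($"Foo {foo}: [{string.Join(", ", results)}]");
```

Wrappers can reach targetBlock out of order, so the sketch sorts them before printing. If processing can throw, a worker should complete its slot with TrySetException; otherwise Task.WhenAll never finishes for that item and its wrapper is never sent.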