Linking a TransformBlock that produces IEnumerable<T> to a block that receives T

Ruh*_*iot 1 c# parallel-processing asynchronous tpl-dataflow

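I'm writing a web gallery scraper, and I want to process the files in parallel with TPL Dataflow wherever possible.

To scrape, I first fetch the gallery's main page and parse the HTML to get the image page links as a list. Then I go to each page in that list, parse its HTML to get the link to the image, and save the image to disk.

Here is an outline of my program: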

var galleryBlock = new TransformBlock<Uri, IEnumerable<Uri>>(async uri =>
{
    // 1. Get the page
    // 2. Parse the page to get the urls of each image page
    return imagePageLinks;
});

var imageBlock = new TransformBlock<Uri, Uri>(async uri =>
{
    // 1. Go to the url and fetch the image page html
    // 2. Parse the html to retrieve the image url
    return imageUri;
});

var downloadBlock = new ActionBlock<Uri>(async uri =>
{
    // Download the image from uri and save it to disk
});

var opts = new DataflowLinkOptions { PropagateCompletion = true};
// This doesn't work: galleryBlock returns a list, not a single item, but I still want imageBlock to process the links in parallel.
galleryBlock.LinkTo(imageBlock, opts);
imageBlock.LinkTo(downloadBlock, opts);

spe*_*der 5

You can use a TransformManyBlock in place of your TransformBlock:

var galleryBlock = new TransformManyBlock<Uri, Uri>(async uri =>
{
    return Enumerable.Empty<Uri>(); //just to get it compiling
});

var imageBlock = new TransformBlock<Uri, Uri>(async uri =>
{
    return null;  //just to get it compiling
});

var opts = new DataflowLinkOptions { PropagateCompletion = true };
galleryBlock.LinkTo(imageBlock, opts); // bingo!
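For reference, here is a minimal end-to-end sketch of the same pipeline wired up with a TransformManyBlock. It is not the asker's actual scraper: `GetImagePageLinksAsync`, the example.com URLs, the `.jpg` suffix, and `MaxDegreeOfParallelism = 4` are all made up for illustration, and nothing touches the network. It assumes a project that supports top-level statements (C# 9 or later); on .NET Framework you would also need the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for "fetch the gallery page and parse it":
// fabricates three image-page links instead of doing any HTTP work.
static Task<IEnumerable<Uri>> GetImagePageLinksAsync(Uri gallery) =>
    Task.FromResult(Enumerable.Range(1, 3).Select(i => new Uri(gallery, $"page/{i}")));

// Assumed setting: let up to four items be processed concurrently per block.
var parallel = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 };

// Stands in for the disk; thread-safe because downloadBlock runs in parallel.
var downloaded = new ConcurrentBag<Uri>();

// TransformManyBlock flattens the returned sequence, so every Uri in it
// is offered to the next block as its own message.
var galleryBlock = new TransformManyBlock<Uri, Uri>(
    async uri => await GetImagePageLinksAsync(uri));

// Placeholder for "fetch the image page and extract the image url".
var imageBlock = new TransformBlock<Uri, Uri>(async uri =>
{
    await Task.Yield(); // simulate asynchronous work
    return new Uri(uri.AbsoluteUri + ".jpg");
}, parallel);

// Placeholder for the download step: just records the url.
var downloadBlock = new ActionBlock<Uri>(uri => downloaded.Add(uri), parallel);

var opts = new DataflowLinkOptions { PropagateCompletion = true };
galleryBlock.LinkTo(imageBlock, opts);
imageBlock.LinkTo(downloadBlock, opts);

galleryBlock.Post(new Uri("https://example.com/gallery/"));
galleryBlock.Complete();        // no more galleries will be posted
await downloadBlock.Completion; // completion propagates down the chain

Console.WriteLine($"Downloaded {downloaded.Count} images"); // prints "Downloaded 3 images"
```

Because PropagateCompletion is set on both links, calling Complete() on the first block and awaiting the last block's Completion is enough to drain the whole pipeline.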