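I need to download about 2 million files from the SEC website. Each file has a unique URL and is about 10 kB on average. This is my current implementation: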
List<string> urls = new List<string>();
// ... initialize urls ...
WebBrowser browser = new WebBrowser();
foreach (string url in urls)
{
    browser.Navigate(url);
    // Busy-wait, pumping UI messages, until the page has finished loading.
    while (browser.ReadyState != WebBrowserReadyState.Complete) Application.DoEvents();
    StreamReader sr = new StreamReader(browser.DocumentStream);
    // Save under the last URL path segment, without the leading '/'.
    StreamWriter sw = new StreamWriter(url.Substring(url.LastIndexOf('/') + 1));
    sw.Write(sr.ReadToEnd());
    sr.Close();
    sw.Close();
}
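The estimated time is about 12 days... Is there a faster way?
Edit: by the way, local file handling accounts for only 7% of the time.
Edit: here is my final implementation: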
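To put those figures in perspective, here is a back-of-the-envelope sketch in C#. It uses only the numbers stated in the question (2 million files, 10 kB average, 12 days, 7% local file handling); the class and method names are made up for illustration.

```csharp
using System;

static class Estimate
{
    // Seconds spent per file when `files` sequential downloads take `days` days.
    public static double SecondsPerFile(double files, double days) => days * 24 * 60 * 60 / files;

    public static void Main()
    {
        double perFile = SecondsPerFile(2_000_000, 12); // total seconds divided by file count
        double nonLocal = perFile * (1 - 0.07);         // share not spent on local file handling
        double kBPerSecond = 10 / perFile;              // effective throughput at 10 kB per file
        Console.WriteLine($"{perFile:F3} s/file, {nonLocal:F3} s/file outside local I/O, {kBPerSecond:F1} kB/s");
    }
}
```

That works out to roughly half a second per file and an effective rate of under 20 kB/s, which suggests the loop is dominated by per-request waiting rather than by bandwidth or disk.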
void Main()
{
    // Raise the cap on concurrent HTTP connections.
    ServicePointManager.DefaultConnectionLimit = 10000;
    List<string> urls = new List<string>();
    // ... initialize urls ...
    // Download with 8 parallel workers and sum the retry counts they return.
    int retries = urls.AsParallel().WithDegreeOfParallelism(8).Sum(arg => downloadFile(arg));
}
public int downloadFile(string url)
{
    int retries = 0; …

I want to download files in parallel using C#. I wrote the code below and it works well, but the problem is that the UI freezes.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;
using System.Windows.Threading;
namespace FileDownloader
{
/// <summary>
/// Interaction logic for MainWindow.xaml
/// </summary>
public partial class MainWindow : Window
{
private static int count = 1;
private static string f= "lecture";
private string URL = "www.someexample.com";
public MainWindow()
{
InitializeComponent();
}
public …
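
The rest of the code above is cut off, so the cause of the freeze cannot be seen here. One common way to keep a WPF window responsive during downloads is to make them asynchronous and `await` them instead of blocking the UI thread. The following is a minimal sketch of that approach, not a fix of the code above: `ParallelDownloader`, `DownloadAllAsync`, the `fetch` parameter and the default of 8 parallel requests are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDownloader
{
    private static readonly HttpClient Client = new HttpClient();

    // Downloads every URL into `folder`, with at most `maxParallel` requests in flight.
    // `fetch` is a hypothetical hook so the download step can be replaced (e.g. in tests).
    // Returns the number of files written.
    public static async Task<int> DownloadAllAsync(
        IEnumerable<string> urls, string folder, int maxParallel = 8,
        Func<string, Task<byte[]>> fetch = null)
    {
        fetch = fetch ?? (url => Client.GetByteArrayAsync(url));
        Directory.CreateDirectory(folder);
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync(); // wait for a free slot without blocking a thread
                try
                {
                    byte[] data = await fetch(url);
                    // Name the file after the last URL path segment.
                    string name = url.Substring(url.LastIndexOf('/') + 1);
                    using (var file = new FileStream(Path.Combine(folder, name), FileMode.Create,
                                                     FileAccess.Write, FileShare.None, 4096, useAsync: true))
                    {
                        await file.WriteAsync(data, 0, data.Length);
                    }
                    return 1;
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            int[] written = await Task.WhenAll(tasks);
            return written.Sum();
        }
    }
}
```

In a window class this would typically be called from an `async void` event handler, for example `int n = await ParallelDownloader.DownloadAllAsync(urls, "downloads");`. The handler then returns control to the message loop while requests are pending. The sketch does no retrying and does not handle URLs that end in `/`.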