web*_*orm 2 c# streamwriter streamreader
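I wrote a WinForms application that reads each line of a text file, runs a RegEx search-and-replace on the line, and then writes it out to a new file. I chose a line-by-line approach because some of the files are too large to load into memory.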
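I'm using a BackgroundWorker so I can update the UI with the job's progress. Below is the code (with parts omitted for brevity) that reads the lines and then writes them to the output file.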
```csharp
public void bgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    // Details of obtaining file paths omitted for brevity
    int totalLineCount = File.ReadLines(inputFilePath).Count();

    using (StreamReader sr = new StreamReader(inputFilePath))
    {
        int currentLine = 0;
        String line;
        while ((line = sr.ReadLine()) != null)
        {
            currentLine++;

            // Match and replace contents of the line
            // omitted for brevity

            if (currentLine % 100 == 0)
            {
                // 100L keeps the multiplication in 64 bits; currentLine * 100
                // overflows int once currentLine passes about 21.4 million.
                int percentComplete = (int)(currentLine * 100L / totalLineCount);
                bgWorker.ReportProgress(percentComplete);
            }

            using (FileStream fs = new FileStream(outputFilePath, FileMode.Append, FileAccess.Write))
            using (StreamWriter sw = new StreamWriter(fs))
            {
                sw.WriteLine(line);
            }
        }
    }
}
```
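Some of the files I'm processing are very large (8 GB, 132 million lines). The process takes a very long time: a 2 GB file needs about 9 hours to complete, which works out to roughly 58 KB/s. Is this expected, or should it be faster?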
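A quick check of what that rate implies, taking 1 GB as 1024² KB (an assumption; the question does not say which unit it uses):

$$
\frac{2 \times 1024^2\ \text{KB}}{58\ \text{KB/s}} \approx 36{,}160\ \text{s} \approx 10\ \text{h},
\qquad
\frac{8 \times 1024^2\ \text{KB}}{58\ \text{KB/s}} \approx 144{,}630\ \text{s} \approx 40\ \text{h}
$$

So the 58 KB/s figure is roughly consistent with the 9 hours observed for the 2 GB file, and at the same rate the 8 GB file would take around 40 hours.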
Sco*_*ain 14
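Don't close and reopen the output file on every loop iteration; open the writer once, outside the loop. As written, every line opens the file, seeks to its end, and then flushes and closes it again when the `using` block disposes the writer. Opening the writer once removes all of that per-line overhead, which should noticeably improve performance.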
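Also, `File.ReadLines(inputFilePath).Count()` makes you read the entire input file twice, which could be a big chunk of the time. Instead of computing the percentage from the line count, compute it from the stream position.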
```csharp
public void bgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    // Details of obtaining file paths omitted for brevity

    // You can use this constructor instead of FileStream; it does the same operation.
    using (StreamWriter sw = new StreamWriter(outputFilePath, true))
    using (StreamReader sr = new StreamReader(inputFilePath))
    {
        int lastPercentage = 0;
        String line;
        while ((line = sr.ReadLine()) != null)
        {
            // Match and replace contents of the line
            // omitted for brevity

            // Position and Length are longs, not ints, so we need to cast at the end.
            int currentPercentage = (int)(sr.BaseStream.Position * 100L / sr.BaseStream.Length);
            if (lastPercentage != currentPercentage)
            {
                bgWorker.ReportProgress(currentPercentage);
                lastPercentage = currentPercentage;
            }

            sw.WriteLine(line);
        }
    }
}
```
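If you want to test the streaming loop without a form, one option is to pull it out of the event handler. The following is a minimal sketch under that assumption: `LineRewriter`, `RewriteFile`, `transform` and `reportProgress` are hypothetical names, not part of the original code. It keeps the same two ideas as the answer (one writer opened outside the loop, progress taken from the stream position), but it opens the output with `append: false`, so a file left over from an earlier run is overwritten instead of appended to.

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating the single-pass, single-writer loop.
public static class LineRewriter
{
    // Streams inputPath to outputPath one line at a time, applying transform
    // to each line, and returns the number of lines written.
    public static long RewriteFile(
        string inputPath,
        string outputPath,
        Func<string, string> transform,
        Action<int> reportProgress)
    {
        long linesWritten = 0;
        int lastPercentage = -1;

        using (var reader = new StreamReader(inputPath))
        // Opened once, outside the loop. append: false truncates any existing
        // output file (the answer's code uses append: true).
        using (var writer = new StreamWriter(outputPath, append: false))
        {
            long length = reader.BaseStream.Length;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(transform(line));
                linesWritten++;

                // BaseStream.Position can run ahead of the line just read by
                // up to one buffer, which is accurate enough for a progress bar.
                // The loop is never entered for an empty file, so length > 0 here.
                int percentage = (int)(reader.BaseStream.Position * 100L / length);
                if (percentage != lastPercentage)
                {
                    reportProgress(percentage);
                    lastPercentage = percentage;
                }
            }
        }
        return linesWritten;
    }
}
```

Inside `bgWorker_DoWork` this could be called as `LineRewriter.RewriteFile(inputFilePath, outputFilePath, line => line, bgWorker.ReportProgress);`, with the identity lambda standing in for the omitted match-and-replace step.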
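Beyond that, you would need to show the "Match and replace contents of the line, omitted for brevity" part, because my guess is that this is where your slowness comes from. Run a profiler on your code to see where it spends most of its time, and focus your efforts there.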
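The question doesn't show its match-and-replace code, so the following is only a sketch of one common pitfall and a crude way to measure it. It assumes the code builds a `Regex`. Constructing `new Regex(...)` inside the loop re-parses the pattern on every line, whereas a single `static readonly` instance built once avoids that cost. `RegexOptions.Compiled` trades slower construction for faster matching, which tends to pay off over millions of lines. The `Stopwatch` accumulates only the time spent in the regex, so comparing it with the job's total running time suggests whether the regex dominates. A real profiler gives a far more complete picture. The class `LineReplacer`, the method `ReplaceLine`, the pattern `\bfoo(\d+)\b` and the replacement `bar$1` are all hypothetical placeholders.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical example: the pattern and replacement are placeholders, since
// the question omits its actual match-and-replace code.
public static class LineReplacer
{
    // Built once and reused for every line instead of being constructed per line.
    private static readonly Regex Pattern =
        new Regex(@"\bfoo(\d+)\b", RegexOptions.Compiled);

    // Accumulates time spent inside Regex.Replace across all calls.
    // Start() after Stop() resumes the running total rather than resetting it.
    public static readonly Stopwatch RegexTime = new Stopwatch();

    public static string ReplaceLine(string line)
    {
        RegexTime.Start();
        try
        {
            return Pattern.Replace(line, "bar$1");
        }
        finally
        {
            // Stop even if Replace throws, so the total stays meaningful.
            RegexTime.Stop();
        }
    }
}
```

Timing every call adds a little overhead of its own, so this is best used for a diagnostic run rather than left in production code.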