Open and read thousands of files as fast as possible

cki*_*ndt 2 c# io hl7 asp.net-mvc-4

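I need to open and read thousands of files as fast as possible.

I ran some tests on 13,592 files and found that method 1 is slightly faster than method 2. The files are usually between 800 bytes and 4 kB. Is there anything I can do to make this I/O-bound process faster?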

Method 1:
    Run 1: 3:05 (don't know what happened here)
    Run 2: 1:55
    Run 3: 2:06
    Run 4: 2:02
Method 2:
    Run 1: 2:04
    Run 2: 2:08
    Run 3: 2:04
    Run 4: 2:12

Here is the code:

public class FileOpenerUtil
{

    /// <summary>
    /// 
    /// </summary>
    /// <param name="fullFilePath"></param>
    /// <returns></returns>
    public static string ReadFileToString(string fullFilePath)
    {
        while (true)
        {
            try
            {
                // Method 1
                using (StreamReader sr = File.OpenText(fullFilePath))
                {
                    string fullMessage = "";
                    string s;
                    while ((s = sr.ReadLine()) != null)
                    {
                        fullMessage += s + "\n";
                    }
                    return RemoveCarriageReturn(fullMessage);
                }
                // Method 2
                /*using (File.Open(fullFilePath, FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    Console.WriteLine("Output file {0} ready.", fullFilePath);
                    string[] lines = File.ReadAllLines(fullFilePath);
                    //Every new line under the previous line
                    string fullMessage = lines.Aggregate("", (current, s) => current + s + "\n");
                    return RemoveCarriageReturn(fullMessage);
                    //ninject kernel


                }*/
                // Method 3

            }
            catch (FileNotFoundException ex)
            {
                Console.WriteLine("Output file {0} not yet ready ({1})", fullFilePath, ex.Message);
            }
            catch (IOException ex)
            {
                Console.WriteLine("Output file {0} not yet ready ({1})", fullFilePath, ex.Message);
            }
            catch (UnauthorizedAccessException ex)
            {
                Console.WriteLine("Output file {0} not yet ready ({1})", fullFilePath, ex.Message);
            }
        }

    }

    /// <summary>
    /// Removes '\r' from a string
    /// </summary>
    /// <param name="message">The text that has to be changed</param>
    /// <returns>The changed text</returns>
    private static string RemoveCarriageReturn(string message)
    {
        return message.Replace("\r", "");
    }
}

The files I am reading are .HL7 files, and they look like this:

MSH|^~\&|OAZIS||||20150430235954||ADT^A03|23669166|P|2.3||||||ASCII
EVN|A03|20150430235954||||201504302359
PID|1||6001144000||LastName^FirstName^^^Mevr.|LastName^FirstName|19600114|F|||GStreetName Number^^City^^PostalCode^B^H||09/3444556^^PH~0476519246echtg^^CP||NL|M||28783409^^^^VN|0000000000|60011402843||||||B||||N
PD1||||003847^LastName^FirstName||||||||N|||0
PV1|1|O|FDAG^000^053^001^0^2|NULL||FDAG^000^053^001|003847^LastName^FirstName||006813^LastName^FirstName|1900|00||||||006813^LastName^FirstName|0|28783409^^^^VN|1^20150430|01||||||||||||||||1|1||D||||||201504301336|201504302359
OBX|1|CE|KIND_OF_DIS|RCM|1^1 Op medisch advies
OBX|2|CE|DESTINATION_DIS|RCM|1^1 Terug naar huis

Once a file is open, I parse the string with j4jayant's HL7 parser and close the file.

Ant*_*ony 6

I tested with 50,000 files of varying sizes (500 to 1024 bytes).

Test 1: your method 1, StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 3,4658937968113
Test 2: your method 2, File.ReadAllLines(fullFilePath)
Seconds: 5,5008349279222
Test 3: File.ReadAllText(fullFilePath);
Seconds: 3,30782645637133
Test 4: BinaryReader b = new BinaryReader; b.ReadString();
Seconds: 5,85779941381009
Test 5: Windows FileReader https://msdn.microsoft.com/en-us/library/2d9wy99d.aspx
Seconds: 3,07036554759848
Test 6: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 3,31464109255517
Test 7: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 3,3364683664508
Test 8: StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 3,40426888695317
Test 9: FileStream + BufferedStream + StreamReader
Seconds: 4,02871911079061
Test 10: Parallel.For using code File.ReadAllText(fullFilePath);
Seconds: 0,89543632235447

The best single-threaded results were Test 5 and Test 3.
Test 3 uses: File.ReadAllText(fullFilePath);
Test 5 uses the Windows FileReader: https://msdn.microsoft.com/zh-cn/library/2d9wy99d.aspx
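
As a rough sketch of how the Test 3 approach could replace the line-by-line loop in the question: the class and method names below are hypothetical, and the newline handling is my assumption about what the original loop produces (StreamReader.ReadLine treats "\r", "\n" and "\r\n" as line endings, and the loop appends "\n" after every line).

```csharp
using System;
using System.IO;

public static class FileReadSketch
{
    // Hypothetical helper: reads the whole file in one call (as in Test 3)
    // and normalizes every line ending to "\n".
    public static string ReadFileToString(string fullFilePath)
    {
        string text = File.ReadAllText(fullFilePath);

        // Convert "\r\n" first, then any lone "\r", so that a file using
        // bare carriage returns still keeps its lines separate.
        text = text.Replace("\r\n", "\n").Replace('\r', '\n');

        // The loop in the question ends every line with "\n", including the last.
        if (text.Length > 0 && text[text.Length - 1] != '\n')
        {
            text += "\n";
        }
        return text;
    }
}
```

This avoids the repeated string concatenation of the original loop. Whether it is measurably faster on your files is something only your own timings can confirm.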

If you can use threads, Test 10 is by far the fastest.

Example:

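int maxFiles = 50000;
Parallel.For(0, maxFiles, x =>
{
    // Use the loop index x: a shared counter incremented inside the
    // lambda is not thread-safe and can skip or repeat file names.
    Util.Method1("readtext_" + x + ".txt"); // your read method
});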
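
If the file names are not numbered, the same idea can be applied to a directory listing. The following is only a sketch under assumptions: the class name, the method name and its parameters are hypothetical, and the results are collected in a ConcurrentDictionary because several threads write to it at once.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ParallelReadSketch
{
    // Hypothetical helper: reads every file matching searchPattern in
    // directory, in parallel, and returns the contents keyed by file path.
    public static ConcurrentDictionary<string, string> ReadAll(
        string directory, string searchPattern, int maxDegreeOfParallelism)
    {
        var results = new ConcurrentDictionary<string, string>();

        // Cap the number of concurrent reads; the best value depends on
        // the disk and should be found by measuring.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };

        Parallel.ForEach(Directory.EnumerateFiles(directory, searchPattern), options, path =>
        {
            results[path] = File.ReadAllText(path);
        });

        return results;
    }
}
```

A call such as ParallelReadSketch.ReadAll(folder, "*.hl7", 4) would then return all messages at once, with the folder path, the pattern and the degree of parallelism chosen here purely for illustration.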


After emptying the standby list with RAMMap:

Test 1: your method 1, StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 15,1785750622961
Test 2: your method 2, File.ReadAllLines(fullFilePath)
Seconds: 17,650864469466
Test 3: File.ReadAllText(fullFilePath);
Seconds: 14,8985912878328
Test 4: BinaryReader b = new BinaryReader; b.ReadString();
Seconds: 18,1603815767866
Test 5: Windows FileReader
Seconds: 14,5059765845334
Test 6: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 14,8649786336991
Test 7: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 14,830567197641
Test 8: StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 14,9965866575751
Test 9: FileStream + BufferedStream + StreamReader
Seconds: 15,7336450516575
Test 10: Parallel.For() using code File.ReadAllText(fullFilePath);
Seconds: 4,11343060325439
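
Timings like the ones listed here can be collected with a small Stopwatch harness. This is a minimal sketch, not the code used for the numbers above: the class and method names are hypothetical, and the character count is returned only so that the read results are actually used.

```csharp
using System;
using System.Diagnostics;

public static class ReadBenchmarkSketch
{
    // Hypothetical harness: times how long readFile takes over all paths
    // and reports the total number of characters read.
    public static TimeSpan Measure(string[] paths, Func<string, string> readFile, out long totalChars)
    {
        totalChars = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();

        foreach (string path in paths)
        {
            totalChars += readFile(path).Length;
        }

        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, ReadBenchmarkSketch.Measure(paths, File.ReadAllText, out chars) times the Test 3 approach. A second run over the same files is usually served from the operating system's file cache, which is why the two sets of results differ so much.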

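  • One thing to note: if nothing else changes and using multiple threads speeds up the reads, then the reads are not bottlenecked on I/O. I would also try re-running your **Test 10** after clearing all caches: http://stackoverflow.com/questions/478340/clear-file-cache-to-repeat-performance-testing (2 upvotes)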