Why is C++ fseek/fread performance so much greater than C# FileStream Seek/Read?

Dav*_*vid 5 c# c++ performance filestream

What I'm doing is very simple:

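  1. There is a large file, about 6 GB in size, containing random binary data
  2. The algorithm loops "SeekCount" times
  3. Each iteration does the following:
    • computes a random offset within the file size
    • seeks to that offset
    • reads a small block of data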
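
The loop above can be sketched in portable C++ with standard streams and `<random>`. This is an illustration, not either of the original tests: the helper name `random_seek_reads` is hypothetical, and the offsets are assumed to be drawn uniformly from [0, fileSize - blockSize] with `std::mt19937_64`, so every block lies inside the file. Taking a `std::istream&` lets the same code run on a file or on an in-memory buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <random>
#include <sstream>  // std::istringstream, for in-memory testing
#include <string>   // std::string, for in-memory testing
#include <vector>

// Hypothetical helper: performs `seekCount` random seeks on `in`, reading
// `blockSize` bytes after each one. Offsets are uniform over
// [0, fileSize - blockSize], so every block lies entirely inside the file.
// Returns the number of reads that returned a full block.
std::size_t random_seek_reads(std::istream& in, std::uint64_t fileSize,
                              std::size_t blockSize, int seekCount,
                              std::uint64_t seed)
{
    if (blockSize == 0 || fileSize < blockSize)
        return 0;

    std::mt19937_64 gen(seed);
    std::uniform_int_distribution<std::uint64_t> offsetDist(0, fileSize - blockSize);
    std::vector<char> buf(blockSize);
    std::size_t fullReads = 0;

    for (int i = 0; i < seekCount; ++i)
    {
        in.clear();  // reset eof/fail bits left by a previous read
        in.seekg(static_cast<std::streamoff>(offsetDist(gen)), std::ios::beg);
        in.read(buf.data(), static_cast<std::streamsize>(blockSize));
        if (in.gcount() == static_cast<std::streamsize>(blockSize))
            ++fullReads;
    }
    return fullReads;
}
```

For the real file, the stream would be opened as `std::ifstream in("c:\\Test\\big_data.dat", std::ios::binary);` (from `<fstream>`), and timing the call gives seeks per second.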

C#:

    public static void Test()
    {
        string fileName = @"c:\Test\big_data.dat";
        int NumberOfSeeks = 1000;
        int MaxNumberOfBytes = 1;
        long fileLength = new FileInfo(fileName).Length;
        FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, 65536, FileOptions.RandomAccess);
        Console.WriteLine("Processing file \"{0}\"", fileName);
        Random random = new Random();
        DateTime start = DateTime.Now;
        byte[] byteArray = new byte[MaxNumberOfBytes];

        for (int index = 0; index < NumberOfSeeks; ++index)
        {
            long offset = (long)(random.NextDouble() * (fileLength - MaxNumberOfBytes - 2));
            stream.Seek(offset, SeekOrigin.Begin);
            stream.Read(byteArray, 0, MaxNumberOfBytes);
        }

        Console.WriteLine(
            "Total processing time {0} ms, speed {1} seeks/sec\r\n",
            DateTime.Now.Subtract(start).TotalMilliseconds, NumberOfSeeks / (DateTime.Now.Subtract(start).TotalMilliseconds / 1000.0));

        stream.Close();
    }

Then the same test in C++:

    #include <cstdio>
    #include <cstdlib>
    #include <windows.h>

    // kTimes (the number of seeks) is defined elsewhere in the original test.
    void test()
    {
        FILE* file = fopen("c:\\Test\\big_data.dat", "rb");
        if (!file)
            return;

        char buf = 0;
        __int64 fileSize = 6216672671;//ftell(file);
        __int64 pos;

        DWORD dwStart = GetTickCount();
        for (int i = 0; i < kTimes; ++i)
        {
            pos = (rand() % 100) * 0.01 * fileSize;
            _fseeki64(file, pos, SEEK_SET);
            fread((void*)&buf, 1, 1, file);
        }
        DWORD dwEnd = GetTickCount() - dwStart;  // elapsed milliseconds
        printf(" - Raw Reading: %d times reading took %lu ms, i.e. %.2f sec. Speed: %.0f items/sec\n",
               kTimes, (unsigned long)dwEnd, dwEnd / 1000.0, dwEnd ? kTimes * 1000.0 / dwEnd : 0.0);
        fclose(file);
    }
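
Both tests turn wall-clock time into a reads-per-second figure. On the C++ side, a portable way to do that is sketched below, assuming C++11 `<chrono>`: `std::chrono::steady_clock` is monotonic, and keeping the elapsed time in floating-point seconds means a run shorter than one second still yields a rate. The helper name `ops_per_second` is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: calls body(i) for i = 0 .. count-1 and returns the
// number of calls per second, timed with the monotonic steady_clock.
// Elapsed time is kept in floating-point seconds, so short runs are not
// truncated to zero seconds.
template <typename Body>
double ops_per_second(int count, Body body)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i)
        body(i);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    // Guard against a zero reading on very fast runs.
    return elapsed.count() > 0.0 ? count / elapsed.count() : 0.0;
}
```

In the test above, the seek and read would go inside the lambda, e.g. `ops_per_second(kTimes, [&](int) { /* _fseeki64 + fread */ })`.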

Measured throughput:

  1. C#: 100-200 reads/sec
  2. C++: 250,000 reads/sec

Question: why is C++ more than a thousand times faster than C# at such a simple file-reading operation?

Additional information:

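  1. I experimented with the stream buffers and set them to the same size (4 KB)
  2. The disk is defragmented (0% fragmentation)
  3. OS configuration: Windows 7, NTFS, a fairly recent 500 GB hard drive (WD, if I remember correctly), 8 GB RAM (barely used), 4-core CPU (utilization near zero)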
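
The question does not show how the C++ buffer size was set. Assuming the C++ test keeps using stdio, one way to give the `FILE*` a 4 KB buffer is `setvbuf`, which may only be called after `fopen` and before any other operation on the stream. A minimal sketch; the helper name `set_stdio_buffer` is hypothetical:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: makes `file` fully buffered with a `size`-byte buffer
// allocated by the C library. Returns true on success.
// Must be called right after fopen, before any read, write or seek.
bool set_stdio_buffer(std::FILE* file, std::size_t size)
{
    return file != nullptr && std::setvbuf(file, nullptr, _IOFBF, size) == 0;
}
```

In `test()` this would be `set_stdio_buffer(file, 4096);` immediately after `fopen`. On the C# side the counterpart is the `bufferSize` argument of the `FileStream` constructor (65536 in the code above).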

Dav*_*vid 6

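There was a bug in the C++ version of the test: the random offset calculation was limited, so the seeks only hit a small, restricted set of positions, which made the C++ results look better.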
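
The limitation can be checked directly: `rand() % 100` takes only 100 values, so `(rand() % 100) * 0.01 * fileSize` can only be one of 100 offsets (0%, 1%, ..., 99% of the file). The sketch below counts the distinct offsets that formula produces over many draws; the helper name is hypothetical. With so few positions, repeated reads would plausibly be served from the operating system's file cache rather than from the disk.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <set>

// Hypothetical helper: draws `draws` offsets with the question's original
// formula and returns how many distinct offsets appeared.
std::size_t distinct_original_offsets(std::int64_t fileSize, int draws)
{
    std::set<std::int64_t> seen;
    for (int i = 0; i < draws; ++i)
    {
        // Same expression as the question's C++ test: only rand() % 100 varies.
        std::int64_t pos = static_cast<std::int64_t>((std::rand() % 100) * 0.01 * fileSize);
        seen.insert(pos);
    }
    return seen.size();
}
```

Called with the question's `fileSize` of 6216672671 and 100,000 draws, the count can never exceed 100.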

@MooingDuck suggested the correct way to compute the offset:

    rand() / double(RAND_MAX) * fileSize
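
To make the corrected formula's behavior concrete, here is a small sketch; the function names are hypothetical. `rand()` returns values in [0, RAND_MAX], so the result lies in [0, fileSize], and the value `fileSize` itself (when `rand()` returns RAND_MAX) is the end of the file, where a read returns no data. RAND_MAX is only guaranteed to be at least 32767, and MSVC uses exactly 32767, so on a ~6 GB file the possible offsets are spaced roughly 190 KB apart. That still spreads the seeks across the whole file, which is what the benchmark needs.

```cpp
#include <cstdint>
#include <cstdlib>

// The corrected offset formula from the answer, written as a function of
// one rand() result r (0 <= r <= RAND_MAX).
std::int64_t answer_offset(int r, std::int64_t fileSize)
{
    return static_cast<std::int64_t>(r / double(RAND_MAX) * fileSize);
}

// Spacing between neighbouring offsets the formula can produce:
// fileSize / RAND_MAX bytes (about 190 KB for a ~6 GB file when RAND_MAX is 32767).
double answer_offset_step(std::int64_t fileSize)
{
    return fileSize / double(RAND_MAX);
}
```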

With this change, C++ and C# performance became comparable: about 200 reads/sec.

Thanks, everyone, for your contributions.