use*_*412 3 c# binary performance casting
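I have a huge file of integers (about 20 GB) that I want to read in C#.
Reading the file into memory (as a byte array) is very fast (it is on an SSD, and the whole file fits in memory). However, when I read those bytes with a BinaryReader (over a MemoryStream), the ReadInt32 calls take far longer than reading the file into memory did. I expected disk I/O to be the bottleneck, but it is actually the conversion!
Is there a way to convert the whole byte array to an int array directly, rather than converting it one value at a time with ReadInt32?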
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    // Number of integers to write and read back (256 Mi values, i.e. a 1 GiB file).
    static int size = 256 * 1024 * 1024;
    static string filename = @"E:\testfile";

    static void Main(string[] args)
    {
        Write(filename, size);
        int[] result = Read(filename, size);
        Console.WriteLine(result.Length);
    }

    // Writes the integers 0..size-1 to the file in binary form.
    static void Write(string filename, int size)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        BinaryWriter bw = new BinaryWriter(new FileStream(filename, FileMode.Create), Encoding.UTF8);
        for (int i = 0; i < size; i++)
        {
            bw.Write(i);
        }
        bw.Close();
        stopwatch.Stop();
        Console.WriteLine(String.Format("File written in {0}ms", stopwatch.ElapsedMilliseconds));
    }

    // Loads the whole file into memory, then converts it one value at a time.
    static int[] Read(string filename, int size)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        byte[] buffer = File.ReadAllBytes(filename);
        BinaryReader br = new BinaryReader(new MemoryStream(buffer), Encoding.UTF8);
        stopwatch.Stop();
        Console.WriteLine(String.Format("File read into memory in {0}ms", stopwatch.ElapsedMilliseconds));

        stopwatch.Reset();
        stopwatch.Start();
        int[] result = new int[size];
        for (int i = 0; i < size; i++)
        {
            // This per-value call is the slow part being asked about.
            result[i] = br.ReadInt32();
        }
        br.Close();
        stopwatch.Stop();
        Console.WriteLine(String.Format("Byte-array casted to int-array in {0}ms", stopwatch.ElapsedMilliseconds));
        return result;
    }
}
You can allocate a temporary byte[] buffer of a suitable size and use the Buffer.BlockCopy method to copy the bytes into the int[] array incrementally.
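BinaryReader reader = ...;
int[] hugeIntArray = ...;
const int TempBufferSize = 4 * 1024 * 1024;
// ReadBytes may return fewer bytes than requested at the end of the stream.
byte[] tempBuffer = reader.ReadBytes(TempBufferSize);
// Buffer.BlockCopy takes its offsets and its count in bytes, not in array elements.
Buffer.BlockCopy(tempBuffer, 0, hugeIntArray, offset, tempBuffer.Length);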
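Here offset is the starting position in the destination array hugeIntArray for the current iteration. Note that Buffer.BlockCopy expects this as a byte offset, so it is the destination element index multiplied by sizeof(int).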
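To make the incremental copy concrete, here is a minimal sketch of the full loop. The class name IntFileReader, the method ReadInts, and its parameters are hypothetical names chosen for illustration; they are not part of the original answer. The sketch assumes the file holds little-endian 32-bit integers (which is what BinaryWriter.Write(int) produces) and that it runs on a little-endian machine, since Buffer.BlockCopy copies raw bytes without any conversion.

```csharp
using System;
using System.IO;

static class IntFileReader
{
    // Hypothetical helper: reads 'count' 32-bit integers from 'stream' into a new int[].
    public static int[] ReadInts(Stream stream, int count, int tempBufferSize = 4 * 1024 * 1024)
    {
        long totalBytes = (long)count * sizeof(int);
        // Buffer.BlockCopy takes int byte offsets, so the destination must stay below 2 GiB.
        if (count < 0 || totalBytes > int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (tempBufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(tempBufferSize));

        int[] result = new int[count];
        byte[] tempBuffer = new byte[tempBufferSize];
        long byteOffset = 0; // position in 'result', measured in bytes

        while (byteOffset < totalBytes)
        {
            int wanted = (int)Math.Min(tempBuffer.Length, totalBytes - byteOffset);
            // Stream.Read may return fewer bytes than requested.
            int read = stream.Read(tempBuffer, 0, wanted);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before all integers were read.");

            // Copy the raw bytes; the copy is byte-wise, so a chunk may end mid-integer.
            Buffer.BlockCopy(tempBuffer, 0, result, (int)byteOffset, read);
            byteOffset += read;
        }
        return result;
    }
}
```

With the test program from the question, this could be called as `using (var fs = File.OpenRead(filename)) { int[] result = IntFileReader.ReadInts(fs, size); }`. Because the byte offsets are of type int, a single destination array filled this way is limited to just under 2 GiB of data, so a 20 GB file would presumably have to be split across several arrays.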