How to split a large file efficiently

Question

I'd like to know how to split a large file without using too many system resources. I'm currently using this code:

public static void SplitFile(string inputFile, int chunkSize, string path)
{
    byte[] buffer = new byte[chunkSize];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "\\" + index))
            {
                int chunkBytesRead = 0;
                while (chunkBytesRead < chunkSize)
                {
                    int bytesRead = input.Read(buffer, 
                                               chunkBytesRead, 
                                               chunkSize - chunkBytesRead);

                    if (bytesRead == 0)
                    {
                        break;
                    }
                    chunkBytesRead += bytesRead;
                }
                output.Write(buffer, 0, chunkBytesRead);
            }
            index++;
        }
    }
}

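The operation takes 52.370 seconds to split a 1.6 GB file into 14 MB files. I'm not concerned about how long the operation takes; I'm more concerned about the system resources it uses, because this application will be deployed to a shared hosting environment. Currently the operation maxes out my system's HDD I/O usage at 100% and slows the system down drastically. CPU usage is low; RAM rises slightly, but seems fine.

Is there a way to keep this operation from using too many resources?

Thanks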

Answer 1

Mar*_*ell 28

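It seems odd to assemble each output file in memory; I suspect you should be running an inner buffer (maybe 20k or so) and calling Write more frequently.

Ultimately, if you need IO, you need IO. If you want to be courteous to a shared hosting environment, you could add deliberate pauses: perhaps short pauses inside the inner loop and a longer pause (maybe 1s) in the outer loop. This won't affect your overall timing much, but it may help other processes get some IO.

Example of a buffer for the inner loop: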

public static void SplitFile(string inputFile, int chunkSize, string path)
{
    const int BUFFER_SIZE = 20 * 1024;
    byte[] buffer = new byte[BUFFER_SIZE];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "\\" + index))
            {
                int remaining = chunkSize, bytesRead;
                while (remaining > 0 && (bytesRead = input.Read(buffer, 0,
                        Math.Min(remaining, BUFFER_SIZE))) > 0)
                {
                    output.Write(buffer, 0, bytesRead);
                    remaining -= bytesRead;
                }
            }
            index++;
            Thread.Sleep(500); // experimental; perhaps try it
        }
    }
}
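
For the "short pauses inside the inner loop" idea, one possible variation is to pace the copy against a target byte rate instead of sleeping for a fixed time. The following is only a rough sketch under stated assumptions: the class name `ThrottledSplitter` and the `maxBytesPerSecond` parameter are hypothetical and not part of the code above, and a sensible rate would have to be found by experiment on the actual host.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

public static class ThrottledSplitter
{
    // Hypothetical helper (not from the answer): splits inputFile into
    // chunkSize-byte pieces while keeping the average copy rate at or
    // below maxBytesPerSecond. Returns the number of files written.
    public static int SplitFile(string inputFile, int chunkSize, string path,
                                long maxBytesPerSecond)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (maxBytesPerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(maxBytesPerSecond));

        const int BufferSize = 20 * 1024;
        byte[] buffer = new byte[BufferSize];
        int index = 0;
        long totalBytes = 0;
        Stopwatch clock = Stopwatch.StartNew();

        using (Stream input = File.OpenRead(inputFile))
        {
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(Path.Combine(path, index.ToString())))
                {
                    int remaining = chunkSize, bytesRead;
                    while (remaining > 0 &&
                           (bytesRead = input.Read(buffer, 0, Math.Min(remaining, BufferSize))) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                        totalBytes += bytesRead;

                        // How long the copy should have taken so far at the target rate.
                        long expectedMs = totalBytes * 1000 / maxBytesPerSecond;
                        long aheadMs = expectedMs - clock.ElapsedMilliseconds;
                        if (aheadMs > 0)
                        {
                            // We are ahead of schedule, so yield the disk to other processes.
                            Thread.Sleep((int)Math.Min(aheadMs, int.MaxValue));
                        }
                    }
                }
                index++;
            }
        }
        return index;
    }
}
```

With this shape, a call such as `ThrottledSplitter.SplitFile(inputFile, 14 * 1024 * 1024, outputDir, 5L * 1024 * 1024)` would aim for roughly 5 MB/s on average; the trade-off is a longer total run time in exchange for leaving I/O headroom for other tenants.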

Answer 2

Mic*_*hig 5

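I've slightly modified the code from the question, in case you want to split by chunk while making sure each chunk ends at the end of a line: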

    private static void SplitFile(string inputFile, int chunkSize, string path)
    {
        byte[] buffer = new byte[chunkSize];
        List<byte> extraBuffer = new List<byte>();

        using (Stream input = File.OpenRead(inputFile))
        {
            int index = 0;
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(path + "\\" + index + ".csv"))
                {
                    int chunkBytesRead = 0;
                    while (chunkBytesRead < chunkSize)
                    {
                        int bytesRead = input.Read(buffer,
                                                   chunkBytesRead,
                                                   chunkSize - chunkBytesRead);

                        if (bytesRead == 0)
                        {
                            break;
                        }

                        chunkBytesRead += bytesRead;
                    }

                    byte extraByte = buffer[chunkSize - 1];
                    while (extraByte != '\n')
                    {
                        int flag = input.ReadByte();
                        if (flag == -1)
                            break;
                        extraByte = (byte)flag;
                        extraBuffer.Add(extraByte);
                    }

                    output.Write(buffer, 0, chunkBytesRead);
                    if (extraBuffer.Count > 0)
                        output.Write(extraBuffer.ToArray(), 0, extraBuffer.Count);

                    extraBuffer.Clear();
                }
                index++;
            }
        }
    }
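
The version above still holds a whole chunk in memory. If that matters, the line-boundary idea can be combined with a small fixed buffer, so each piece is streamed to disk as it is read. This is just a sketch under assumptions: the class name `LineAlignedSplitter` is hypothetical, and it assumes lines end in `\n` (a `\r\n` ending also works, since `\n` is the last byte).

```csharp
using System;
using System.IO;

public static class LineAlignedSplitter
{
    // Hypothetical helper: writes about chunkSize bytes per output file through
    // a small buffer, then extends the file up to and including the next '\n'.
    // Returns the number of files written.
    public static int SplitFile(string inputFile, int chunkSize, string path)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        const int BufferSize = 20 * 1024;
        byte[] buffer = new byte[BufferSize];
        int index = 0;

        using (Stream input = File.OpenRead(inputFile))
        {
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(Path.Combine(path, index + ".csv")))
                {
                    int remaining = chunkSize, bytesRead;
                    int lastByte = -1; // last byte written to this chunk, -1 if none

                    while (remaining > 0 &&
                           (bytesRead = input.Read(buffer, 0, Math.Min(remaining, BufferSize))) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                        lastByte = buffer[bytesRead - 1];
                    }

                    // Keep copying single bytes until the chunk ends on a newline
                    // or the input is exhausted (ReadByte returns -1).
                    while (lastByte != '\n' && (lastByte = input.ReadByte()) != -1)
                    {
                        output.WriteByte((byte)lastByte);
                    }
                }
                index++;
            }
        }
        return index;
    }
}
```

As with the original, a chunk can exceed `chunkSize` by up to one line, so very long lines produce correspondingly larger files.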

Archived:	15 years, 2 months ago
Views:	33010
Last activity:	7 years, 5 months ago