How can I search a large text file for a string in C# without reading it line by line?

Question

13 c# search text

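I have a large text file that I need to search for a specific string. Is there a fast way to do this without reading it line by line?

Because of the file's size (over 100 MB), reading it line by line is very slow.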

Answer 1

Way*_*ish 7

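Given the size of the files, do you really want to read them entirely into memory beforehand? Line by line is probably the best approach.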
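A minimal sketch of what a streaming line-by-line search could look like, assuming `File.ReadLines` (which yields lines lazily instead of loading the whole file). The class and method names `LineSearch` and `FindLine` are made up for illustration and are not from the answer. Note that this approach cannot find a value that spans a line break.

```csharp
using System;
using System.IO;

public static class LineSearch
{
    // Hypothetical helper: returns the 1-based number of the first line
    // that contains value, or -1 if no line contains it.
    public static long FindLine(string fileName, string value)
    {
        long lineNumber = 0;
        // File.ReadLines enumerates lazily, so only one line is held at a time.
        foreach (string line in File.ReadLines(fileName))
        {
            ++lineNumber;
            // Ordinal comparison avoids culture-sensitive matching overhead.
            if (line.IndexOf(value, StringComparison.Ordinal) >= 0)
                return lineNumber;
        }
        return -1;
    }
}
```

Memory use stays bounded by the longest line, which is the point the answer makes about not reading the whole file up front.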

Answer 2

DRB*_*ise 5

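Here is my solution, which uses a stream to read one character at a time. I wrote a custom class that checks the value one character at a time until the whole value is found.

I ran some tests with a 100 MB file stored on a network drive, and the speed depended entirely on how fast the file could be read. If the file was already buffered by Windows, searching the whole file took less than 3 seconds. Otherwise it took anywhere from 7 to 60 seconds, depending on network speed.

The search itself took less than a second when run against an in-memory string with no matching characters. If many of the leading characters match, the search may take longer.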

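using System.Collections.Generic;  // needed for List<int> in StringSearch

public static int FindInFile(string fileName, string value)
{   // Returns the zero-based character index where value was found;
    // if not found, returns the bitwise complement (~) of the number of characters read.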
    int index = 0;
    using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName))
    {
        if (String.IsNullOrEmpty(value))
            return 0;
        StringSearch valueSearch = new StringSearch(value);
        int readChar;
        while ((readChar = reader.Read()) >= 0)
        {
            ++index;
            if (valueSearch.Found(readChar))
                return index - value.Length;
        }
    }
    return ~index;
}
public class StringSearch
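{   // Call Found one character at a time until it returns true (string found).
    // Found returns as soon as one match completes, without advancing the other
    // partial matches, so call Reset() before reusing the instance.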
    private readonly string value;
    private readonly List<int> indexList = new List<int>();
    public StringSearch(string value)
    {
        this.value = value;
    }
    public bool Found(int nextChar)
    {
        for (int index = 0; index < indexList.Count; )
        {
            int valueIndex = indexList[index];
            if (value[valueIndex] == nextChar)
            {
                ++valueIndex;
                if (valueIndex == value.Length)
                {
                    indexList[index] = indexList[indexList.Count - 1];
                    indexList.RemoveAt(indexList.Count - 1);
                    return true;
                }
                else
                {
                    indexList[index] = valueIndex;
                    ++index;
                }
            }
            else
            {   // next char does not match
                indexList[index] = indexList[indexList.Count - 1];
                indexList.RemoveAt(indexList.Count - 1);
            }
        }
        if (value[0] == nextChar)
        {
            if (value.Length == 1)
                return true;
            indexList.Add(1);
        }
        return false;
    }
    public void Reset()
    {
        indexList.Clear();
    }
}
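Since the answer reports that speed is dominated by how fast the file is read, one possible variation is to read in blocks rather than one character per call. The following is only a sketch under assumptions, not the answer's method: `ChunkSearch` and the `chunkSize` parameter are hypothetical, and it keeps the last `value.Length - 1` characters of each block so that a match straddling two blocks is still found. It follows the same return convention as the code above (index if found, otherwise the complement of the characters read), but returns `long`.

```csharp
using System;
using System.IO;

public static class ChunkSearch
{
    // Hypothetical block-based variant of FindInFile.
    // Returns the zero-based character index of the first match, or the
    // bitwise complement of the number of characters read if not found.
    public static long FindInFile(string fileName, string value, int chunkSize = 64 * 1024)
    {
        if (string.IsNullOrEmpty(value))
            return 0;

        int overlap = value.Length - 1;   // chars carried between blocks
        char[] buffer = new char[overlap + Math.Max(chunkSize, 1)];
        int kept = 0;                     // carried-over chars at buffer start
        long windowStart = 0;             // file index of buffer[0]

        using (var reader = new StreamReader(fileName))
        {
            int read;
            while ((read = reader.Read(buffer, kept, buffer.Length - kept)) > 0)
            {
                int filled = kept + read;
                // Allocates one string per block; simple, but not allocation-free.
                int hit = new string(buffer, 0, filled)
                    .IndexOf(value, StringComparison.Ordinal);
                if (hit >= 0)
                    return windowStart + hit;

                // Keep the tail so a match spanning two blocks is not missed.
                int keep = Math.Min(overlap, filled);
                Array.Copy(buffer, filled - keep, buffer, 0, keep);
                windowStart += filled - keep;
                kept = keep;
            }
        }
        return ~(windowStart + kept);     // total characters read, complemented
    }
}
```

A caller would check the sign of the result: a non-negative value is the match position, while a negative value `r` means no match and `~r` is the number of characters scanned. Whether this is actually faster than the per-character loop depends on the storage and is not measured here.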

Archived:	15 years, 9 months ago
Viewed:	33944 times
Last activity:	10 years, 4 months ago