Boyer-Moore在C#中的实用性?

Jon*_*ood 29 .net c# algorithm performance boyer-moore

Boyer-Moore可能是已知的最快的非索引文本搜索算法.所以我在C#中为我的Black Belt Coder网站实现它.

我有它工作,它显示大致预期的性能改进相比String.IndexOf().但是,当我添加StringComparison.Ordinal参数时IndexOf,它开始优于我的Boyer-Moore实现.有时,相当数量.

我想知道是否有人可以帮助我找出原因.我理解为什么StringComparision.Ordinal可能加快速度,但它怎么能比Boyer-Moore更快?是因为.NET平台本身的开销,可能是因为必须验证数组索引以确保它们在范围内,或者完全不同.有些算法在C#.NET中不实用吗?

以下是关键代码.

// Base for search classes
abstract class SearchBase
{
    public const int InvalidIndex = -1;
    protected string _pattern;
    public SearchBase(string pattern) { _pattern = pattern; }
    public abstract int Search(string text, int startIndex);
    public int Search(string text) { return Search(text, 0); }
}

/// <summary>
/// A simplified Boyer-Moore implementation.
/// 
/// Note: Uses a single skip array, which uses more memory than needed and
/// may not be large enough. Will be replaced with multi-stage table.
/// </summary>
class BoyerMoore2 : SearchBase
{
    private byte[] _skipArray;

    public BoyerMoore2(string pattern)
        : base(pattern)
    {
        // TODO: To be replaced with multi-stage table
        _skipArray = new byte[0x10000];

        for (int i = 0; i < _skipArray.Length; i++)
            _skipArray[i] = (byte)_pattern.Length;
        for (int i = 0; i < _pattern.Length - 1; i++)
            _skipArray[_pattern[i]] = (byte)(_pattern.Length - i - 1);
    }

    public override int Search(string text, int startIndex)
    {
        int i = startIndex;

        // Loop while there's still room for search term
        while (i <= (text.Length - _pattern.Length))
        {
            // Look if we have a match at this position
            int j = _pattern.Length - 1;
            while (j >= 0 && _pattern[j] == text[i + j])
                j--;

            if (j < 0)
            {
                // Match found
                return i;
            }

            // Advance to next comparision
            i += Math.Max(_skipArray[text[i + j]] - _pattern.Length + 1 + j, 1);
        }
        // No match found
        return InvalidIndex;
    }
}
Run Code Online (Sandbox Code Playgroud)

编辑:我已经在http://www.blackbeltcoder.com/Articles/algorithms/fast-text-search-with-boyer-moore上发布了我的所有测试代码和结论.

Jon*_*ood 19

根据我自己的测试和这里的评论,我得出结论,原因String.IndexOf()很好,StringComparision.Ordinal因为该方法调用了非托管代码,可能采用手工优化的汇编语言.

我已经运行了许多不同的测试,String.IndexOf()似乎比使用托管C#代码实现的任何东西都要快.

如果有人感兴趣,我已经写了我发现的关于此的一切,并在C#中发布了Boyer-Moore算法的几种变体,网址http://www.blackbeltcoder.com/Articles/algorithms/fast-text-search-with-博耶摩尔.