byte[] array pattern search

And*_*s R 68 c# pattern-matching

Does anyone know a good, efficient way to search for/match a byte pattern in a byte[] array and then return the positions?

For example:

byte[] pattern = new byte[] {12,3,5,76,8,0,6,125};

byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125};

Jb *_*ain 53

May I suggest something that doesn't involve creating strings, copying arrays, or unsafe code:

using System;
using System.Collections.Generic;

static class ByteArrayRocks {

    static readonly int [] Empty = new int [0];

    public static int [] Locate (this byte [] self, byte [] candidate)
    {
        if (IsEmptyLocate (self, candidate))
            return Empty;

        var list = new List<int> ();

        for (int i = 0; i < self.Length; i++) {
            if (!IsMatch (self, i, candidate))
                continue;

            list.Add (i);
        }

        return list.Count == 0 ? Empty : list.ToArray ();
    }

    static bool IsMatch (byte [] array, int position, byte [] candidate)
    {
        if (candidate.Length > (array.Length - position))
            return false;

        for (int i = 0; i < candidate.Length; i++)
            if (array [position + i] != candidate [i])
                return false;

        return true;
    }

    static bool IsEmptyLocate (byte [] array, byte [] candidate)
    {
        return array == null
            || candidate == null
            || array.Length == 0
            || candidate.Length == 0
            || candidate.Length > array.Length;
    }

    static void Main ()
    {
        var data = new byte [] { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125 };
        var pattern = new byte [] { 12, 3, 5, 76, 8, 0, 6, 125 };

        foreach (var position in data.Locate (pattern))
            Console.WriteLine (position);
    }
}

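Edit (by IAbstract): moved the content of a post here, since it wasn't an answer.

Out of curiosity, I created a small benchmark with the different answers.

Here are the results for one million iterations: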

solution [Locate]:            00:00:00.7714027
solution [FindAll]:           00:00:03.5404399
solution [SearchBytePattern]: 00:00:01.1105190
solution [MatchBytePattern]:  00:00:03.0658212
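The benchmark code itself isn't shown. Below is a minimal sketch of one way such a timing could be done, assuming each solution is wrapped as a `Func<byte[], byte[], IEnumerable<int>>`. The `SearchBenchmark` class and `Measure` method are hypothetical names, and this is a rough Stopwatch loop rather than a rigorous harness.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper, not part of any answer above.
public static class SearchBenchmark
{
    // Times `iterations` calls of `search` on the same input and returns the
    // total elapsed time. A rough Stopwatch loop, not a rigorous harness.
    public static TimeSpan Measure(Func<byte[], byte[], IEnumerable<int>> search,
                                   byte[] data, byte[] pattern, int iterations)
    {
        // One untimed warm-up call so JIT compilation is not measured.
        Consume(search(data, pattern));

        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
            Consume(search(data, pattern));
        sw.Stop();
        return sw.Elapsed;
    }

    // Enumerate the result fully so lazy (yield-based) solutions do their work.
    static int Consume(IEnumerable<int> positions)
    {
        int count = 0;
        foreach (int p in positions)
            count++;
        return count;
    }
}
```

For example, `SearchBenchmark.Measure((d, p) => d.Locate(p), data, pattern, 1000000)` would time `Locate` on the question's data. Numbers like the ones above depend heavily on hardware, runtime version and input size, so a dedicated library such as BenchmarkDotNet gives more trustworthy comparisons.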

  • Your solution is slow on large byte arrays. (3 upvotes)
  • You could implement the KMP algorithm directly; it's much more efficient. (2 upvotes)
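The KMP suggestion can be sketched as follows: a minimal `byte[]` version of Knuth-Morris-Pratt that precomputes a failure table so the scan never moves backwards in the data. `KmpSearch` and `FindAll` are hypothetical names, not an existing API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; a sketch of Knuth-Morris-Pratt over byte arrays.
public static class KmpSearch
{
    // Returns the start index of every (possibly overlapping) occurrence
    // of pattern in data.
    public static List<int> FindAll(byte[] data, byte[] pattern)
    {
        var result = new List<int>();
        if (data == null || pattern == null || pattern.Length == 0)
            return result;

        // failure[k] = length of the longest proper prefix of pattern[0..k]
        // that is also a suffix of pattern[0..k].
        int[] failure = new int[pattern.Length];
        for (int k = 1, len = 0; k < pattern.Length; k++)
        {
            while (len > 0 && pattern[k] != pattern[len])
                len = failure[len - 1];
            if (pattern[k] == pattern[len])
                len++;
            failure[k] = len;
        }

        // Scan data once; q is the number of pattern bytes currently matched.
        for (int i = 0, q = 0; i < data.Length; i++)
        {
            while (q > 0 && data[i] != pattern[q])
                q = failure[q - 1];
            if (data[i] == pattern[q])
                q++;
            if (q == pattern.Length)
            {
                result.Add(i - pattern.Length + 1);
                q = failure[q - 1]; // keep going to find overlapping matches
            }
        }
        return result;
    }
}
```

On the question's data this should report the same positions as `Locate` (8 and 26). The worst case is O(n + m) instead of the O(n·m) of the naive loops, which matters mostly for long patterns with repetitive structure.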

Yuj*_*are 25

Use a LINQ approach.

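// Needs: using System.Collections.Generic; using System.Linq;
public static IEnumerable<int> PatternAt(byte[] source, byte[] pattern)
{
    for (int i = 0; i < source.Length; i++)
    {
        // Compare the window of pattern.Length bytes starting at i with the pattern.
        if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern))
        {
            yield return i;
        }
    }
}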

Very simple!

  • But it's not particularly efficient, so it's fine for most cases, but not all. (2 upvotes)

VVS*_*VVS 12

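Use the efficient Boyer-Moore algorithm.

It's designed to find strings within strings, but it takes very little imagination to adapt it to byte arrays.

In general, the best answer is: use whichever string-search algorithm you like :).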
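Adapting this to byte arrays can be sketched as below. Note that this is Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that keeps only the bad-character shift table (full Boyer-Moore also uses a good-suffix rule). `HorspoolSearch` and `FindAll` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; a Boyer-Moore-Horspool sketch over byte arrays.
public static class HorspoolSearch
{
    // Returns the start index of every (possibly overlapping) match.
    public static List<int> FindAll(byte[] data, byte[] pattern)
    {
        var result = new List<int>();
        int m = pattern.Length;
        if (m == 0 || m > data.Length)
            return result;

        // shift[b] = how far the window may slide when the byte under the
        // window's last position is b. Bytes absent from pattern[0..m-2] shift by m.
        int[] shift = new int[256];
        for (int b = 0; b < 256; b++)
            shift[b] = m;
        for (int k = 0; k < m - 1; k++)
            shift[pattern[k]] = m - 1 - k;

        int pos = 0;
        while (pos <= data.Length - m)
        {
            // Compare the window with the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j])
                j--;
            if (j < 0)
                result.Add(pos);
            // Slide according to the last byte of the current window.
            pos += shift[data[pos + m - 1]];
        }
        return result;
    }
}
```

Because a byte has only 256 possible values, the shift table is a fixed `int[256]`, which is what makes the string algorithm carry over so directly. On typical data the window often jumps by up to `m` bytes at a time; the worst case is still O(n·m).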


GoC*_*ado 12

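Initially I posted some old code I had been using, but I got curious about Jb Evain's benchmark and found that my solution was dumb. bruno conde's SearchBytePattern came out second, behind only Locate. I couldn't understand why it did so well, especially since it uses Array.Copy and an extension method (SequenceEqual), but the proof is in Jb's test, so kudos to bruno.

I simplified it a bit further, so hopefully this is the clearest and simplest solution (all credit to bruno conde for the groundwork). The changes include:

  • Buffer.BlockCopy
  • Array.IndexOf<byte>
  • a while loop instead of a for loop
  • a start-index parameter
  • conversion to an extension method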

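using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class (the class name is arbitrary).
public static class ByteArrayExtensions
{
    public static List<int> IndexOfSequence(this byte[] buffer, byte[] pattern, int startIndex)
    {
        List<int> positions = new List<int>();
        // Jump straight to each occurrence of the pattern's first byte.
        int i = Array.IndexOf<byte>(buffer, pattern[0], startIndex);
        while (i >= 0 && i <= buffer.Length - pattern.Length)
        {
            // Copy the candidate window and compare it with the pattern.
            byte[] segment = new byte[pattern.Length];
            Buffer.BlockCopy(buffer, i, segment, 0, pattern.Length);
            if (segment.SequenceEqual<byte>(pattern))
                positions.Add(i);
            i = Array.IndexOf<byte>(buffer, pattern[0], i + 1);
        }
        return positions;
    }
}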
    

  • The line "i = Array.IndexOf<byte>(buffer, pattern[0], i + pattern.Length)" should probably be "i = Array.IndexOf<byte>(buffer, pattern[0], i + 1)". As it stands, it skips data after finding the first byte. (5 upvotes)

Kev*_*oid 11

If you're using .NET Core 2.1 or later (or a .NET Standard 2.1 or later platform), you can use the MemoryExtensions.IndexOf extension method with the Span type:

int matchIndex = toBeSearched.AsSpan().IndexOf(pattern);

To find all occurrences, you can use the following:

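public static IEnumerable<int> IndexesOf(this byte[] haystack, byte[] needle,
    int startIndex = 0, bool includeOverlapping = false)
{
    // An empty needle matches everywhere and would never advance startIndex.
    if (needle.Length == 0)
        throw new ArgumentException("needle must not be empty", nameof(needle));

    int matchIndex = haystack.AsSpan(startIndex).IndexOf(needle);
    while (matchIndex >= 0)
    {
        yield return startIndex + matchIndex;
        // Resume one byte later for overlapping matches, otherwise after the match.
        startIndex += matchIndex + (includeOverlapping ? 1 : needle.Length);
        matchIndex = haystack.AsSpan(startIndex).IndexOf(needle);
    }
}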

Starting with .NET 7 (thanks to dotnet/runtime#63285), the search uses an optimized SIMD algorithm (in SpanHelpers.IndexOf).


bru*_*nde 7

My solution:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    public static void Main()
    {
        byte[] pattern = new byte[] {12,3,5,76,8,0,6,125};

        byte[] toBeSearched = new byte[] { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125};

        List<int> positions = SearchBytePattern(pattern, toBeSearched);

        foreach (var item in positions)
        {
            Console.WriteLine("Pattern matched at pos {0}", item);
        }

    }

    static public List<int> SearchBytePattern(byte[] pattern, byte[] bytes)
    {
        List<int> positions = new List<int>();
        int patternLength = pattern.Length;
        int totalLength = bytes.Length;
        byte firstMatchByte = pattern[0];
        for (int i = 0; i < totalLength; i++)
        {
            if (firstMatchByte == bytes[i] && totalLength - i >= patternLength)
            {
                byte[] match = new byte[patternLength];
                Array.Copy(bytes, i, match, 0, patternLength);
                if (match.SequenceEqual<byte>(pattern))
                {
                    positions.Add(i);
                    i += patternLength - 1; // skip past this match, so overlapping matches are not reported
                }
            }
        }
        return positions;
    }
}

  • You shouldn't give everyone a -1 just because their solutions aren't perfect... In that case you should upvote the solution you think is best. (4 upvotes)

Ing*_*hez 7

Here's my proposal, simpler and faster:

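int Search(byte[] src, byte[] pattern)
{
    // Number of start positions at which the pattern still fits inside src.
    int c = src.Length - pattern.Length + 1;
    int j;
    for (int i = 0; i < c; i++)
    {
        // Cheap first-byte check before comparing the rest.
        if (src[i] != pattern[0]) continue;
        // Compare the remaining bytes from the last one back to index 1.
        for (j = pattern.Length - 1; j >= 1 && src[i + j] == pattern[j]; j--) ;
        // j only reaches 0 when every byte matched.
        if (j == 0) return i;
    }
    return -1; // pattern not found
}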

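  • A necro comment: you should probably rename "c" to something a bit better, e.g. "maxFirstCharSlot" or similar. But this gets my +1, very useful. (2 upvotes)
  • While this is being bumped by necro comments anyway: this is definitely an amazing code answer, but could you explain how it works or comment the logic so that less advanced members can follow it? I only know what it does because my programming degree covered building sorting and searching systems :D (2 upvotes)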

Mat*_*ten 5

I was missing a LINQ approach/answer :-)

/// <summary>
/// Searches the haystack array for the given needle using the default equality comparison (Equals) and returns every index at which the needle starts.
/// </summary>
/// <typeparam name="T">Type of the array elements.</typeparam>
/// <param name="haystack">Sequence to operate on.</param>
/// <param name="needle">Sequence to search for.</param>
/// <returns>The start index of each occurrence of the needle within the haystack; empty if the needle isn't contained.</returns>
public static IEnumerable<int> IndexOf<T>(this T[] haystack, T[] needle)
{
    if ((needle != null) && (haystack.Length >= needle.Length))
    {
        for (int l = 0; l < haystack.Length - needle.Length + 1; l++)
        {
            if (!needle.Where((data, index) => !haystack[l + index].Equals(data)).Any())
            {
                yield return l;
            }
        }
    }
}


Eug*_*ota -2

You can put the byte arrays into strings and run the match with IndexOf. Or you can at least reuse existing string-matching algorithms.

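    [STAThread]
    static void Main(string[] args)
    {
        byte[] pattern = new byte[] {12,3,5,76,8,0,6,125};
        byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125};

        // Map each byte to the char with the same value (0-255) so that string
        // indexes equal byte indexes. new string(sbyte*, ...) instead decodes
        // with the system code page (UTF-8 on .NET Core), which can replace
        // invalid byte sequences and shift the indexes.
        string needle = new string(Array.ConvertAll(pattern, b => (char)b));
        string haystack = new string(Array.ConvertAll(toBeSearched, b => (char)b));

        // Ordinal comparison matches raw char values; the default culture-sensitive
        // IndexOf(string, int) can ignore characters such as '\0'.
        int i = haystack.IndexOf(needle, StringComparison.Ordinal);
        System.Console.Out.WriteLine(i);
    }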
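Reusing a classic string-matching algorithm directly on the bytes, without going through String at all, can look like the Rabin-Karp sketch below. The base 257 and the wrap-around arithmetic modulo 2^32 are arbitrary choices, and `RabinKarpSearch` and `FindAll` are hypothetical names. The hash is only a filter: every hash hit is confirmed byte by byte, so collisions cannot produce false matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; a Rabin-Karp sketch over byte arrays.
public static class RabinKarpSearch
{
    const uint Base = 257; // arbitrary base; uint arithmetic wraps modulo 2^32

    // Returns the start index of every (possibly overlapping) match.
    public static List<int> FindAll(byte[] data, byte[] pattern)
    {
        var result = new List<int>();
        int m = pattern.Length;
        if (m == 0 || m > data.Length)
            return result;

        unchecked
        {
            uint patternHash = 0, windowHash = 0, highPow = 1;
            for (int k = 0; k < m; k++)
            {
                patternHash = patternHash * Base + pattern[k];
                windowHash = windowHash * Base + data[k];
                if (k > 0) highPow *= Base; // ends as Base^(m-1)
            }

            for (int i = 0; ; i++)
            {
                // Confirm hash hits byte by byte (different windows can share a hash).
                if (windowHash == patternHash && Matches(data, i, pattern))
                    result.Add(i);
                if (i + m >= data.Length)
                    break;
                // Roll the hash: drop data[i], append data[i + m].
                windowHash = (windowHash - data[i] * highPow) * Base + data[i + m];
            }
        }
        return result;
    }

    static bool Matches(byte[] data, int start, byte[] pattern)
    {
        for (int k = 0; k < pattern.Length; k++)
            if (data[start + k] != pattern[k])
                return false;
        return true;
    }
}
```

The expected running time is O(n + m), since each window's hash is updated in constant time and full comparisons happen only on hash hits.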