numpy: check whether a 1-D array is a subarray of another

Ada*_*Er8 5 python numpy

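Given two generic numpy 1-D arrays (with no guarantees whatsoever about the values), I need to check whether one is a subarray of the other.

Converting to strings makes this short and easy, but it is probably not the most efficient way: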

import numpy as np

def is_sub_arr(a1, a2):
    return str(a2).strip('[]') in str(a1).strip('[]')

arr1 = np.array([9, 1, 3, 2, 7, 2, 7, 2, 8, 5])
arr2 = np.array([3, 2, 7, 2])
arr3 = np.array([1,3,7])

print(is_sub_arr(arr1,arr2))  # True
print(is_sub_arr(arr1,arr3))  # False
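Beyond efficiency, the string comparison can also give wrong answers, which is worth checking before relying on it. The sketch below reuses the function from the question under the hypothetical name `is_sub_arr_str` and shows two failure modes, assuming NumPy's default print options: digits from neighbouring elements can line up into a false match, and arrays with more than 1000 elements are printed with an ellipsis, so most of their content never reaches the string.

```python
import numpy as np

def is_sub_arr_str(a1, a2):
    # The string-based check from the question, unchanged apart from the name.
    return str(a2).strip('[]') in str(a1).strip('[]')

# False positive: the text "1 1" occurs inside "21 13",
# although the values [1, 1] never occur in [21, 13].
false_positive = is_sub_arr_str(np.array([21, 13]), np.array([1, 1]))

# Missed match: with default print options, large arrays are summarized
# with "...", so the middle elements are not in the string at all.
big = np.arange(2000)
truncated = '...' in str(big)
missed = is_sub_arr_str(big, np.array([500, 501]))

print(false_positive, truncated, missed)
```

Comparing the values themselves, as the answer below does, avoids both problems.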

Is there an efficient built-in/native numpy way to do this?

jde*_*esa 3

EDIT: You can also make this a lot faster (around 1000x in the timing below) while using much less memory with Numba:

import numpy as np
import numba as nb

def is_sub_arr_np(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

@nb.jit(parallel=True)
def is_sub_arr_nb(a1, a2):
    for i in nb.prange(len(a1) - len(a2) + 1):
        for j in range(len(a2)):
            if a1[i + j] != a2[j]:
                break
        else:
            return True
    return False

# Test
np.random.seed(0)
arr1 = np.random.randint(100, size=100_000)
arr2 = np.random.randint(100, size=1_000)
print(is_sub_arr_np(arr1, arr2))
# False
print(is_sub_arr_nb(arr1, arr2))
# False

# Now enforce a match at the end
arr1[-len(arr2):] = arr2
print(is_sub_arr_np(arr1, arr2))
# True
print(is_sub_arr_nb(arr1, arr2))
# True

# Timing
%timeit is_sub_arr_np(arr1, arr2)
# 99.4 ms ± 567 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit is_sub_arr_nb(arr1, arr2)
# 124 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
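The memory difference comes from the comparison `a1_win == a2` in the NumPy version, which builds a boolean array with one entry per (window, pattern element) pair: roughly 99,001 × 1,000 entries for the test arrays above. If Numba is not available, one way to keep memory proportional to `len(a1)` in plain NumPy is to compare one pattern element at a time against a shifted slice. This is only a sketch; the name `is_sub_arr_lowmem` is hypothetical, and treating an empty pattern as a match is an assumption about the desired behaviour.

```python
import numpy as np

def is_sub_arr_lowmem(a1, a2):
    n, m = len(a1), len(a2)
    if m == 0:
        return True   # assumption: an empty pattern matches anywhere
    if m > n:
        return False  # the pattern cannot fit
    # mask[i] stays True while a1[i:i + j + 1] still equals a2[:j + 1].
    mask = np.ones(n - m + 1, dtype=bool)
    for j in range(m):
        mask &= a1[j:j + n - m + 1] == a2[j]
        if not mask.any():
            return False  # no candidate start position survives
    return True

arr1 = np.array([9, 1, 3, 2, 7, 2, 7, 2, 8, 5])
print(is_sub_arr_lowmem(arr1, np.array([3, 2, 7, 2])))
print(is_sub_arr_lowmem(arr1, np.array([1, 3, 7])))
```

The Python-level loop runs once per pattern element, so this trades some speed for memory and is not expected to match the Numba timings.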

Not sure if this is the most efficient answer, but it is one possible solution:

import numpy as np

def is_sub_arr(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])
print(is_sub_arr(arr1, arr2))
# True
print(is_sub_arr(arr1, arr3))
# False
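As a possible variation on the windowing idea above, newer NumPy versions (1.20 and later, which is an assumption about your environment) provide `np.lib.stride_tricks.sliding_window_view`, which builds the same kind of window view without computing shapes and strides by hand. The sketch below uses the hypothetical name `is_sub_arr_swv` and adds explicit handling for an empty or over-long `a2`; returning `True` for an empty `a2` is an assumption, not something the question specifies.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def is_sub_arr_swv(a1, a2):
    if a2.size == 0:
        return True   # assumption: an empty pattern counts as a match
    if a2.size > a1.size:
        return False  # a window longer than the array cannot be built
    # Read-only view of shape (len(a1) - len(a2) + 1, len(a2)); no data is copied.
    windows = sliding_window_view(a1, a2.size)
    # A row that equals a2 in every position is a match.
    return bool((windows == a2).all(axis=1).any())

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(is_sub_arr_swv(arr1, np.array([4, 5, 6])))
print(is_sub_arr_swv(arr1, np.array([4, 5, 7])))
```

The comparison still materializes a boolean array with one row per window, so the memory behaviour is the same as with `as_strided`; the gain is mainly safety and readability.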