Pandas - take the current row, compare its value to the X previous rows, and return the number of matches (within x% range)

Question


swi*_*fty 4 python numpy dataframe pandas

I have a pandas column, colA, like the one shown in the table below.

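I want to take the current row's value and find matches among the previous rows. For example, index 4 (10.7) would return 1 match because it is close to index 2 (10.8). Similarly, index 8 (10.6) would return 2 matches because it is close to both index 2 and index 4.

Using a threshold of +/- 5% for this example would output the following: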

index colA  matches
1     10.2    0
2     10.8    0
3     11.6    0
4     10.7    2
5     9.5     0
6     6.2     0
7     12.9    0
8     10.6    3
9     6.4     1
10    20.5    0


For a large dataframe, I would like to limit the search to the previous X (300?) rows rather than the entire dataframe.

Answer 1

piR*_*red 5

Use triangular indices to ensure we only look backward. Then use np.bincount to accumulate the matches.

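import numpy as np

a = df.colA.values

# All (i, j) pairs with j < i, so each row is compared only with earlier rows
i, j = np.tril_indices(len(a), -1)
# Keep the pairs where the earlier value is within 5% of the current value
mask = np.abs(a[i] - a[j]) / a[i] <= .05
# Count the matching pairs per current row
df.assign(matches=np.bincount(i[mask], minlength=len(a)))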

       colA  matches
index               
1      10.2        0
2      10.8        0
3      11.6        0
4      10.7        2
5       9.5        0
6       6.2        0
7      12.9        0
8      10.6        3
9       6.4        1
10     20.5        0
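
To see how the triangular indices and np.bincount fit together, here is a self-contained sketch. It assumes the frame is rebuilt from the colA values in the question's table; the names pairs and matched_labels are only illustrative and are not part of the answer.

```python
import numpy as np
import pandas as pd

# Values copied from the colA column in the question's table.
df = pd.DataFrame(
    {"colA": [10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5]},
    index=pd.RangeIndex(1, 11, name="index"),
)
a = df["colA"].to_numpy()

# Strictly lower-triangular positions: i is the current row, j an earlier row.
i, j = np.tril_indices(len(a), -1)
pairs = np.column_stack([i, j])  # n * (n - 1) / 2 rows, one per (current, earlier) pair

# True where the earlier value lies within 5% of the current value.
mask = np.abs(a[i] - a[j]) / a[i] <= 0.05

# Translate the matching positions back to index labels for inspection.
matched_labels = [(int(df.index[r]), int(df.index[c])) for r, c in pairs[mask]]

# bincount tallies how often each current-row position appears among the matches;
# minlength keeps a zero entry for rows that never match.
matches = np.bincount(i[mask], minlength=len(a))
result = df.assign(matches=matches)
print(matched_labels)
print(result)
```

Each entry of matched_labels is a (current index, earlier index) pair, so the count for a row is simply the number of pairs whose first element is that row.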


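If you run into resource issues (the triangular index arrays hold n(n-1)/2 pairs), consider using a good old-fashioned loop. If you have access to numba, though, you can speed it up considerably.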

import numpy as np
from numba import njit

@njit
def counter(a):
    # One integer counter per row
    c = np.zeros(len(a), dtype=np.int64)
    for i in range(len(a)):
        # Only compare against earlier rows (j < i)
        for j in range(i):
            if abs(a[i] - a[j]) / a[i] <= .05:
                c[i] += 1
    return c

df.assign(matches=counter(a))

       colA  matches
index               
1      10.2        0
2      10.8        0
3      11.6        0
4      10.7        2
5       9.5        0
6       6.2        0
7      12.9        0
8      10.6        3
9       6.4        1
10     20.5        0
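
The question also asks about limiting the search to the previous X rows, which the snippets above do not do. Below is a minimal sketch of one way to add that, assuming a hypothetical helper named windowed_counter with window and tol parameters (these names are not from the answer) and assuming colA holds positive values, as the division by the current value requires.

```python
import numpy as np
import pandas as pd

def windowed_counter(a, window=300, tol=0.05):
    """For each row, count earlier rows within `tol` (relative to the
    current value), looking back at most `window` rows."""
    a = np.asarray(a, dtype=float)
    c = np.zeros(len(a), dtype=np.int64)
    for i in range(1, len(a)):
        # Slice of at most `window` rows that come strictly before row i.
        prev = a[max(0, i - window):i]
        c[i] = np.count_nonzero(np.abs(a[i] - prev) / a[i] <= tol)
    return c

# Values copied from the colA column in the question's table.
df = pd.DataFrame(
    {"colA": [10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5]},
    index=pd.RangeIndex(1, 11, name="index"),
)
a = df["colA"].to_numpy()

full = df.assign(matches=windowed_counter(a))             # window covers every earlier row
short = df.assign(matches=windowed_counter(a, window=3))  # only the 3 previous rows
print(full)
print(short)
```

When window is at least as large as the frame, the counts agree with the output above. Temporary memory stays proportional to window instead of growing with the square of the row count.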

