How can I create a distance matrix with my own metric without using loops?

Hrk*_*kkl 5 python numpy numpy-ndarray

I have an np.array like this:

[[ 1.3 , 2.7 , 0.5 , NaN , NaN],
[ 2.0 , 8.9 , 2.5 , 5.6 , 3.5],
[ 0.6 , 3.4 , 9.5 , 7.4 , NaN]]

And a function that computes the distance between two rows:

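import numpy as np

def nan_manhattan(X, Y):
    nan_diff = np.absolute(X - Y)  # NaN wherever either input is NaN
    length = nan_diff.size
    # Sum the valid differences, rescaled by length / (number of valid differences)
    return np.nansum(nan_diff) * length / (length - np.isnan(nan_diff).sum())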
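To make the metric concrete, here is a small sketch that applies it to the first two rows of the array above. The names `x`, `y` and `d_xy` are only for illustration. Three of the five differences are valid, so their sum is rescaled by 5 / 3:

```python
import numpy as np

def nan_manhattan(X, Y):
    nan_diff = np.absolute(X - Y)
    length = nan_diff.size
    return np.nansum(nan_diff) * length / (length - np.isnan(nan_diff).sum())

x = np.array([1.3, 2.7, 0.5, np.nan, np.nan])
y = np.array([2.0, 8.9, 2.5, 5.6, 3.5])

# Valid differences: 0.7 + 6.2 + 2.0 = 8.9, rescaled by 5 / 3
d_xy = nan_manhattan(x, y)
print(d_xy)  # approximately 14.83
```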

I need all pairwise distances, and I don't want to use loops. How can I do that?

Div*_*kar 4

Leverage broadcasting:

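def manhattan_nan(a):
    # Pairwise sums of |a[i] - a[j]| over columns, ignoring NaNs; shape (n, n)
    s = np.nansum(np.abs(a[:,None,:] - a), axis=-1)
    m = ~np.isnan(a)
    k = m.sum(1)  # non-NaN count per row
    # Smaller non-NaN count of each pair; this equals the number of jointly
    # valid columns when NaN positions are nested, as in the sample data
    r = a.shape[1]/np.minimum.outer(k,k)
    out = s*r
    return out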
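One caveat: `np.minimum.outer(k, k)` agrees with `nan_manhattan` when the NaN positions of one row are contained in those of the other, which holds for the sample data. For arbitrary NaN patterns, a possible variant counts the jointly valid columns with a product of the validity masks instead. This is only a sketch under that reading of the metric, and `manhattan_nan_general` is a made-up name, not part of the original answer:

```python
import numpy as np

def manhattan_nan_general(a):
    # Hypothetical variant, not from the original answer.
    a = np.asarray(a, dtype=float)
    # Pairwise sums of |a[i] - a[j]| over columns, ignoring NaNs; shape (n, n)
    s = np.nansum(np.abs(a[:, None, :] - a), axis=-1)
    # valid[i, j] = number of columns where rows i and j are both non-NaN
    m = (~np.isnan(a)).astype(float)
    valid = m @ m.T
    # Same rescaling as nan_manhattan: sum * length / (number of valid terms).
    # Pairs with no jointly valid column come out as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return s * a.shape[1] / valid
```

Like the broadcasting version, this builds an intermediate of shape (n, n, d), so memory use grows quadratically with the number of rows.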

Benchmarking


Judging from OP's comments, the use case seems to be a tall array. Let's reproduce a benchmark using the given sample data:

In [2]: a
Out[2]:
array([[1.3, 2.7, 0.5, nan, nan],
       [2. , 8.9, 2.5, 5.6, 3.5],
       [0.6, 3.4, 9.5, 7.4, nan]])

In [3]: a = np.repeat(a, 100, axis=0)

# @Dani Mesejo's soln
In [4]: %timeit pdist(a, nan_manhattan)
1.02 s ± 35.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Naive for-loop version
In [18]: n = a.shape[0]

In [19]: %timeit [[nan_manhattan(a[i], a[j]) for i in range(j+1,n)] for j in range(n)]
991 ms ± 45.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# With broadcasting
In [9]: %timeit manhattan_nan(a)
8.43 ms ± 49.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
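Note that `pdist` returns a condensed vector of the i < j pairs, while the broadcasting approach returns the full square matrix. If the condensed form is needed, one way to get it is to take the upper triangle with `np.triu_indices`, which walks the pairs in the same row-major order. This is a minimal sketch, and `to_condensed` is a hypothetical helper name:

```python
import numpy as np

def to_condensed(square):
    # Hypothetical helper: strict upper triangle (i < j) in row-major order,
    # i.e. pairs (0,1), (0,2), ..., (1,2), ...
    square = np.asarray(square)
    i, j = np.triu_indices(square.shape[0], k=1)
    return square[i, j]

# Small symmetric distance matrix for illustration
d = np.array([[0., 1., 2.],
              [1., 0., 3.],
              [2., 3., 0.]])
condensed = to_condensed(d)
print(condensed)
```

For n rows the result has n * (n - 1) / 2 entries.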