Hrk*_*kkl 5 python numpy numpy-ndarray
I have an np.array like this:
[[ 1.3 , 2.7 , 0.5 , NaN , NaN],
[ 2.0 , 8.9 , 2.5 , 5.6 , 3.5],
[ 0.6 , 3.4 , 9.5 , 7.4 , NaN]]
And a function to compute the distance between two rows:
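import numpy as np

def nan_manhattan(X, Y):
    # Absolute differences; NaN wherever either input is NaN
    nan_diff = np.absolute(X - Y)
    length = nan_diff.size
    # Rescale the NaN-ignoring sum by (total count / non-NaN count)
    return np.nansum(nan_diff) * length / (length - np.isnan(nan_diff).sum())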
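As a quick check of what this function returns, here is a small sketch that applies it to the first two sample rows. The variable names `x`, `y` and `d` are illustrative and not part of the original post.

```python
import numpy as np

def nan_manhattan(X, Y):
    nan_diff = np.absolute(X - Y)
    length = nan_diff.size
    return np.nansum(nan_diff) * length / (length - np.isnan(nan_diff).sum())

# First two rows of the sample array (illustrative names)
x = np.array([1.3, 2.7, 0.5, np.nan, np.nan])
y = np.array([2.0, 8.9, 2.5, 5.6, 3.5])

d = nan_manhattan(x, y)
# Non-NaN differences: 0.7 + 6.2 + 2.0 = 8.9 over 3 of 5 columns,
# so the result is 8.9 * 5 / 3
print(d)
```

The sum over the three comparable columns is scaled up by 5/3, as if the two missing columns contributed the average observed difference.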
I need all pairwise distances, and I don't want to use a loop. How do I do that?
Leverage broadcasting:
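def manhattan_nan(a):
    # Pairwise sums of absolute differences, skipping NaN terms
    s = np.nansum(np.abs(a[:,None,:] - a), axis=-1)
    m = ~np.isnan(a)
    k = m.sum(1)
    # min(k_i, k_j) equals the shared valid count when NaN positions are nested, as in the sample
    r = a.shape[1]/np.minimum.outer(k,k)
    out = s*r
    return out

From OP's comments, the use case appears to be a tall array. Let's reproduce a benchmark using the given sample data: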
In [2]: a
Out[2]: 
array([[1.3, 2.7, 0.5, nan, nan],
       [2. , 8.9, 2.5, 5.6, 3.5],
       [0.6, 3.4, 9.5, 7.4, nan]])

In [3]: a = np.repeat(a, 100, axis=0)

# @Dani Mesejo's soln
In [4]: %timeit pdist(a, nan_manhattan)
1.02 s ± 35.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Naive for-loop version
In [18]: n = a.shape[0]

In [19]: %timeit [[nan_manhattan(a[i], a[j]) for i in range(j+1,n)] for j in range(n)]
991 ms ± 45.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# With broadcasting
In [9]: %timeit manhattan_nan(a)
8.43 ms ± 49.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
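One caveat about the broadcasting approach: `np.minimum.outer(k, k)` matches the number of columns that are valid in both rows only when the NaN positions of one row are contained in those of the other, which holds for the trailing NaNs in the sample. If NaNs can fall anywhere, a possible variant counts the shared valid columns with a product of the validity mask. The sketch below is not from the original answer; the name `manhattan_nan_general` and the test array are made up for illustration.

```python
import numpy as np

def nan_manhattan(X, Y):
    # Reference row-pair distance from the question
    nan_diff = np.absolute(X - Y)
    length = nan_diff.size
    return np.nansum(nan_diff) * length / (length - np.isnan(nan_diff).sum())

def manhattan_nan_general(a):
    # Hypothetical variant: works for arbitrary NaN positions
    m = (~np.isnan(a)).astype(np.int64)
    # Pairwise sums of absolute differences, skipping NaN terms
    s = np.nansum(np.abs(a[:, None, :] - a), axis=-1)
    # shared[i, j] = number of columns valid in both row i and row j
    shared = m @ m.T
    # Pairs with no shared column give 0/0 -> NaN, same as nan_manhattan
    with np.errstate(divide="ignore", invalid="ignore"):
        return s * a.shape[1] / shared

# Illustrative data with NaNs in non-nested positions
a = np.array([[1.0, np.nan, 3.0, 4.0],
              [np.nan, 2.0, 5.0, 1.0],
              [0.5, 2.5, np.nan, 2.0]])

out = manhattan_nan_general(a)

# Compare against the row-by-row function
n = a.shape[0]
expected = np.array([[nan_manhattan(a[i], a[j]) for j in range(n)] for i in range(n)])
print(np.allclose(out, expected))
```

Both broadcasting versions build an intermediate of shape `(n, n, d)`, so memory use grows quadratically with the number of rows; for very tall arrays it may be necessary to process the rows in chunks.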