use*_*200 5 python numpy matrix vectorization duplicates
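I have a matrix with a.shape: (80000, 38, 38). I want to check whether there are any duplicate or similar (38, 38) matrices along the first dimension (in this case, there are 80000 of these matrices).
I could do this with two for loops: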
for i in range(a.shape[0]):
    for g in range(a.shape[0]):
        if a[i,:,:] - a[g,:,:] < tolerance:
            # save the index here
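But this looks very inefficient. I know there is numpy.unique, but I'm not sure I understand how it works when I have a set of 2D matrices.
Any suggestions for doing this efficiently? Is there a way to use broadcasting to find the differences of all elements across all matrices?
Here's one approach using lex-sorting -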
# Reshape a to a 2D as required in few places later on
ar = a.reshape(a.shape[0],-1)
# Get lex-sorted indices
sortidx = np.lexsort(ar.T)
# Lex-sort reshaped array to bring duplicate rows next to each other.
# Perform differentiation to check for rows that have at least one non-zero
# as those represent unique rows and as such those are unique blocks
# in axes(1,2) for the original 3D array
out = a[sortidx][np.append(True,(np.diff(ar[sortidx],axis=0)!=0).any(1))]
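The question also asks for the indices of the duplicates, not only the unique blocks. As a rough sketch built on the same lex-sorting idea, the "new row" mask can be turned into a group label per block, so that blocks with equal labels are exact duplicates. The sample array and the names is_new, group_sorted and group_id are illustrative assumptions, not part of the original code:

```python
import numpy as np

# Small illustrative array; the real input would have shape (80000, 38, 38)
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[12, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[3, 0], [2, 1]]])

# Flatten each block into one row and lex-sort the rows
ar = a.reshape(a.shape[0], -1)
sortidx = np.lexsort(ar.T)

# True where a sorted row differs from the row before it (start of a new group)
is_new = np.append(True, (np.diff(ar[sortidx], axis=0) != 0).any(1))

# Running count of group starts gives a label per sorted row
group_sorted = np.cumsum(is_new) - 1

# Scatter the labels back to the original block order
group_id = np.empty_like(group_sorted)
group_id[sortidx] = group_sorted

# Blocks that share a label are exact duplicates of each other,
# e.g. np.flatnonzero(group_id == group_id[0]) lists all copies of block 0
```

Here blocks 0 and 2 end up with the same label, as do blocks 1 and 4.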
Here's another approach that treats each block of elements along axes=(1,2) as an indexing tuple to find uniqueness among the other blocks -
# Reshape a to a 2D as required in few places later on
ar = a.reshape(a.shape[0],-1)
# Get dimension shape considering each block in axes(1,2) as an indexing tuple
dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())
# Finally get unique indexing tuples' indices that represent unique
# indices along first axis for indexing into input array and thus get
# the desired output of unique blocks along the axes(1,2)
out = a[np.unique(ar.dot(dims),return_index=True)[1]]
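One caveat with the indexing-tuple approach: it assumes non-negative integer entries, and the linear index ar.dot(dims) has to fit in the integer dtype. With 38 x 38 blocks there are 1444 columns, so the cumulative product can easily exceed the int64 range and silently wrap around. A minimal sketch of a pre-check follows; the helper name index_tuple_fits is hypothetical and uses Python integers so the check itself cannot overflow:

```python
import numpy as np

def index_tuple_fits(ar):
    """Return True if the linear indices of the 2D integer array `ar`
    can be represented in int64 (hypothetical helper, conservative check)."""
    # The approach only makes sense for non-negative integers
    if not np.issubdtype(ar.dtype, np.integer) or ar.min() < 0:
        return False
    limit = np.iinfo(np.int64).max
    total = 1  # Python int, arbitrary precision
    for m in ar.max(axis=0):
        total *= int(m) + 1
        if total > limit:
            return False
    return True

# A 2 x 2 block example fits comfortably
small = np.array([[12, 4, 0, 1],
                  [2, 4, 3, 2]])
small_ok = index_tuple_fits(small)

# 100 columns with a maximum of 1 each already need 2**100 distinct indices
wide = np.ones((3, 100), dtype=np.int64)
wide_ok = index_tuple_fits(wide)
```

If the check fails, the lex-sorting approach is likely the safer choice, since it does not build a single linear index.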
Sample run -
1] Input:
In [151]: a
Out[151]:
array([[[12,  4],
        [ 0,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[12,  4],
        [ 0,  1]],

       [[ 3,  4],
        [ 1,  3]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  0],
        [ 2,  1]]])
2] Output:
In [152]: ar = a.reshape(a.shape[0],-1)
     ...: sortidx = np.lexsort(ar.T)
     ...:

In [153]: a[sortidx][np.append(True,(np.diff(ar[sortidx],axis=0)!=0).any(1))]
Out[153]:
array([[[12,  4],
        [ 0,  1]],

       [[ 3,  0],
        [ 2,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  4],
        [ 1,  3]]])

In [154]: dims = np.append(1,(ar[:,:-1].max(0)+1).cumprod())

In [155]: a[np.unique(ar.dot(dims),return_index=True)[1]]
Out[155]:
array([[[12,  4],
        [ 0,  1]],

       [[ 3,  0],
        [ 2,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[ 3,  4],
        [ 1,  3]]])
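Regarding numpy.unique on a stack of 2D matrices: NumPy 1.13 and later accept an axis argument, so for exact duplicates something like the following should also work. This is a minimal sketch on a small made-up array; the names uniq, first_idx and in_original_order are illustrative:

```python
import numpy as np

# Small illustrative array with two duplicated blocks
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[12, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[3, 0], [2, 1]]])

# axis=0 treats each (2, 2) block as one item to deduplicate
uniq, first_idx = np.unique(a, axis=0, return_index=True)

# np.unique returns the blocks in sorted order; indexing with the sorted
# first-occurrence indices keeps them in their original order instead
in_original_order = a[np.sort(first_idx)]
```

Note that this handles exact equality only, so it does not cover the tolerance-based similarity check.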
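For the similarity criterion, assuming you meant that every element of the absolute difference satisfies np.abs(a[i,:,:] - a[g,:,:]) < tolerance, here's a vectorized approach to get the indices of all similar blocks along axes(1,2) in the input array -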
R,C = np.triu_indices(a.shape[0],1)
mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
I,G = R[mask], C[mask]
Sample run -
In [267]: a
Out[267]:
array([[[12,  4],
        [ 0,  1]],

       [[ 2,  4],
        [ 3,  2]],

       [[13,  4],
        [ 0,  1]],

       [[ 3,  4],
        [ 1,  3]],

       [[ 2,  4],
        [ 3,  2]],

       [[12,  5],
        [ 1,  1]]])

In [268]: tolerance = 2

In [269]: R,C = np.triu_indices(a.shape[0],1)
     ...: mask = (np.abs(a[R] - a[C]) < tolerance).all(axis=(1,2))
     ...: I,G = R[mask], C[mask]
     ...:

In [270]: I
Out[270]: array([0, 0, 1, 2])

In [271]: G
Out[271]: array([2, 5, 4, 5])
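A note on memory: with 80000 blocks there are roughly 3.2 billion index pairs, and a[R] and a[C] would each hold 38 x 38 elements per pair, so the fully vectorized version is unlikely to fit in memory at that size. A possible compromise, sketched below under that assumption, keeps one Python loop and broadcasts each block against all later ones. The function name similar_pairs is hypothetical:

```python
import numpy as np

def similar_pairs(a, tolerance):
    """Return index arrays (I, G) with I < G for every pair of blocks whose
    element-wise absolute difference is below `tolerance` everywhere."""
    I = [np.empty(0, dtype=int)]
    G = [np.empty(0, dtype=int)]
    for i in range(a.shape[0] - 1):
        # Broadcast block i against all later blocks at once
        close = (np.abs(a[i + 1:] - a[i]) < tolerance).all(axis=(1, 2))
        g = np.flatnonzero(close) + i + 1
        I.append(np.full(g.size, i))
        G.append(g)
    return np.concatenate(I), np.concatenate(G)

# Same data as the sample run above
a = np.array([[[12, 4], [0, 1]],
              [[2, 4], [3, 2]],
              [[13, 4], [0, 1]],
              [[3, 4], [1, 3]],
              [[2, 4], [3, 2]],
              [[12, 5], [1, 1]]])
I, G = similar_pairs(a, tolerance=2)
```

On the sample data this gives the same I and G as the triu_indices version, while the peak temporary is only one block compared against the remaining blocks rather than all pairs at once.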