I have n matrices of the same size and want to see how many cells are equal to each other across all of the matrices. Code:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[5,6,7], [4,2,6], [7, 8, 9]])
c = np.array([[2,3,4],[4,5,6],[1,2,5]])
#Intuition is below but is wrong
a == b == c
How can I get Python to return the value 2 (cells 2,1 and 2,3 match in all 3 matrices) or [[False, False, False], [True, False, True], [False, False, False]]?
You can do:
(a == b) & (b==c)
[[False False False]
[ True False True]
[False False False]]
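The question also asks for the count (2), which the mask above does not give directly. As a minimal sketch, assuming `c` is meant to be a 3x3 array like `a` and `b`, the mask can be counted with `np.count_nonzero` and the matching positions listed with `np.argwhere` (the names `mask`, `n_equal` and `positions` are only illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# Boolean mask: True where a cell has the same value in a, b and c.
mask = (a == b) & (b == c)

# Number of cells that match in all three matrices.
n_equal = int(np.count_nonzero(mask))

# 0-based (row, column) indices of the matching cells.
positions = np.argwhere(mask).tolist()

print(n_equal)    # 2
print(positions)  # [[1, 0], [1, 2]]
```

Note that the indices are 0-based, so `[1, 0]` and `[1, 2]` correspond to cells 2,1 and 2,3 in the question's 1-based wording.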
For n items in, say, a list like x=[a, b, c, a, b, c], one could do:
r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0]==temp
The result is now in r.
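For reuse, the same loop can be wrapped in a small function. This is only a sketch: the name `equal_across` is made up here, and starting from an all-True mask (so that a list with a single array also works) is my own choice rather than part of the loop above.

```python
import numpy as np

def equal_across(arrays):
    """Return a boolean mask that is True where every array has the same value."""
    first = arrays[0]
    # Start with all True, then AND in each comparison against the first array.
    mask = np.ones(first.shape, dtype=bool)
    for other in arrays[1:]:
        mask &= first == other
    return mask

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

x = [a, b, c, a, b, c]
r = equal_across(x)
print(r)
```

Comparing everything against the first array is enough, because all arrays are equal in a cell exactly when each of them equals the first one there.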
If the structure is already in a 3D numpy array, one could also use:
np.amax(x,axis=2)==np.amin(x,axis=2)
The idea behind the line above is that, although it would be ideal to have an equal function with an axis argument, there isn't one, so this line notes that if amin==amax along the axis, then all elements are equal.
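To make the min==max idea concrete, here is a small sketch using the three matrices from the question, stacked with `np.dstack` so that axis 2 indexes the matrix. The second expression, which compares every layer against the first one and reduces with `.all(axis=2)`, is an alternative I am adding for comparison; it is not from the answer above.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# Shape (3, 3, 3): x[i, j, k] is cell (i, j) of the k-th matrix.
x = np.dstack([a, b, c])

# If the largest and smallest value along axis 2 coincide, all values are equal.
all_equal = np.amax(x, axis=2) == np.amin(x, axis=2)

# Alternative: compare each layer with the first one (broadcast over axis 2).
alt = (x == x[:, :, :1]).all(axis=2)

print(all_equal)
print(np.array_equal(all_equal, alt))  # True
```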
If the different arrays to be compared aren't already in a 3D numpy array (or won't be in the future), looping the list is a fast and easy approach. Although I generally agree with avoiding Python loops for Numpy arrays, this seems like a case where it's easier and faster (see below) to use a Python loop since the loop is only along a single axis and it's easy to accumulate the comparisons in place. Here's a timing test:
from timeit import timeit
import numpy as np

def f0(x):  # loop over the list, accumulating the comparisons in place
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0] == y

def f1(x):  # from @Divakar
    r = ~np.any(np.diff(np.dstack(x), axis=2), axis=2)

def f2(x):  # min == max along the stacking axis
    x = np.dstack(x)
    r = np.amax(x, axis=2) == np.amin(x, axis=2)

# speed test
for n, size, reps in ((1000, 3, 1000), (10, 1000, 100)):
    x = [np.ones((size, size)) for i in range(n)]
    print(n, size, reps)
    print("f0: ", timeit("f0(x)", "from __main__ import x, f0", number=reps))
    print("f1: ", timeit("f1(x)", "from __main__ import x, f1", number=reps))
    print("f2: ", timeit("f2(x)", "from __main__ import x, f2", number=reps))
    print()
1000 3 1000
f0: 1.14673900604 # loop
f1: 3.93413209915 # diff
f2: 3.93126702309 # min max
10 1000 100
f0: 2.42633581161 # loop
f1: 27.1066679955 # diff
f2: 25.9518558979 # min max
If the arrays are already in a single 3D numpy array (e.g., from using x = np.dstack(x) in the above), then modifying the above function defs appropriately, and with the addition of the min==max approach, gives:
def g0(x):  # loop over the third axis
    r = x[:,:,0] == x[:,:,1]
    for iy in range(2, x.shape[2]):  # start at 2: layers 0 and 1 are already compared
        r &= x[:,:,0] == x[:,:,iy]
def g1(x):  # from @Divakar
    r = ~np.any(np.diff(x,axis=2),axis=2)
def g2(x):  # min == max
    r = np.amax(x,axis=2)==np.amin(x,axis=2)
which yields:
1000 3 1000
g0: 3.9761030674 # loop
g1: 0.0599548816681 # diff
g2: 0.0313589572906 # min max
10 1000 100
g0: 10.7617051601 # loop
g1: 10.881870985 # diff
g2: 9.66712999344 # min max
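Also note that for a list of large arrays f0 = 2.4, while for the pre-built array g0, g1, g2 ~= 10. So if the input arrays are large, the fastest approach, by about 4x, is to keep them stored separately in a list. I find this a bit surprising and guess it might be due to cache swapping (or bad code?), but I'm not sure anyone really cares, so I'll stop here.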
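Since the timing functions only compute `r` and never return it, they do not show that the three approaches agree. The sketch below is one way to check that, assuming small random integer inputs; the names `loop_mask`, `diff_mask` and `minmax_mask` are made up here and simply return what f0, f1 and f2 compute.

```python
import numpy as np

def loop_mask(arrays):
    # Same logic as the list loop: compare each array with the first one.
    r = arrays[0] == arrays[1]
    for y in arrays[2:]:
        r &= arrays[0] == y
    return r

def diff_mask(arrays):
    # All consecutive differences along axis 2 are zero iff all values are equal.
    return ~np.any(np.diff(np.dstack(arrays), axis=2), axis=2)

def minmax_mask(arrays):
    # min == max along axis 2 iff all values are equal.
    x = np.dstack(arrays)
    return np.amax(x, axis=2) == np.amin(x, axis=2)

# Small random 0/1 matrices so that some cells match in all of them.
rng = np.random.default_rng(0)
arrays = [rng.integers(0, 2, size=(4, 5)) for _ in range(3)]

masks = [f(arrays) for f in (loop_mask, diff_mask, minmax_mask)]
agree = all(np.array_equal(masks[0], m) for m in masks[1:])
print(agree)  # True
```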