Since my np.dot is accelerated by OpenBLAS and OpenMPI, I am wondering whether it is possible to write the double sum
for i in range(N):
    for j in range(N):
        B[k,l] += A[i,j,k,l] * X[i,j]   # for every k, l
as an inner product. At the moment I am using
B = np.einsum("ijkl,ij->kl",A,X)
but unfortunately it is quite slow and only uses one processor. Any ideas?
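The double sum contracts the two leading axes of A against X, so flattening (i, j) into one axis of length N*N and (k, l) into one of length K*L turns it into a single vector-matrix product that `np.dot` can hand to BLAS. The sketch below checks that the explicit loop, the reshape + `np.dot` form, and `np.einsum` agree. The small sizes and the random generator are illustrative choices, not from the question. Whether `np.dot` actually uses several cores depends on how NumPy's BLAS was built, and the sketch doesn't test that. Also, `reshape` is only a cheap view when A is C-contiguous. Otherwise it copies.

```python
import numpy as np

# Small, illustrative sizes so the explicit Python loop finishes quickly.
N, K, L = 4, 3, 5
rng = np.random.default_rng(0)
A = rng.random((N, N, K, L))
X = rng.random((N, N))

# Reference: the double sum written out; A[i, j] is a (K, L) slice,
# so this updates B[k, l] for every k, l at once.
B_loop = np.zeros((K, L))
for i in range(N):
    for j in range(N):
        B_loop += A[i, j] * X[i, j]

# Same contraction as one vector-matrix product:
# row index of the reshaped A is i*N + j, matching X.ravel() in C order.
B_dot = np.dot(X.ravel(), A.reshape(N * N, K * L)).reshape(K, L)

# The einsum formulation from the question.
B_einsum = np.einsum("ijkl,ij->kl", A, X)

print(np.allclose(B_loop, B_dot), np.allclose(B_loop, B_einsum))  # True True
```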
Edit: I benchmarked the answers given with a simple example, and they all seem to be of the same order of magnitude:
import numpy as np

A = np.random.random([200,200,100,100])
X = np.random.random([200,200])

def B1():
    return np.einsum("ijkl,ij->kl", A, X)

def B2():
    return np.tensordot(A, X, [[0,1], [0,1]])

def B3():
    # flatten (i, j) of A into rows and (k, l) into columns
    shp = A.shape
    return np.dot(X.ravel(), A.reshape(shp[0]*shp[1], shp[2]*shp[3])).reshape(shp[2], shp[3])

%timeit B1()
%timeit B2()
%timeit B3()
1 loops, best of 3: 300 ms per loop
10 loops, best of 3: 149 ms per loop
10 loops, best of 3: 150 ms per loop …

Why is there such a large speed difference between the following L2 norm computations:
a = np.arange(1200.0).reshape((-1,3))
%timeit [np.sqrt((a*a).sum(axis=1))]
100000 loops, best of 3: 12 µs per loop
%timeit [np.sqrt(np.dot(x,x)) for x in a]
1000 loops, best of 3: 814 µs per loop
%timeit [np.linalg.norm(x) for x in a]
100 loops, best of 3: 2 ms per loop
As far as I can tell, all three give the same results.
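A minimal sketch confirming that the three versions agree. It also makes visible where the gap plausibly comes from: the first version makes a handful of NumPy calls over the whole 400×3 array, while the other two make one or more NumPy calls per row from a Python loop, so per-call overhead is paid 400 times. `np.linalg.norm` also accepts an `axis` argument (NumPy 1.8 and later), which keeps the computation in a single vectorized call. Its speed is not measured here.

```python
import numpy as np

a = np.arange(1200.0).reshape((-1, 3))  # 400 rows of 3 values

# Vectorized: a few calls over the whole array, looping in compiled code.
r_vec = np.sqrt((a * a).sum(axis=1))

# Per-row Python loops: 400 separate NumPy calls, each with its own overhead.
r_dot = np.array([np.sqrt(np.dot(x, x)) for x in a])
r_norm = np.array([np.linalg.norm(x) for x in a])

# np.linalg.norm can compute all row norms in one call via `axis`.
r_norm_axis = np.linalg.norm(a, axis=1)

print(np.allclose(r_vec, r_dot), np.allclose(r_vec, r_norm),
      np.allclose(r_vec, r_norm_axis))  # True True True
```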
This is the source code of the numpy.linalg.norm function:
x = asarray(x)

# Check the default case first and handle it immediately.
if ord is None and axis is None:
    x = x.ravel(order='K')
    if isComplexType(x.dtype.type):
        sqnorm = dot(x.real, x.real) + dot(x.imag, x.imag) …
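For a real row, the default branch above appears to reduce to roughly the same arithmetic as `np.sqrt(np.dot(x, x))`, with extra Python-level steps per call (`asarray`, the `ord`/`axis` checks, `ravel`, the dtype test). The sketch below reduces that path to a hypothetical helper, `simple_norm`, and checks it against `np.linalg.norm`. It assumes that the real-valued branch, which is cut off in the excerpt, computes `dot(x, x)` in the same way as the complex branch shown. It is an illustration of the extra per-call steps, not NumPy's actual code.

```python
import numpy as np

def simple_norm(x):
    """Hypothetical helper: the default (ord=None, axis=None) path of
    np.linalg.norm for real input, reduced to its essential steps."""
    x = np.asarray(x)          # convert the input to an array
    x = x.ravel(order='K')     # flatten to 1-D
    sqnorm = np.dot(x, x)      # sum of squares via a single dot product
    return np.sqrt(sqnorm)

a = np.arange(1200.0).reshape((-1, 3))
print(all(np.isclose(simple_norm(x), np.linalg.norm(x)) for x in a))  # True
```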