使用NumPy进行快速张量旋转

And*_*ker 43 python optimization numpy rotation scipy

应用程序的核心(用Python编写并使用NumPy)我需要旋转4阶张量.实际上,我需要多次旋转很多张量,这是我的瓶颈.我的天真实现(下面)涉及八个嵌套循环似乎相当慢,但我看不到一种方法来利用NumPy的矩阵运算,并希望加快速度.我有一种感觉,我应该使用np.tensordot,但我不知道如何.

在数学上,旋转张量,T的元素"由下式给出:T" IJKL =Σ克IAJBKCLD Ť ABCD与和被过在右手侧上的重复指数.T和Tprime是3*3*3*3个NumPy阵列,旋转矩阵g是3*3 NumPy阵列.我执行缓慢(每次通话约0.04秒)如下.

#!/usr/bin/env python

import numpy as np

def rotT(T, g):
    Tprime = np.zeros((3,3,3,3))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                for l in range(3):
                    for ii in range(3):
                        for jj in range(3):
                            for kk in range(3):
                                for ll in range(3):
                                    gg = g[ii,i]*g[jj,j]*g[kk,k]*g[ll,l]
                                    Tprime[i,j,k,l] = Tprime[i,j,k,l] + \
                                         gg*T[ii,jj,kk,ll]
    return Tprime

if __name__ == "__main__":

    T = np.array([[[[  4.66533067e+01,  5.84985000e-02, -5.37671310e-01],
                    [  5.84985000e-02,  1.56722231e+01,  2.32831900e-02],
                    [ -5.37671310e-01,  2.32831900e-02,  1.33399259e+01]],
                   [[  4.60051700e-02,  1.54658176e+01,  2.19568200e-02],
                    [  1.54658176e+01, -5.18223500e-02, -1.52814920e-01],
                    [  2.19568200e-02, -1.52814920e-01, -2.43874100e-02]],
                   [[ -5.35577630e-01,  1.95558600e-02,  1.31108757e+01],
                    [  1.95558600e-02, -1.51342210e-01, -6.67615000e-03],
                    [  1.31108757e+01, -6.67615000e-03,  6.90486240e-01]]],
                  [[[  4.60051700e-02,  1.54658176e+01,  2.19568200e-02],
                    [  1.54658176e+01, -5.18223500e-02, -1.52814920e-01],
                    [  2.19568200e-02, -1.52814920e-01, -2.43874100e-02]],
                   [[  1.57414726e+01, -3.86167500e-02, -1.55971950e-01],
                    [ -3.86167500e-02,  4.65601977e+01, -3.57741000e-02],
                    [ -1.55971950e-01, -3.57741000e-02,  1.34215636e+01]],
                   [[  2.58256300e-02, -1.49072770e-01, -7.38843000e-03],
                    [ -1.49072770e-01, -3.63410500e-02,  1.32039847e+01],
                    [ -7.38843000e-03,  1.32039847e+01,  1.38172700e-02]]],
                  [[[ -5.35577630e-01,  1.95558600e-02,  1.31108757e+01],
                    [  1.95558600e-02, -1.51342210e-01, -6.67615000e-03],
                    [  1.31108757e+01, -6.67615000e-03,  6.90486240e-01]],
                   [[  2.58256300e-02, -1.49072770e-01, -7.38843000e-03],
                    [ -1.49072770e-01, -3.63410500e-02,  1.32039847e+01],
                    [ -7.38843000e-03,  1.32039847e+01,  1.38172700e-02]],
                   [[  1.33639532e+01, -1.26331100e-02,  6.84650400e-01],
                    [ -1.26331100e-02,  1.34222177e+01,  1.67851800e-02],
                    [  6.84650400e-01,  1.67851800e-02,  4.89151396e+01]]]])

    g = np.array([[ 0.79389393,  0.54184237,  0.27593346],
                  [-0.59925749,  0.62028664,  0.50609776],
                  [ 0.10306737, -0.56714313,  0.8171449 ]])

    for i in range(100):
        Tprime = rotT(T,g)
Run Code Online (Sandbox Code Playgroud)

有没有办法让这个更快?使代码推广到其他张量等级将是有用的,但不太重要.

Phi*_*ipp 41

要使用tensordot,请计算g张量的外积:

def rotT(T, g):
    gg = np.outer(g, g)
    gggg = np.outer(gg, gg).reshape(4 * g.shape)
    axes = ((0, 2, 4, 6), (0, 1, 2, 3))
    return np.tensordot(gggg, T, axes)
Run Code Online (Sandbox Code Playgroud)

在我的系统上,这比Sven的解决方案快七倍.如果g张量不经常改变,你也可以缓存gggg张量.如果你这样做并打开一些微优化(内联tensordot代码,没有检查,没有通用形状),你仍然可以使它快两倍:

def rotT(T, gggg):
    return np.dot(gggg.transpose((1, 3, 5, 7, 0, 2, 4, 6)).reshape((81, 81)),
                  T.reshape(81, 1)).reshape((3, 3, 3, 3))
Run Code Online (Sandbox Code Playgroud)

timeit在我的家用笔记本电脑上的结果(500次迭代):

Your original code: 19.471129179
Sven's code: 0.718412876129
My first code: 0.118047952652
My second code: 0.0690279006958
Run Code Online (Sandbox Code Playgroud)

我的工作机器上的数字是:

Your original code: 9.77922987938
Sven's code: 0.137110948563
My first code: 0.0569641590118
My second code: 0.0308079719543
Run Code Online (Sandbox Code Playgroud)


Sve*_*ach 34

以下是使用单个Python循环的方法:

def rotT(T, g):
    Tprime = T
    for i in range(4):
        slices = [None] * 4
        slices[i] = slice(None)
        slices *= 2
        Tprime = g[slices].T * Tprime
    return Tprime.sum(-1).sum(-1).sum(-1).sum(-1)
Run Code Online (Sandbox Code Playgroud)

不可否认,乍一看这有点难以掌握,但它的速度要快得多:)

  • 数学精彩的+1和另一个证明算法优化以数量级击败微优化的证据. (5认同)
  • 相当不错!+1沿着这些方向,有一个函数可能包含在numpy的未来版本中,这也适合OP的目的......`numpy.einsum`(它还没有).请参阅关于numpy讨论的讨论:http://www.mail-archive.com/numpy-discussion@scipy.org/msg29680.html (2认同)

pv.*_*pv. 19

感谢M. Wiebe的辛勤工作,Numpy的下一个版本(可能是1.6)将使这更容易:

>>> Trot = np.einsum('ai,bj,ck,dl,abcd->ijkl', g, g, g, g, T)
Run Code Online (Sandbox Code Playgroud)

不过,Philipp的方法目前要快3倍,但也许还有一些改进空间.速度差异可能主要是因为tensordot能够将整个操作展开为可以传递给BLAS的单个矩阵产品,因此避免了与小阵列相关的大量开销 - 这对于一般的爱因斯坦来说是不可能的求和,因为并非所有可以用这种形式表达的操作都解析为单个矩阵产品.


jfs*_*jfs 10

出于好奇,我将来自问题的天真代码的Cython实现与来自@ Philipp的答案的numpy代码进行了比较.Cython代码在我的机器上四倍:

#cython: boundscheck=False, wraparound=False
import numpy as np
cimport numpy as np

def rotT(np.ndarray[np.float64_t, ndim=4] T,
         np.ndarray[np.float64_t, ndim=2] g):
    cdef np.ndarray[np.float64_t, ndim=4] Tprime
    cdef Py_ssize_t i, j, k, l, ii, jj, kk, ll
    cdef np.float64_t gg

    Tprime = np.zeros((3,3,3,3), dtype=T.dtype)
    for i in range(3):
        for j in range(3):
            for k in range(3):
                for l in range(3):
                    for ii in range(3):
                        for jj in range(3):
                            for kk in range(3):
                                for ll in range(3):
                                    gg = g[ii,i]*g[jj,j]*g[kk,k]*g[ll,l]
                                    Tprime[i,j,k,l] = Tprime[i,j,k,l] + \
                                         gg*T[ii,jj,kk,ll]
    return Tprime
Run Code Online (Sandbox Code Playgroud)