Optimizing my Cython/NumPy code? Only a 30% performance gain so far

Chi*_*nex 3 python numpy cython

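Is there anything I have forgotten to do to speed this up? I am trying to implement an algorithm described in a book called Tuning Timbre Spectrum Scale. Also, if all else fails, is there a way for me to write this part of the code in C and then call it from Python?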

import numpy as np
cimport numpy as np

# DTYPE = np.float
ctypedef np.float_t DTYPE_t

np.seterr(divide='raise', over='raise', under='ignore', invalid='raise')

"""
I define a timbre as the following 2d numpy array:
[[f0, a0], [f1, a1], [f2, a2]...] where f describes the frequency
of the given partial and a is its amplitude from 0 to 1. Phase is ignored.
"""

#Test Timbre
# cdef np.ndarray[DTYPE_t,ndim=2] t1 = np.array( [[440,1],[880,.5],[(440*3),.333]])

# Calculates the inherent dissonance of one timbre of the above form
# using the diss2Partials function
cdef DTYPE_t diss1Timbre(np.ndarray[DTYPE_t,ndim=2] t):
    cdef DTYPE_t runningDiss1
    runningDiss1 = 0.0
    cdef unsigned int len = np.shape(t)[0]
    cdef unsigned int i
    cdef unsigned int j
    for i from 0 <= i < len:
        for j from i+1 <= j < len:
            runningDiss1 += diss2Partials(t[i], t[j])
    return runningDiss1

# Calculates the dissonance between two timbres of the above form 
cdef DTYPE_t diss2Timbres(np.ndarray[DTYPE_t,ndim=2] t1, np.ndarray[DTYPE_t,ndim=2] t2):
    cdef DTYPE_t runningDiss2
    runningDiss2 = 0.0
    cdef unsigned int len1 = np.shape(t1)[0]
    cdef unsigned int len2 = np.shape(t2)[0]
    runningDiss2 += diss1Timbre(t1)
    runningDiss2 += diss1Timbre(t2)
    cdef unsigned int i1
    cdef unsigned int i2
    for i1 from 0 <= i1 < len1:
        for i2 from 0 <= i2 < len2:
            runningDiss2 += diss2Partials(t1[i1], t2[i2])
    return runningDiss2

cdef inline DTYPE_t float_min(DTYPE_t a, DTYPE_t b): return a if a <= b else b

# Calculates the dissonance of two partials of the form [f,a]
cdef DTYPE_t diss2Partials(np.ndarray[DTYPE_t,ndim=1] p1, np.ndarray[DTYPE_t,ndim=1] p2):
    cdef DTYPE_t f1 = p1[0]
    cdef DTYPE_t f2 = p2[0]
    cdef DTYPE_t a1 = abs(p1[1])
    cdef DTYPE_t a2 = abs(p2[1])

    # In order to ensure that f2 > f1:
    if (f2 < f1):
        (f1,f2,a1,a2) = (f2,f1,a2,a1)

    # Constants of the dissonance curves
    cdef DTYPE_t _xStar
    _xStar = 0.24
    cdef DTYPE_t _s1
    _s1 = 0.021
    cdef DTYPE_t _s2
    _s2 = 19
    cdef DTYPE_t _b1
    _b1 = 3.5
    cdef DTYPE_t _b2
    _b2 = 5.75

    cdef DTYPE_t a = float_min(a1,a2)
    cdef DTYPE_t s = _xStar/(_s1*f1 + _s2)
    return (a * (np.exp(-_b1*s*(f2-f1)) - np.exp(-_b2*s*(f2-f1)) ) )

cpdef dissTimbreScale(np.ndarray[DTYPE_t,ndim=2] t,np.ndarray[DTYPE_t,ndim=1] s):
    cdef DTYPE_t currDiss
    currDiss = 0.0;
    cdef unsigned int i
    for i from 0 <= i < s.size:
        currDiss += diss2Timbres(t, transpose(t,s[i]))
    return currDiss

cdef np.ndarray[DTYPE_t,ndim=2] transpose(np.ndarray[DTYPE_t,ndim=2] t, DTYPE_t ratio):
    return np.dot(t, np.array([[ratio,0],[0,1]]))

Link to the code: Cython code

Jus*_*eel 9

Here are some things I noticed:

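  1. Use t1.shape[0] instead of np.shape(t1)[0], and do the same everywhere else.
  2. Don't use len as a variable name, because it is a built-in function in Python (this is not about speed, just good practice). Use L or something similar.
  3. Don't pass two-element arrays to functions unless you really need to. Cython checks the buffer every time you pass an array. So instead of diss2Partials(t[i], t[j]), call diss2Partials(t[i,0], t[i,1], t[j,0], t[j,1]) and redefine diss2Partials accordingly.
  4. Don't use abs, or at least not the Python one. It has to convert your C double to a Python float, call the abs function, and then convert the result back to a C double. It would probably be better to write an inline function, as you did with float_min.
  5. Calling np.exp has the same kind of overhead as abs. Change np.exp to exp and add from libc.math cimport exp to your imports at the top.
  6. Get rid of the transpose function entirely. The np.dot call is really slowing things down, and you don't need a matrix multiplication here anyway. Rewrite your dissTimbreScale function to create an empty matrix, say t2. Before the current loop, set the second column of t2 equal to the second column of t (preferably with a loop, but you can probably get away with a NumPy operation here). Then, inside the current loop, add a loop that sets the first column of t2 equal to the first column of t times s[i]. That is all your matrix multiplication was really doing. Then pass t2 as the second argument to diss2Timbres instead of the value returned by transpose.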
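For points 3 to 5, here is a minimal pure-Python sketch of what the scalar-argument version could look like. The names diss2partials_scalar and diss1timbre_scalar are hypothetical; the constants and the formula are copied from the question. In the actual .pyx file the arguments would presumably be typed as DTYPE_t, the functions declared with cdef, and math.exp replaced by the exp from libc.math.

```python
import math

# Dissonance-curve constants, copied from the question's diss2Partials.
X_STAR = 0.24
S1 = 0.021
S2 = 19.0
B1 = 3.5
B2 = 5.75


def diss2partials_scalar(f1, a1, f2, a2):
    """Dissonance of two partials passed as four scalars (hypothetical name)."""
    # Inline absolute value, in the spirit of float_min (point 4).
    a1 = a1 if a1 >= 0.0 else -a1
    a2 = a2 if a2 >= 0.0 else -a2

    # Make sure that f2 >= f1.
    if f2 < f1:
        f1, f2, a1, a2 = f2, f1, a2, a1

    a = a1 if a1 <= a2 else a2
    s = X_STAR / (S1 * f1 + S2)
    d = f2 - f1
    return a * (math.exp(-B1 * s * d) - math.exp(-B2 * s * d))


def diss1timbre_scalar(t):
    """Sum the dissonance over every pair of partials in one timbre."""
    n = len(t)  # in Cython: t.shape[0], stored in a variable not named len
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Pass scalars, not two-element rows (point 3).
            total += diss2partials_scalar(t[i][0], t[i][1], t[j][0], t[j][1])
    return total
```

The point of this shape is that the inner function never receives an array, so the typed Cython version has no buffer to check on each call.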

Do 1-5 first, since they are easy. Number 6 may take more time, effort, and experimentation, but I suspect it could also give you a significant speedup.
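To illustrate point 6, the following NumPy sketch checks that scaling the first column into a preallocated buffer gives the same result as the np.dot call in the question's transpose. The helper names transpose_dot and fill_transposed, the sample timbre, and the scale ratios are assumptions made for the example; in Cython the loop would run over a typed array.

```python
import numpy as np


def transpose_dot(t, ratio):
    """The question's transpose(): one 2x2 matrix product per call."""
    return np.dot(t, np.array([[ratio, 0.0], [0.0, 1.0]]))


def fill_transposed(t, ratio, t2):
    """Write the transposed timbre into the preallocated buffer t2.

    Hypothetical helper: assumes t2[:, 1] was already copied from t[:, 1].
    """
    for k in range(t.shape[0]):
        t2[k, 0] = t[k, 0] * ratio  # only the frequencies are scaled
    return t2


# Sample data (assumed for illustration).
t = np.array([[440.0, 1.0], [880.0, 0.5], [1320.0, 0.333]])
scale = np.array([1.0, 1.25, 1.5])

t2 = np.empty_like(t)
t2[:, 1] = t[:, 1]  # amplitudes never change, so copy them once

for ratio in scale:
    fill_transposed(t, ratio, t2)
    # Same values as the matrix product, without allocating new arrays.
    assert np.allclose(t2, transpose_dot(t, ratio))
```

Because t2 is allocated once and reused for every scale step, the loop in dissTimbreScale no longer creates two new arrays per iteration.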