NoE*_*xiT 10 python numpy cython pyopencl numexpr
该数学函数的目的是使用二面角计算两个(或更多)蛋白质结构之间的距离:
例如,它在结构生物学中非常有用.我已经使用numpy在python中编写了这个函数,但目标是实现更快.作为计算时间参考,我使用scikit-learn包中提供的欧几里德距离函数.
这是我目前的代码:
import numpy as np
import numexpr as ne
from sklearn.metrics.pairwise import euclidean_distances
# We have 10000 structures with 100 dihedral angles
n = 10000
m = 100
# Generate some random data
c = np.random.rand(n,m)
# Generate random int number
x = np.random.randint(c.shape[0])
print c.shape, x
# First version with numpy of the dihedral_distances function
def dihedral_distances(a, b):
l = 1./a.shape[0]
return np.sqrt(l* np.sum((0.5)*(1. - np.cos(a-b)), axis=1))
# Accelerated version with numexpr
def dihedral_distances_ne(a, b):
l = 1./a.shape[0]
tmp = ne.evaluate('sum((0.5)*(1. - cos(a-b)), axis=1)')
return ne.evaluate('sqrt(l* tmp)')
# The function of reference I try to be close as possible
# in term of computation time
%timeit euclidean_distances(c[x,:], c)[0]
1000 loops, best of 3: 1.07 ms per loop
# Computation time of the first version of the dihedral_distances function
# We choose randomly 1 structure among the 10000 structures.
# And we compute the dihedral distance between this one and the others
%timeit dihedral_distances(c[x,:], c)
10 loops, best of 3: 21.5 ms per loop
# Computation time of the accelerated function with numexpr
%timeit dihedral_distances_ne(c[x,:], c)
100 loops, best of 3: 9.44 ms per loop
Run Code Online (Sandbox Code Playgroud)
9.44毫秒它非常快,但是如果你需要运行它一百万次它会非常慢.现在问题是,怎么做?你下一步怎么做?用Cython?PyOpenCL?我对PyOpenCL有一些经验,但是我从来没有像这个那样精心编写代码.我不知道是否有可能在GPU上一步计算二面体距离,就像我使用numpy以及如何进行一样.
感谢你们对我的帮助!
编辑:谢谢你们!我目前正在研究完整的解决方案,一旦完成,我将把代码放在这里.
CYTHON版本:
%load_ext cython
import numpy as np
np.random.seed(1234)
n = 10000
m = 100
c = np.random.rand(n,m)
x = np.random.randint(c.shape[0])
print c.shape, x
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
import numpy as np
cimport numpy as np
from libc.math cimport sqrt, cos
cimport cython
from cython.parallel cimport parallel, prange
# Define a function pointer to a metric
ctypedef double (*metric)(double[: ,::1], np.intp_t, np.intp_t)
cdef extern from "math.h" nogil:
double cos(double x)
double sqrt(double x)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cdef double dihedral_distances(double[:, ::1] a, np.intp_t i1, np.intp_t i2):
cdef double res
cdef int m
cdef int j
res = 0.
m = a.shape[1]
for j in range(m):
res += 1. - cos(a[i1, j] - a[i2, j])
res /= 2.*m
return sqrt(res)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cdef double dihedral_distances_p(double[:, ::1] a, np.intp_t i1, np.intp_t i2):
cdef double res
cdef int m
cdef int j
res = 0.
m = a.shape[1]
with nogil, parallel(num_threads=2):
for j in prange(m, schedule='dynamic'):
res += 1. - cos(a[i1, j] - a[i2, j])
res /= 2.*m
return sqrt(res)
@cython.boundscheck(False)
@cython.wraparound(False)
def pairwise(double[: ,::1] c not None, np.intp_t x, p = True):
cdef metric dist_func
if p:
dist_func = &dihedral_distances_p
else:
dist_func = &dihedral_distances
cdef np.intp_t i, n_structures
n_samples = c.shape[0]
cdef double[::1] res = np.empty(n_samples)
for i in range(n_samples):
res[i] = dist_func(c, x, i)
return res
%timeit pairwise(c, x, False)
100 loops, best of 3: 17 ms per loop
# Parallel version
%timeit pairwise(c, x, True)
10 loops, best of 3: 37.1 ms per loop
Run Code Online (Sandbox Code Playgroud)
所以我按照你的链接创建了双面距离函数的cython版本.我们获得了一些速度,而不是那么多,但它仍然比numexpr版本慢(17ms vs 9.44ms).所以我尝试使用prange并行化函数,它更糟糕(37.1ms vs 17ms vs 9.4ms)!
我错过了什么吗?
如果您愿意使用http://pythran.readthedocs.io/,您可以利用 numpy 实现并在这种情况下获得比 cython 更好的性能:
\n\n#pythran export np_cos_norm(float[], float[])\nimport numpy as np\ndef np_cos_norm(a, b):\n val = np.sum(1. - np.cos(a-b))\n return np.sqrt(val / 2. / a.shape[0])\nRun Code Online (Sandbox Code Playgroud)\n\n并用以下命令编译它:
\n\npythran fast.py\nRun Code Online (Sandbox Code Playgroud)\n\n获得比 cython 版本平均 x2 的结果。
\n\n如果使用:
\n\npythran fast.py -march=native -DUSE_BOOST_SIMD -fopenmp\nRun Code Online (Sandbox Code Playgroud)\n\n您将获得运行速度稍快的矢量化并行版本:
\n\n100000 loops, best of 3: 2.54 \xc2\xb5s per loop\n1000000 loops, best of 3: 674 ns per loop\n\n100000 loops, best of 3: 16.9 \xc2\xb5s per loop\n100000 loops, best of 3: 4.31 \xc2\xb5s per loop\n\n10000 loops, best of 3: 176 \xc2\xb5s per loop\n10000 loops, best of 3: 42.9 \xc2\xb5s per loop\nRun Code Online (Sandbox Code Playgroud)\n\n(使用与 ev-br 相同的测试平台)
\n| 归档时间: |
|
| 查看次数: |
761 次 |
| 最近记录: |