cpython vs cython vs numpy array performance

cru*_*rky 16 python numpy cython

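I'm doing some performance tests on a variant of the prime number generator from http://docs.cython.org/src/tutorial/numpy.html. The measurements below are for kmax = 1000.

Pure Python implementation, run in CPython: 0.15s

Pure Python implementation, run in Cython: 0.07s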

def primes(kmax):
    p = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p.append(n)
            k = k + 1
        n = n + 1
    return p

Pure Python + Numpy implementation, run in CPython: 1.25s

import numpy

def primes(kmax):
    p = numpy.empty(kmax, dtype=int)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

Cython implementation using int*: 0.003s

from libc.stdlib cimport malloc, free

def primes(int kmax):
    cdef int n, k, i
    cdef int *p = <int *>malloc(kmax * sizeof(int))
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    free(p)
    return result

The above performs great but looks horrible, since it holds two copies of the data... so I tried reimplementing it:

Cython + Numpy: 1.01s

import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

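Questions:

  1. Why is the numpy array so much slower than a Python list when running on CPython?
  2. What did I do wrong in the Cython + Numpy implementation? Cython is obviously not treating the numpy array as an int[].
  3. How do I cast a numpy array to an int*? The following doesn't work: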

    cdef numpy.nparray a = numpy.zeros(100, dtype=int)
    cdef int * p = <int *>a.data
    

M4r*_*ini 9

cdef DTYPE_t [:] p_view = p

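Use this instead of p in the calculations. On my machine that cut the runtime from 580 ms to 2.8 ms, which is about the same as the implementation using int*, and about the best you can expect here.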

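import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int  # this alias was removed in NumPy 1.24; np.int_ is the usual replacement there
ctypedef np.int_t DTYPE_t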

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    cdef DTYPE_t [:] p_view = p
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p_view[i] != 0:
            i = i + 1
        if i == k:
            p_view[k] = n
            k = k + 1
        n = n + 1
    return p
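As a rough Python-level analogy for why the view can stand in for p, here is a minimal sketch using only the standard library. It assumes an array.array of C ints as a stand-in for the NumPy array, and the function name fill_through_view is made up for illustration. It shows only that a memoryview shares the underlying buffer rather than copying it; it says nothing about the Cython speedup itself.

```python
from array import array


def fill_through_view(n):
    """Fill a buffer of C ints through a memoryview and return the buffer."""
    p = array("i", [0] * n)   # contiguous C-int buffer (stand-in for the NumPy array)
    p_view = memoryview(p)    # view onto the same memory, no copy is made
    for k in range(n):
        p_view[k] = k * k     # the write lands directly in p's buffer
    p_view.release()          # drop the view; p keeps the data
    return p


result = fill_through_view(5)
print(result.tolist())
```

Because the view and the buffer share memory, returning p after writing through p_view (as the Cython function above does) hands back the filled data.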


Fre*_*Foo 5

Why is the numpy array so much slower than a Python list when running on CPython?

Because you didn't fully type it. Use

cdef np.ndarray[DTYPE_t, ndim=1] p = np.empty(kmax, dtype=DTYPE)

How do I cast a numpy array to an int*?

By using np.intc as the dtype, not np.int (which is a C long). That is,

cdef np.ndarray[dtype=int, ndim=1] p = np.empty(kmax, dtype=np.intc)

(But really, use memoryviews. They're much cleaner, and in the long run the Cython folks want to get rid of the NumPy array syntax.)
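To see why the element type matters for an int* cast, it can help to check how wide C int and C long are on the machine at hand. This is a small sketch using only the standard library; the helper name c_integer_sizes is illustrative and not part of any library.

```python
import ctypes
import struct


def c_integer_sizes():
    """Return the size in bytes of C int and C long on this platform."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
    }


sizes = c_integer_sizes()
print(sizes)

# struct reports the same native sizes via its format codes 'i' and 'l'.
print(struct.calcsize("i"), struct.calcsize("l"))

# If the two widths differ, a buffer of C long elements cannot be read
# correctly through an int*: the pointer would step through half-elements.
widths_differ = sizes["int"] != sizes["long"]
print("int and long differ in width:", widths_differ)
```

On typical 64-bit Linux and macOS builds long is 8 bytes and int is 4, while on Windows both are 4, so the mismatch depends on the platform.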


cru*_*rky 1

The best syntax I've found so far:

import numpy
cimport numpy
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def primes(int kmax):
    cdef int n, k, i
    cdef numpy.ndarray[int] p = numpy.empty(kmax, dtype=numpy.int32)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

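Note where I used numpy.int32 instead of int. Anything on the left side of a cdef is a C type (so int is a C int, typically 32 bits, and float is a 32-bit C float), while anything on the right side of it (or outside of a cdef) is a Python type (as a NumPy dtype, int is usually int64 and float is float64).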
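The dtype side of this can be checked from plain Python. The sketch below assumes NumPy is installed; describe is a hypothetical helper name. The width that dtype=int resolves to depends on the platform and NumPy version, so the loop prints it rather than assuming a value.

```python
import ctypes

import numpy as np


def describe(dtype):
    """Return (name, size in bytes) for anything NumPy accepts as a dtype."""
    dt = np.dtype(dtype)
    return dt.name, dt.itemsize


# Python-level specifiers (int, float) next to explicit NumPy types.
for spec in (int, float, np.intc, np.int32, np.float32):
    print(spec, describe(spec))

# np.intc is defined to match the platform's C int, which is the element
# type a Cython buffer declared as ndarray[int] expects.
c_int_size = ctypes.sizeof(ctypes.c_int)
print("C int is", c_int_size, "bytes")
```

If the itemsize printed for int differs from the size of a C int, an array created with dtype=int will not match an ndarray[int] declaration, which is why an explicit dtype such as numpy.int32 or numpy.intc is needed on the right-hand side.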