numba slower for numpy.bitwise_and on boolean arrays

Question

I am trying out numba on this code snippet:

from numba import jit
import numpy as np
from time import time
db  = np.array(np.random.randint(2, size=(int(400e3), 4)), dtype=bool)  # size must be an int
out = np.zeros((int(400e3), 1))

@jit()
def check_mask(db, out, mask=[1, 0, 1]):
    for idx, line in enumerate(db):
        target, vector = line[0], line[1:]
        if (mask == np.bitwise_and(mask, vector)).all():
            if target == 1:
                out[idx] = 1
    return out

st = time()
res = check_mask(db, out, [1, 0, 1])
print('with jit: {:.4} sec'.format(time() - st))

With the numba @jit() decorator, this code runs slower!

without jit: 3.16 sec
with jit: 3.81 sec

Just to help understand the purpose of this code better:

db = np.array([           # out value for mask = [1, 0, 1]
    # target,  vector     #
      [1,      1, 0, 1],  # 1
      [0,      1, 1, 1],  # 0 (fit to mask but target == 0)
      [0,      0, 1, 0],  # 0
      [1,      1, 0, 1],  # 1
      [0,      1, 1, 0],  # 0
      [1,      0, 0, 0],  # 0
      ])
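
The intended semantics can be written down as a small pure-Python reference, which is handy for checking any faster version against the table above. This is only a sketch: the name expected_out is illustrative and does not appear in the question, and "fits the mask" is assumed to mean that every position set in the mask is also set in the vector (extra bits in the vector are allowed, as the second row suggests).

```python
def expected_out(rows, mask):
    """Return 1 for rows whose target is 1 and whose vector covers the mask."""
    result = []
    for row in rows:
        target, vector = row[0], row[1:]
        # Only positions where the mask is set are inspected.
        fits = all(v for m, v in zip(mask, vector) if m)
        result.append(1 if (target == 1 and fits) else 0)
    return result


rows = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]
print(expected_out(rows, [1, 0, 1]))  # [1, 0, 0, 1, 0, 0]
```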

Answer 1

小智 5

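Numba has two compilation modes for jit: nopython mode and object mode. Nopython mode (the default) supports only a limited set of Python and Numpy features; refer to the docs for your version. If the jitted function contains unsupported code, Numba has to fall back to object mode, which is much slower.

I am not sure whether object mode is supposed to give any speedup over pure Python, but you will always want nopython mode anyway. To make sure nopython mode is used, specify nopython=True and stick to very basic code (rule of thumb: write out all loops and use only scalars and Numpy arrays):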

@jit(nopython=True)
def check_mask_2(db, out, mask=np.array([1, 0, 1])):
    for idx in range(db.shape[0]):
        if db[idx,0] != 1:
            continue
        check = 1
        for j in range(mask.shape[0]):  # one mask entry per vector column; db.shape[1] would overrun
            if mask[j] and not db[idx,j+1]:
                check = 0
                break
        out[idx] = check
    return out

Writing out the inner loop explicitly has the added advantage that we can break out of it as soon as the condition fails.

Timings:

%time _ = check_mask(db, out, np.array([1, 0, 1]))
# Wall time: 1.91 s
%time _ = check_mask_2(db, out, np.array([1, 0, 1]))
# Wall time: 310 ms  # slow because of compilation
%time _ = check_mask_2(db, out, np.array([1, 0, 1]))
# Wall time: 3 ms
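
The first timed call of a jitted function includes compilation, which is why the second figure above is so much larger than the third. A minimal sketch of separating the two outside IPython follows. The names check_mask_loop and timed are illustrative and not from the answer, and the ImportError fallback is an assumption added so the sketch still runs where Numba is not installed (in that case nothing is compiled and both timings are plain Python).

```python
import time

import numpy as np

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for this sketch only: without Numba, run the plain function.
    def njit(func):
        return func


@njit
def check_mask_loop(db, out, mask):
    for idx in range(db.shape[0]):
        if not db[idx, 0]:
            continue
        ok = True
        for j in range(mask.shape[0]):
            if mask[j] and not db[idx, j + 1]:
                ok = False
                break
        out[idx] = ok
    return out


rng = np.random.RandomState(0)
db = rng.randint(2, size=(1000, 4)).astype(np.bool_)
mask = np.array([True, False, True])


def timed(label):
    """Run the check once on a fresh output array and print the elapsed time."""
    out = np.zeros(db.shape[0], dtype=np.bool_)
    start = time.perf_counter()
    check_mask_loop(db, out, mask)
    print('{}: {:.6f} sec'.format(label, time.perf_counter() - start))
    return out


first = timed('first call (includes compilation when Numba is present)')
second = timed('second call')
```

Only the second call is representative of steady-state speed, so a warm-up call before benchmarking is the usual habit.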

By the way, the function is also easily vectorized with Numpy, which gives decent speed:

def check_mask_vectorized(db, mask=[1, 0, 1]):
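    # Same test as the loop version: mask bits must be set, extra bits are allowed.
    check = (np.bitwise_and(db[:,1:], mask) == mask).all(axis=1)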
    out = (db[:,0] == 1) & check
    return out

%time _ = check_mask_vectorized(db, [1, 0, 1])
# Wall time: 14 ms
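
When vectorizing, it is worth cross-checking the result against the loop version on the small example from the question, because the loop accepts rows that have extra bits set beyond the mask. The sketch below does that; check_mask_loop and check_mask_subset are illustrative names, and selecting the masked columns with a boolean index is one possible formulation, not necessarily the fastest.

```python
import numpy as np


def check_mask_loop(db, mask):
    """Row-by-row reference using the bitwise_and test from the question."""
    mask = np.asarray(mask)
    out = np.zeros(db.shape[0], dtype=bool)
    for idx, line in enumerate(db):
        target, vector = line[0], line[1:]
        if target and (mask == np.bitwise_and(mask, vector)).all():
            out[idx] = True
    return out


def check_mask_subset(db, mask):
    """Vectorized variant: every column selected by the mask must be True."""
    mask = np.asarray(mask, dtype=bool)
    fits = db[:, 1:][:, mask].all(axis=1)
    return db[:, 0] & fits


db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
], dtype=bool)
mask = [1, 0, 1]

print(check_mask_subset(db, mask).astype(int))  # [1 0 0 1 0 0]
print(np.array_equal(check_mask_subset(db, mask), check_mask_loop(db, mask)))  # True
```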

Answer 2

ser*_*lle 5

Alternatively, you can give Pythran a try (disclaimer: I am a Pythran developer).

With a single annotation, it compiles the following code

#pythran export check_mask(bool[][], bool[])

import numpy as np
def check_mask(db, out, mask=[1, 0, 1]):
    for idx, line in enumerate(db):
        target, vector = line[0], line[1:]
        if (mask == np.bitwise_and(mask, vector)).all():
            if target == 1:
                out[idx] = 1
    return out

with a call to pythran check_call.py.

According to timeit, the resulting native module runs fast enough:

python -m timeit -s 'n=int(1e4); import numpy as np; db  = np.array(np.random.randint(2, size=(n, 4)), dtype=bool); out = np.zeros(n, dtype=bool); from eq import check_mask' 'check_mask(db, out)'

It tells me that the CPython version runs in 136ms, while the Pythran-compiled version runs in 450us.
