hil*_*lem 6 python numpy unique vectorization pandas
我想将“黑匣子”Python 函数f应用于大数组arr。额外的假设是:
f是“纯的”,例如是确定性的,没有副作用。arr具有少量唯一元素。我可以使用一个装饰器来实现这一点,该装饰器f为每个唯一元素计算arr如下:
import numpy as np
from time import sleep
from functools import wraps
N = 1000
np.random.seed(0)
arr = np.random.randint(0, 10, size=(N, 2))
def vectorize_pure(f):
@wraps(f)
def f_vec(arr):
uniques, ix = np.unique(arr, return_inverse=True)
f_range = np.array([f(x) for x in uniques])
return f_range[ix].reshape(arr.shape)
return f_vec
@np.vectorize
def usual_vectorize(x):
sleep(0.001)
return x
@vectorize_pure
def pure_vectorize(x):
sleep(0.001)
return x
# In [47]: %timeit usual_vectorize(arr)
# 1.33 s ± 6.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# In [48]: %timeit pure_vectorize(arr)
# 13.6 ms ± 81.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Run Code Online (Sandbox Code Playgroud)
我的担忧是在np.unique幕后排序arr,考虑到假设,这似乎效率低下。我正在寻找一种实用的方法来实现类似的装饰器
numpy矢量化操作。我怀疑使用 的答案是“是” numba,但我对numpy解决方案特别感兴趣。
此外,似乎取决于arr数据类型,numpy可能会使用基数排序,因此unique在某些情况下的性能可能会很好。
我在下面找到了一个解决方法,使用pandas.unique; 然而,它仍然需要对原始数组进行两次遍历,并pandas.unique做一些额外的工作。我想知道使用pandas._libs.hashtableandcython或其他任何东西是否存在更好的解决方案。
您实际上可以通过一次遍历数组来完成此操作,但是它要求您dtype事先知道结果。否则,您需要对元素进行第二次遍历来确定它。
functools.wraps暂时忽略性能(和),实现可能如下所示:
def vectorize_cached(output_dtype):
def vectorize_cached_factory(f):
def f_vec(arr):
flattened = arr.ravel()
if output_dtype is None:
result = np.empty_like(flattened)
else:
result = np.empty(arr.size, output_dtype)
cache = {}
for idx, item in enumerate(flattened):
res = cache.get(item)
if res is None:
res = f(item)
cache[item] = res
result[idx] = res
return result.reshape(arr.shape)
return f_vec
return vectorize_cached_factory
Run Code Online (Sandbox Code Playgroud)
它首先创建结果数组,然后迭代输入数组。一旦遇到字典中不存在的元素,就会调用该函数(并存储结果) - 否则它只是使用存储在字典中的值。
@vectorize_cached(np.float64)
def t(x):
print(x)
return x + 2.5
>>> t(np.array([1,1,1,2,2,2,3,3,1,1,1]))
1
2
3
array([3.5, 3.5, 3.5, 4.5, 4.5, 4.5, 5.5, 5.5, 3.5, 3.5, 3.5])
Run Code Online (Sandbox Code Playgroud)
然而,这并不是特别快,因为我们正在对 NumPy 数组进行 Python 循环。
为了让它更快,我们实际上可以将此实现移植到 Cython(目前仅支持 float32、float64、int32、int64、uint32 和 uint64,但扩展几乎是微不足道的,因为它使用了融合类型):
%%cython
cimport numpy as cnp
ctypedef fused input_type:
cnp.float32_t
cnp.float64_t
cnp.uint32_t
cnp.uint64_t
cnp.int32_t
cnp.int64_t
ctypedef fused result_type:
cnp.float32_t
cnp.float64_t
cnp.uint32_t
cnp.uint64_t
cnp.int32_t
cnp.int64_t
cpdef void vectorized_cached_impl(input_type[:] array, result_type[:] result, object func):
cdef dict cache = {}
cdef Py_ssize_t idx
cdef input_type item
for idx in range(array.size):
item = array[idx]
res = cache.get(item)
if res is None:
res = func(item)
cache[item] = res
result[idx] = res
Run Code Online (Sandbox Code Playgroud)
使用 Python 装饰器(以下代码未使用 Cython 编译):
def vectorize_cached_cython(output_dtype):
def vectorize_cached_factory(f):
def f_vec(arr):
flattened = arr.ravel()
if output_dtype is None:
result = np.empty_like(flattened)
else:
result = np.empty(arr.size, output_dtype)
vectorized_cached_impl(flattened, result, f)
return result.reshape(arr.shape)
return f_vec
return vectorize_cached_factory
Run Code Online (Sandbox Code Playgroud)
同样,这只会执行一次,并且每个唯一值仅应用一次该函数:
%%cython
cimport numpy as cnp
ctypedef fused input_type:
cnp.float32_t
cnp.float64_t
cnp.uint32_t
cnp.uint64_t
cnp.int32_t
cnp.int64_t
ctypedef fused result_type:
cnp.float32_t
cnp.float64_t
cnp.uint32_t
cnp.uint64_t
cnp.int32_t
cnp.int64_t
cpdef void vectorized_cached_impl(input_type[:] array, result_type[:] result, object func):
cdef dict cache = {}
cdef Py_ssize_t idx
cdef input_type item
for idx in range(array.size):
item = array[idx]
res = cache.get(item)
if res is None:
res = func(item)
cache[item] = res
result[idx] = res
Run Code Online (Sandbox Code Playgroud)
但问题是:在这里使用 Cython 有意义吗?
I did a quick benchmark (without sleep) to get an idea how different the performance is (using my library simple_benchmark):
def func_to_vectorize(x):
return x
usual_vectorize = np.vectorize(func_to_vectorize)
pure_vectorize = vectorize_pure(func_to_vectorize)
pandas_vectorize = vectorize_with_pandas(func_to_vectorize)
cached_vectorize = vectorize_cached(None)(func_to_vectorize)
cython_vectorize = vectorize_cached_cython(None)(func_to_vectorize)
from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()
b.add_function(alias='usual_vectorize')(usual_vectorize)
b.add_function(alias='pure_vectorize')(pure_vectorize)
b.add_function(alias='pandas_vectorize')(pandas_vectorize)
b.add_function(alias='cached_vectorize')(cached_vectorize)
b.add_function(alias='cython_vectorize')(cython_vectorize)
@b.add_arguments('array size')
def argument_provider():
np.random.seed(0)
for exponent in range(6, 20):
size = 2**exponent
yield size, np.random.randint(0, 10, size=(size, 2))
r = b.run()
r.plot()
Run Code Online (Sandbox Code Playgroud)
According to these times the ranking would be (fastest to slowest):
The plain NumPy solution is only a factor 5-10 slower if the function call is very inexpensive. The pandas solution also has a much bigger constant factor, making it the slowest for very small arrays.
time.sleep(0.001)), lots of duplicatesIn case the function call is actually expensive (like with time.sleep) the np.vectorize solution will be a lot slower, however there is much less difference between the other solutions:
def vectorize_cached_cython(output_dtype):
def vectorize_cached_factory(f):
def f_vec(arr):
flattened = arr.ravel()
if output_dtype is None:
result = np.empty_like(flattened)
else:
result = np.empty(arr.size, output_dtype)
vectorized_cached_impl(flattened, result, f)
return result.reshape(arr.shape)
return f_vec
return vectorize_cached_factory
Run Code Online (Sandbox Code Playgroud)
但是,如果您没有那么多重复项,则平原np.vectorize几乎与 pure 和 pandas 解决方案一样快,仅比 Cython 版本慢一点:
@vectorize_cached_cython(np.float64)
def t(x):
print(x)
return x + 2.5
>>> t(np.array([1,1,1,2,2,2,3,3,1,1,1]))
1
2
3
array([3.5, 3.5, 3.5, 4.5, 4.5, 4.5, 5.5, 5.5, 3.5, 3.5, 3.5])
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
595 次 |
| 最近记录: |