I had a performance issue in a NumPy project, and I then realized that about three quarters of the execution time was spent on a single line of code:
error = abs(detected_matrix[i, step] - original_matrix[j, new])
When I changed the line to
error = abs(original_matrix[j, new] - detected_matrix[i, step])
the problem disappeared.
I realized that the type of original_matrix was float64 and the type of detected_matrix was float32. Changing the type of either of these two variables solved the problem.
Is this a well-known issue?
Here is some sample code that reproduces the problem:
from timeit import timeit
import numpy as np
f64 = np.array([1.0], dtype='float64')[0]
f32 = np.array([1.0], dtype='float32')[0]
timeit_result = timeit(stmt="abs(f32 - f64)", number=1000000, globals=globals())
print(timeit_result)
timeit_result = timeit(stmt="abs(f64 - f32)", number=1000000, globals=globals())
print(timeit_result)
Output on my computer:
2.8707289
0.15719420000000017
which is quite strange.
TL;DR: use Numpy >= 1.23.0.
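This problem has been fixed in Numpy 1.23.0 (more specifically, in version 1.23.0-rc1). This pull request rewrote the scalar math logic to make it faster in many cases, including your specific use case. With version 1.22.4, the former expression is about 10 times slower than the latter. This is also true for earlier versions such as 1.21.5. With 1.23.0, the former is only 10%-15% slower, and both take a very short time: 140 ns/operation versus 122 ns/operation. The small remaining difference is due to a slightly different path being taken in the type-checking part of the code. For more information about this low-level behavior, please read this post.
Note that iterating over Numpy items is not meant to be very fast, and neither is operating on Numpy scalars. If your code is limited by this, consider converting Numpy scalars to Python ones, as stated in the 1.23.0 release notes: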
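Many operations on NumPy scalars are now significantly faster, although rare operations (e.g. with 0-D arrays rather than scalars) may be slower in some cases. However, even with these improvements, users who want the best performance for their scalars may want to convert a known NumPy scalar into a Python one using scalar.item().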
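The conversion mentioned in the release notes could look roughly like the sketch below. The variable names and the values 1.0 and 1.5 are only illustrative, and whether this actually helps depends on the Numpy version and on the surrounding code, so it is worth timing on your own setup.

```python
import numpy as np

# NumPy scalars, obtained the same way as in the question (values are illustrative)
f64 = np.array([1.0], dtype="float64")[0]
f32 = np.array([1.5], dtype="float32")[0]

# .item() returns the equivalent built-in Python object (a plain float here)
a = f32.item()
b = f64.item()

# Plain Python float arithmetic, which bypasses NumPy's scalar math machinery
error = abs(a - b)
print(type(a).__name__, type(b).__name__, error)
```

Both values are exactly representable in float32, so no precision is lost in this particular example. In general, converting a float32 value to a Python float keeps the value as it was stored in float32.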
In this case, an even faster solution is to use Numba/Cython, or to try to vectorize the enclosing loop if possible.
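As a rough sketch of the vectorization idea: the question does not show the enclosing loop, so the code below assumes that the error is needed for every (i, j) pair at fixed column indices step and new. The array shapes, the random data, and the index values are all hypothetical.

```python
import numpy as np

# Hypothetical data: shapes and contents are assumptions, not from the question
rng = np.random.default_rng(0)
detected_matrix = rng.random((4, 3)).astype(np.float32)
original_matrix = rng.random((5, 3))  # float64 by default
step, new = 1, 2  # hypothetical column indices

# Scalar version: one NumPy scalar operation per (i, j) pair
errors_loop = np.empty((4, 5))
for i in range(4):
    for j in range(5):
        errors_loop[i, j] = abs(detected_matrix[i, step] - original_matrix[j, new])

# Vectorized version: broadcast a (4, 1) column against a (1, 5) row
errors_vec = np.abs(detected_matrix[:, step, None] - original_matrix[None, :, new])

print(np.allclose(errors_loop, errors_vec))
```

The vectorized form performs the type promotion once for the whole array rather than once per element, which is why the cost of mixing float32 and float64 largely stops mattering. If the real loop has dependencies between iterations, this approach may not apply, and Numba or Cython would be the alternative.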