Converting float32 to float64 takes longer than expected in numpy

Question

I had a performance issue in a numpy project, and I found that about three quarters of the execution time was spent on a single line of code:

error = abs(detected_matrix[i, step] - original_matrix[j, new])

When I changed the line to

error = abs(original_matrix[j, new] - detected_matrix[i, step])

the problem disappeared.

I realized that original_matrix was float64 and detected_matrix was float32. Changing the type of either of these two variables so that they match also solved the problem.

Is this a well-known issue?

Here is a code sample that reproduces the problem:

from timeit import timeit
import numpy as np

f64 = np.array([1.0], dtype='float64')[0]
f32 = np.array([1.0], dtype='float32')[0]

timeit_result = timeit(stmt="abs(f32 - f64)", number=1000000, globals=globals())
print(timeit_result)


timeit_result = timeit(stmt="abs(f64 - f32)", number=1000000, globals=globals())
print(timeit_result)

Output on my computer:

2.8707289
0.15719420000000017

The first statement is roughly 18 times slower than the second, which is quite strange.

Answer 1

Jér*_*ard 3

TL;DR: Please use Numpy >= 1.23.0.

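This problem was fixed in Numpy 1.23.0 (more specifically, in version 1.23.0-rc1). This pull request rewrote the scalar math logic to make it faster in many cases, including your specific use case. With version 1.22.4, the former code is 10 times slower than the second one. The same is true for earlier versions such as 1.21.5. In 1.23.0, the former is only 10%-15% slower, but both take a very short time: 140 ns/operation versus 122 ns/operation. The small difference is due to a slightly different path taken in the type-checking part of the code. For more information about this low-level behavior, please read this post. Note that iterating over Numpy items is not meant to be very fast, and neither is operating on Numpy scalars. If your code is limited by that, please consider converting Numpy scalars into Python ones, as stated in the 1.23.0 release notes: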

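Many operations on NumPy scalars are now significantly faster, although rare operations (e.g. with 0-D arrays rather than scalars) may be slower in some cases. However, even with these improvements, users who want the best performance for their scalars may want to convert a known NumPy scalar into a Python one using scalar.item().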
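As a minimal sketch of the scalar.item() suggestion, the snippet below reuses the f32 and f64 scalars from the question and converts them to plain Python floats before subtracting. The names a and b and the reduced iteration count are illustrative assumptions; the timings depend on the machine and Numpy version, so no expected numbers are shown.

```python
from timeit import timeit

import numpy as np

f64 = np.array([1.0], dtype="float64")[0]
f32 = np.array([1.0], dtype="float32")[0]

# .item() returns the value as a built-in Python float,
# so the subtraction below never enters Numpy's scalar math code.
a = f32.item()
b = f64.item()
error = abs(a - b)

# Rough comparison of the mixed-dtype Numpy scalars and the Python floats.
numpy_time = timeit(lambda: abs(f32 - f64), number=100_000)
python_time = timeit(lambda: abs(a - b), number=100_000)
print(f"numpy scalars: {numpy_time:.4f} s, python floats: {python_time:.4f} s")
```

Converting once outside a hot loop is the point here; calling .item() on every iteration has its own cost.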

In this case, an even faster solution is to use Numba/Cython, or to try to vectorize the enclosing loop if possible.
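A rough sketch of what vectorizing the enclosing loop could look like follows. The question does not show the loop, so the array shapes, the index arrays i_idx and j_idx, and the fixed columns step and new are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the matrices in the question.
detected_matrix = rng.random((4, 3)).astype(np.float32)
original_matrix = rng.random((5, 3))  # float64 by default

step, new = 1, 2                  # assumed fixed column indices
i_idx = np.array([0, 1, 2, 3])    # assumed row indices into detected_matrix
j_idx = np.array([4, 3, 2, 1])    # assumed row indices into original_matrix

# Scalar-by-scalar version, as in the question.
loop_errors = [
    abs(detected_matrix[i, step] - original_matrix[j, new])
    for i, j in zip(i_idx, j_idx)
]

# Vectorized version: cast the float32 column to float64 once,
# then compute all absolute differences in a single array operation.
detected_col = detected_matrix[i_idx, step].astype(np.float64)
vec_errors = np.abs(detected_col - original_matrix[j_idx, new])
```

Whether this applies depends on whether the indices can be computed ahead of the loop; if each iteration depends on the previous error, Numba or Cython is likely the better fit.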

Archived:	3 years ago
Views:	446
Last recorded:	3 years ago