我正在尝试编写一个与numpy.sum双精度数组一样快的 C 程序,但似乎失败了。
以下是我衡量 numpy 性能的方法:
import numpy as np
import time
SIZE=4000000
REPS=5
xs = np.random.rand(SIZE)
print(xs.dtype)
for _ in range(REPS):
start = time.perf_counter()
r = np.sum(xs)
end = time.perf_counter()
print(f"{SIZE / (end-start) / 10**6:.2f} MFLOPS ({r:.2f})")
Run Code Online (Sandbox Code Playgroud)
输出是:
float64
2941.61 MFLOPS (2000279.78)
3083.56 MFLOPS (2000279.78)
3406.18 MFLOPS (2000279.78)
3712.33 MFLOPS (2000279.78)
3661.15 MFLOPS (2000279.78)
Run Code Online (Sandbox Code Playgroud)
现在尝试在 C 中做类似的事情:
float64
2941.61 MFLOPS (2000279.78)
3083.56 MFLOPS (2000279.78)
3406.18 MFLOPS (2000279.78)
3712.33 MFLOPS (2000279.78)
3661.15 MFLOPS (2000279.78)
Run Code Online (Sandbox Code Playgroud)
编译并gcc -o main …