Nur*_*r L (score 6) | python, scalar, performance, numpy, matrix-multiplication
I have a program whose main performance bottleneck is a matrix multiplication in which one dimension is 1 and the other is large (e.g. 1000):

    import numpy as np

    large_dimension = 1000

    a = np.random.random((1,))
    b = np.random.random((1, large_dimension))

    c = np.matmul(a, b)

In other words, the matrix b is multiplied by the scalar a[0].
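
One detail worth checking before comparing the two forms: they agree in value but not in shape. `np.matmul` promotes the 1-D `a` to a 1 x 1 matrix and drops the added axis afterwards, while `a[0] * b` keeps the shape of `b`. A minimal sketch, where the names `via_matmul` and `via_scalar` are illustrative and not from the post:

```python
import numpy as np

large_dimension = 1000
a = np.random.random((1,))                  # 1-D array holding a single element
b = np.random.random((1, large_dimension))  # 1 x 1000 matrix

via_matmul = np.matmul(a, b)  # `a` is promoted to (1, 1); the prepended axis is removed again
via_scalar = a[0] * b         # ordinary broadcasting keeps b's shape

print(via_matmul.shape)  # (1000,)
print(via_scalar.shape)  # (1, 1000)

# Same numbers, different shapes
assert np.allclose(via_matmul, via_scalar[0])
```

If downstream code relies on the result being 2-D, that difference matters when swapping one form for the other.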
I am looking for the most efficient way to compute this, because the operation is repeated millions of times.

I timed two simple approaches, which are effectively equivalent:

    %timeit np.matmul(a, b)
    >> 1.55 µs ± 45.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    %timeit a[0] * b
    >> 1.77 µs ± 34.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Is there a more efficient way to compute this?
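
The timings above use IPython's `%timeit`. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. The helper name `per_call_us` and the loop counts are illustrative choices, not part of the original post, and the `lambda` wrapper adds a small call overhead to both measurements:

```python
import timeit

import numpy as np

large_dimension = 1000
a = np.random.random((1,))
b = np.random.random((1, large_dimension))


def per_call_us(func, number=10_000, repeat=3):
    """Return the best-of-`repeat` time per call, in microseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e6


t_matmul = per_call_us(lambda: np.matmul(a, b))
t_scalar = per_call_us(lambda: a[0] * b)

print(f"np.matmul(a, b): {t_matmul:.2f} µs per call")
print(f"a[0] * b:        {t_scalar:.2f} µs per call")
```

Absolute numbers depend on the machine and the NumPy build, so only the relative ordering on one system is meaningful.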

小智 (score 6)
    import numpy as np

    large_dimension = 1000

    a = np.random.random((1,))
    B = np.random.random((1, large_dimension))

    %timeit np.matmul(a, B)
    5.43 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    %timeit a[0] * B
    5.11 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Use a plain Python float instead:

    %timeit float(a[0]) * B
    3.48 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

To avoid the memory allocation, use a "buffer":

    buffer = np.empty_like(B)

    %timeit np.multiply(float(a[0]), B, buffer)
    2.96 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

To avoid the unnecessary attribute lookup, use an "alias":

    mul = np.multiply

    %timeit mul(float(a[0]), B, buffer)
    2.73 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

I also don't recommend using NumPy scalars at all, because the computation is faster when you avoid them:

    a_float = float(a[0])

    %timeit mul(a_float, B, buffer)
    1.94 µs ± 5.74 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Furthermore, if possible, initialize the buffer once, outside the loop (assuming, of course, that you have something like a loop):

    rng = range(1000)

    %%timeit
    # empty loop: measures the loop overhead alone
    for i in rng:
        pass
    24.4 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    %%timeit
    # same loop, now doing the multiplication into the preallocated buffer
    for i in rng:
        mul(a_float, B, buffer)
    1.91 ms ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So:

"Best time per iteration" = (1.91 ms - 0.02 ms) / 1000 ≈ 1.89 µs

"Speedup" = 5.43 µs / 1.89 µs ≈ 2.87 (relative to the original np.matmul timing)
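
Putting the steps of this answer together, a loop might look like the sketch below. It assumes the scalar changes on every iteration; `scalars` and `total` are hypothetical stand-ins for whatever the real program supplies and consumes, and are not from the original post. One caveat of reusing a buffer is also shown: every call overwrites it, so a result that must survive the next iteration has to be copied.

```python
import numpy as np

large_dimension = 1000
B = np.random.random((1, large_dimension))
scalars = np.random.random(5)      # hypothetical: the successive values of a[0]

mul = np.multiply                  # alias: avoids the attribute lookup on every call
buffer = np.empty_like(B)          # allocated once, reused on every iteration

total = np.zeros_like(B)
for s in scalars.tolist():         # .tolist() yields plain Python floats, not NumPy scalars
    result = mul(s, B, out=buffer) # writes into `buffer` and returns that same array
    total += result                # consume the result before it is overwritten

# `result` is the buffer itself, not a fresh array
assert result is buffer

kept = buffer.copy()               # copy if the values must outlive the next call
```

Whether the saved allocation is worth the aliasing risk depends on how the results are used; if each product is stored rather than consumed immediately, the copy gives back most of the gain.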