阅读零成本抽象并查看生锈简介:一种具有高级抽象的低级语言我试图比较两种计算向量点积的方法:一种使用for循环,一种使用迭代器.
#![feature(test)]
extern crate rand;
extern crate test;
use std::cmp::min;
fn dot_product_1(x: &[f64], y: &[f64]) -> f64 {
let mut result: f64 = 0.0;
for i in 0..min(x.len(), y.len()) {
result += x[i] * y[i];
}
return result;
}
fn dot_product_2(x: &[f64], y: &[f64]) -> f64 {
x.iter().zip(y).map(|(&a, &b)| a * b).sum::<f64>()
}
#[cfg(test)]
mod bench {
use test::Bencher;
use rand::{Rng,thread_rng};
use super::*;
const LEN: usize = 30;
#[test]
fn test_1() {
let x = [1.0, 2.0, 3.0];
let y = [2.0, 4.0, 6.0];
let result = dot_product_1(&x, &y);
assert_eq!(result, 28.0);
}
#[test]
fn test_2() {
let x = [1.0, 2.0, 3.0];
let y = [2.0, 4.0, 6.0];
let result = dot_product_2(&x, &y);
assert_eq!(result, 28.0);
}
fn rand_array(cnt: u32) -> Vec<f64> {
let mut rng = thread_rng();
(0..cnt).map(|_| rng.gen::<f64>()).collect()
}
#[bench]
fn bench_small_1(b: &mut Bencher) {
let samples = rand_array(2*LEN as u32);
b.iter(|| {
dot_product_1(&samples[0..LEN], &samples[LEN..2*LEN])
})
}
#[bench]
fn bench_small_2(b: &mut Bencher) {
let samples = rand_array(2*LEN as u32);
b.iter(|| {
dot_product_2(&samples[0..LEN], &samples[LEN..2*LEN])
})
}
}
Run Code Online (Sandbox Code Playgroud)
上面链接的后面声称带有迭代器的版本应该具有相似的性能"实际上要快一点".但是,在对两者进行基准测试时,我会得到非常不同的结果:
running 2 tests
test bench::bench_small_loop ... bench: 20 ns/iter (+/- 1)
test bench::bench_small_iter ... bench: 24 ns/iter (+/- 2)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out
Run Code Online (Sandbox Code Playgroud)
那么,"零成本抽象"去了哪里?
更新:添加foldr@wimh提供的示例并使用split_at而不是切片给出以下结果.
running 3 tests
test bench::bench_small_fold ... bench: 18 ns/iter (+/- 1)
test bench::bench_small_iter ... bench: 21 ns/iter (+/- 1)
test bench::bench_small_loop ... bench: 24 ns/iter (+/- 1)
test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out
Run Code Online (Sandbox Code Playgroud)
因此,似乎额外的时间直接或间接来自构造测量代码内的切片.为了检查确实是这种情况,我尝试了以下两种方法,结果相同(这里显示为foldrcase并使用map+ sum):
#[bench]
fn bench_small_iter(b: &mut Bencher) {
let samples = rand_array(2 * LEN);
let s0 = &samples[0..LEN];
let s1 = &samples[LEN..2 * LEN];
b.iter(|| dot_product_iter(s0, s1))
}
#[bench]
fn bench_small_fold(b: &mut Bencher) {
let samples = rand_array(2 * LEN);
let (s0, s1) = samples.split_at(LEN);
b.iter(|| dot_product_fold(s0, s1))
}
Run Code Online (Sandbox Code Playgroud)
对我来说这似乎是零成本。我以更惯用的方式编写了您的代码,对两个测试使用相同的随机值,然后进行多次测试:
fn dot_product_1(x: &[f64], y: &[f64]) -> f64 {
let mut result: f64 = 0.0;
for i in 0..min(x.len(), y.len()) {
result += x[i] * y[i];
}
result
}
fn dot_product_2(x: &[f64], y: &[f64]) -> f64 {
x.iter().zip(y).map(|(&a, &b)| a * b).sum()
}
Run Code Online (Sandbox Code Playgroud)
fn rand_array(cnt: usize) -> Vec<f64> {
let mut rng = rand::rngs::StdRng::seed_from_u64(42);
rng.sample_iter(&rand::distributions::Standard).take(cnt).collect()
}
#[bench]
fn bench_small_1(b: &mut Bencher) {
let samples = rand_array(2 * LEN);
let (s0, s1) = samples.split_at(LEN);
b.iter(|| dot_product_1(s0, s1))
}
#[bench]
fn bench_small_2(b: &mut Bencher) {
let samples = rand_array(2 * LEN);
let (s0, s1) = samples.split_at(LEN);
b.iter(|| dot_product_2(s0, s1))
}
Run Code Online (Sandbox Code Playgroud)
fn dot_product_1(x: &[f64], y: &[f64]) -> f64 {
let mut result: f64 = 0.0;
for i in 0..min(x.len(), y.len()) {
result += x[i] * y[i];
}
result
}
fn dot_product_2(x: &[f64], y: &[f64]) -> f64 {
x.iter().zip(y).map(|(&a, &b)| a * b).sum()
}
Run Code Online (Sandbox Code Playgroud)
在 10 次运行中的 9 次中,惯用代码比 for 循环更快。这是在具有 32 GB RAM 的 2.9 GHz Core i9 (I9-8950HK) 上完成的,使用rustc 1.31.0-nightly (fc403ad98 2018-09-30).