How do I partially sort a Vec or slice?

Tob*_*ann 2 sorting performance vector rust partial-sort

I need to get the top N items from a Vec that is quite large in production. Currently I do it in this inefficient way:

let mut v = vec![6, 4, 3, 7, 2, 1, 5];
v.sort_unstable();
v = v[0..3].to_vec();

In C++, I would use std::partial_sort, but I can't find an equivalent in the Rust docs.

Am I just overlooking it, or does it not exist (yet)?

小智 7

There is select_nth_unstable, which is the equivalent of std::nth_element. You can then sort the result to get what you want.

Example:

let mut v = vec![6, 4, 3, 7, 2, 1, 5];
let top_three = v.select_nth_unstable(3).0;
top_three.sort();

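Here, 3 is the index of the "nth" element, so we are actually selecting the 4th element. That is fine, because select_nth_unstable returns a tuple of:

  • a slice to the left of the nth element
  • a reference to the nth element
  • a slice to the right of the nth element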
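To make the three parts of that tuple concrete, here is a minimal sketch that wraps the approach in a helper. The name `smallest_n` is hypothetical (it is not part of the standard library), and the length check is an addition of mine, since `select_nth_unstable` panics when the index is out of bounds.

```rust
/// Hypothetical helper: returns the `n` smallest elements of `v`, sorted ascending.
/// The remaining elements of `v` are left in unspecified order.
fn smallest_n<T: Ord>(v: &mut [T], n: usize) -> &mut [T] {
    if n >= v.len() {
        // Index `n` would be out of bounds, so just sort the whole slice.
        v.sort_unstable();
        return v;
    }
    // After this call every element of `left` is <= `*nth`
    // and every element of `right` is >= `*nth`.
    let (left, _nth, _right) = v.select_nth_unstable(n);
    // `left` holds exactly the `n` smallest elements, in arbitrary order.
    left.sort_unstable();
    left
}

fn main() {
    let mut v = vec![6, 4, 3, 7, 2, 1, 5];
    let top_three = smallest_n(&mut v, 3);
    println!("{:?}", top_three); // prints [1, 2, 3]
}
```

Only the left slice is sorted, so the cost is the selection over the whole slice plus a sort of `n` elements.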


Sta*_*eur 6

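The standard library does not contain this functionality, but it looks like the lazysort crate is exactly what you need:

So what's the point of lazy sorting? As per the linked blog post, they're useful when you do not need or intend to need every value; for example, you may only need the first 1,000 ordered values from a larger set.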

#![feature(test)]

extern crate lazysort;
extern crate rand;
extern crate test;

use std::cmp::Ordering;

trait SortLazy<T> {
    fn sort_lazy<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering;
    unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering;
}

impl<T> SortLazy<T> for [T] {
    fn sort_lazy<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering,
    {
        fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
        where
            F: Fn(&T, &T) -> Ordering,
        {
            if !data.is_empty() && *accu < n {
                let mut pivot = 1;
                let mut lower = 0;
                let mut upper = data.len();
                while pivot < upper {
                    match cmp(&data[pivot], &data[lower]) {
                        Ordering::Less => {
                            data.swap(pivot, lower);
                            lower += 1;
                            pivot += 1;
                        }
                        Ordering::Greater => {
                            upper -= 1;
                            data.swap(pivot, upper);
                        }
                        Ordering::Equal => pivot += 1,
                    }
                }
                sort_lazy(&mut data[..lower], accu, cmp, n);
                sort_lazy(&mut data[upper..], accu, cmp, n);
            } else {
                *accu += 1;
            }
        }
        sort_lazy(self, &mut 0, &cmp, n);
    }

    unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
    where
        F: Fn(&T, &T) -> Ordering,
    {
        fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
        where
            F: Fn(&T, &T) -> Ordering,
        {
            if !data.is_empty() && *accu < n {
                unsafe {
                    use std::mem::swap;
                    let mut pivot = 1;
                    let mut lower = 0;
                    let mut upper = data.len();
                    while pivot < upper {
                        match cmp(data.get_unchecked(pivot), data.get_unchecked(lower)) {
                            Ordering::Less => {
                                swap(
                                    &mut *(data.get_unchecked_mut(pivot) as *mut T),
                                    &mut *(data.get_unchecked_mut(lower) as *mut T),
                                );
                                lower += 1;
                                pivot += 1;
                            }
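                            Ordering::Greater => {
                                upper -= 1;
                                // `pivot` can equal `upper` here, so use `ptr::swap`,
                                // which allows both pointers to refer to the same element.
                                std::ptr::swap(
                                    data.get_unchecked_mut(pivot) as *mut T,
                                    data.get_unchecked_mut(upper) as *mut T,
                                );
                            }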
                            Ordering::Equal => pivot += 1,
                        }
                    }
                    sort_lazy(&mut data[..lower], accu, cmp, n);
                    sort_lazy(&mut data[upper..], accu, cmp, n);
                }
            } else {
                *accu += 1;
            }
        }
        sort_lazy(self, &mut 0, &cmp, n);
    }
}

#[cfg(test)]
mod tests {
    use test::Bencher;

    use lazysort::Sorted;
    use std::collections::BinaryHeap;
    use super::SortLazy;

    use rand::{thread_rng, Rng};

    const SIZE_VEC: usize = 100_000;
    const N: usize = 42;

    #[bench]
    fn sort(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            v.sort_unstable();
        })
    }

    #[bench]
    fn lazysort(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            let _: Vec<_> = v.iter().sorted().take(N).collect();
        })
    }

    #[bench]
    fn lazysort_in_place(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            v.sort_lazy(i32::cmp, N);
        })
    }

    #[bench]
    fn lazysort_in_place_fast(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();
            unsafe { v.sort_lazy_fast(i32::cmp, N) };
        })
    }

    #[bench]
    fn binaryheap(b: &mut Bencher) {
        b.iter(|| {
            let mut rng = thread_rng();
            let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
                .take(SIZE_VEC)
                .collect();

            let mut iter = v.iter();
            let mut heap: BinaryHeap<_> = iter.by_ref().take(N).collect();
            for i in iter {
                heap.push(i);
                heap.pop();
            }
            let _ = heap.into_sorted_vec();
        })
    }
}
running 5 tests
test tests::binaryheap             ... bench:   3,283,938 ns/iter (+/- 413,805)
test tests::lazysort               ... bench:   1,669,229 ns/iter (+/- 505,528)
test tests::lazysort_in_place      ... bench:   1,781,007 ns/iter (+/- 443,472)
test tests::lazysort_in_place_fast ... bench:   1,652,103 ns/iter (+/- 691,847)
test tests::sort                   ... bench:   5,600,513 ns/iter (+/- 711,927)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out

This code shows that lazysort is faster than the BinaryHeap solution. We can also see that the BinaryHeap solution gets worse as N increases.
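One reason the heap approach scales with N is that, in the benchmark above, every element is pushed and popped, each costing O(log N). The following is a sketch of a variant, not the benchmarked code: it peeks at the largest kept value first and only touches the heap when the new item is smaller. The name `smallest_n_heap` is hypothetical, and I have not measured how much this helps in practice.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper: collects the `n` smallest items of `items` using a
/// max-heap that never holds more than `n` elements.
fn smallest_n_heap<T: Ord>(items: impl IntoIterator<Item = T>, n: usize) -> Vec<T> {
    let mut heap = BinaryHeap::with_capacity(n);
    for item in items {
        if heap.len() < n {
            // Fill the heap with the first `n` items.
            heap.push(item);
        } else if let Some(mut max) = heap.peek_mut() {
            // Only modify the heap when `item` beats the largest kept value.
            if item < *max {
                *max = item; // the heap order is restored when `max` is dropped
            }
        }
    }
    // Ascending order, smallest first.
    heap.into_sorted_vec()
}

fn main() {
    let v = vec![6, 4, 3, 7, 2, 1, 5];
    println!("{:?}", smallest_n_heap(v.iter(), 3)); // prints [1, 2, 3]
}
```

The worst case (input sorted in descending order) still pays O(log N) per element, so this does not change the overall trend seen in the benchmark.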

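The problem with lazysort is that it creates a second Vec<_>. A "better" solution would be to implement the partial sort in place. I have provided an example of such an implementation.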

Keep in mind that all of these solutions come with overhead. When N is about SIZE_VEC / 3, the classic sort wins.

You could submit an RFC/issue to ask about adding this feature to the standard library.