是否有零拷贝方法来查找任意数量的集合的交集？

Question

是否有零拷贝方法来查找任意数量的集合的交集？

这是一个简单的例子，展示了我正在尝试做的事情：

use std::collections::HashSet;

fn main() {
    let mut sets: Vec<HashSet<char>> = vec![];

    let mut set = HashSet::new();
    set.insert('a');
    set.insert('b');
    set.insert('c');
    set.insert('d');
    sets.push(set);

    let mut set = HashSet::new();
    set.insert('a');
    set.insert('b');
    set.insert('d');
    set.insert('e');
    sets.push(set);

    let mut set = HashSet::new();
    set.insert('a');
    set.insert('b');
    set.insert('f');
    set.insert('g');
    sets.push(set);

    // Simple intersection of two sets
    let simple_intersection = sets[0].intersection(&sets[1]);
    println!("Intersection of 0 and 1: {:?}", simple_intersection);

    let mut iter = sets.iter();
    let base = iter.next().unwrap().clone();
    let intersection = iter.fold(base, |acc, set| acc.intersection(set).map(|x| x.clone()).collect());
    println!("Intersection of all: {:?}", intersection);
}

Run Code Online (Sandbox Code Playgroud)

该解决方案使用折叠来“累积”交集，使用第一个元素作为初始值。

Intersections 是惰性迭代器，它通过对所涉及集合的引用进行迭代。由于累加器必须与第一个元素具有相同的类型，因此我们必须克隆每个集合的元素。如果不进行克隆，我们就无法从引用中创建一组拥有的数据。我想我明白这一点。

例如，这不起作用：

let mut iter = sets.iter();
let mut base = iter.next().unwrap();
let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
println!("Intersection of all: {:?}", intersection);

Run Code Online (Sandbox Code Playgroud)

error[E0277]: a value of type `&HashSet<char>` cannot be built from an iterator over elements of type `&char`
  --> src/main.rs:41:73
   |
41 |     let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
   |                                                                         ^^^^^^^ value of type `&HashSet<char>` cannot be built from `std::iter::Iterator<Item=&char>`
   |
   = help: the trait `FromIterator<&char>` is not implemented for `&HashSet<char>`

Run Code Online (Sandbox Code Playgroud)

即使理解了这一点，我仍然不想克隆数据。理论上这是没有必要的，我有原始向量中的数据，我应该能够使用参考。这会大大加快我的算法速度。这是纯粹的学术追求，所以我有兴趣让它尽可能快。

为此，我需要在 s 中累积HashSet<&char>，但我不能这样做，因为我无法在闭包中将aHashSet<&char>与 a相交。HashSet<char>所以看来我被困住了。有什么办法可以做到这一点吗？

或者，我可以为向量中的每个集合创建一组引用，但这看起来并没有好多少。它还能起作用吗？我可能会遇到同样的问题，但有双重引用。

最后，我实际上不需要保留原始数据，所以我可以将元素移动到累加器集中。我不知道如何实现这一点，因为我必须经历intersection这给我提供的参考。

上述建议是否可行？还有其他我没有看到的零复制解决方案吗？

Answer 1

orl*_*rlp 4

最后，我实际上不需要保留原始数据。

这使得事情变得非常容易。

首先，可以选择按大小对集合进行排序。然后：

let (intersection, others) = sets.split_at_mut(1);
let intersection = &mut intersection[0];
for other in others {
    intersection.retain(|e| other.contains(e));
}

Run Code Online (Sandbox Code Playgroud)

归档时间：	5 年，2 月前
查看次数：	4971 次
最近记录：	5 年，2 月前