Why exactly is filtering with a set more performant than filtering with a vector?

Question

mba*_*ake 2 lisp vector clojure set filter

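After some research, I was recently able to significantly improve the performance of some code by using a set instead of a vector for comparisons. Here is a simplified example of the initial code: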

(def target-ids ["a" "b" "c"])

(def maps-to-search-through
  [{:id "a" :value "example"}
   {:id "e" :value "example-2"}])

(filter (fn [i] (some #(= (:id i) %) target-ids)) maps-to-search-through)

Here is the optimized code:

(def target-ids #{"a" "b" "c"})

(def maps-to-search-through
  [{:id "a" :value "example"}
   {:id "e" :value "example-2"}])

(filter (comp target-ids :id) maps-to-search-through)

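For reference, target-ids and maps-to-search-through are both generated dynamically, and each can contain thousands of values, although maps-to-search-through is always at least 5 times larger than target-ids.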

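All the advice and documentation I have found online suggests that this change, specifically using a set instead of a vector, is noticeably faster, but none of it explains why. I understand that in the original version filter has to do a lot of work, since for every map in one vector it iterates over the other vector. What I don't understand is why that is not also the case in the improved code.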

Can anyone help explain?

Answer 1

Jos*_*hua 7

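Sets are data structures designed to contain only unique values. You can also use them as functions to check whether a given value is a member of the set, which is exactly how you use the target-ids set. On the JVM side this basically boils down to a Set.contains call, which uses some clever hash-based logic: the value's hash leads more or less directly to where it would be stored, so the lookup does not have to walk through every element, and its cost stays roughly constant as the set grows.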
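
To make the "set as a function" point concrete, here is a minimal sketch you could paste into a REPL. The data mirrors the question's example; nothing here is specific to your real code.

```clojure
(def target-ids #{"a" "b" "c"})

;; Calling a set with a value returns that value if it is a member, else nil.
(target-ids "a")            ; => "a"  (truthy)
(target-ids "z")            ; => nil  (falsey)
(contains? target-ids "a")  ; => true

;; (comp target-ids :id) first extracts :id, then looks the result up in the set.
((comp target-ids :id) {:id "a" :value "example"})   ; => "a"
((comp target-ids :id) {:id "e" :value "example-2"}) ; => nil
```

Because filter only cares whether the result is truthy, the returned element (or nil) works directly as a predicate result.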

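Your first solution uses some to loop over the whole vector for every map being filtered, so it behaves like a pair of nested for loops: in the worst case it performs (number of maps) x (number of target ids) comparisons, which is obviously slower.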
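
If you want to see the difference rather than take it on faith, the following rough sketch counts the equality checks made by the vector version. The names ids, id-set, records, comparisons and in-vector? are made up for this illustration, and the sizes are arbitrary.

```clojure
;; Hypothetical data, chosen only for illustration.
(def ids (mapv str (range 1000)))                              ; 1,000 target ids in a vector
(def id-set (set ids))                                         ; the same ids in a hash set
(def records (mapv (fn [n] {:id (str n)}) (range 0 10000 2)))  ; 5,000 maps

;; Counter for the equality checks performed by the vector-based lookup.
(def comparisons (atom 0))

(defn in-vector? [id]
  ;; Linear scan: stops at the first match, otherwise visits every element.
  (some #(do (swap! comparisons inc) (= id %)) ids))

(def via-vector (doall (filter #(in-vector? (:id %)) records)))
(def via-set    (doall (filter (comp id-set :id) records)))

(= via-vector via-set)  ; => true, both approaches keep the same maps
@comparisons            ; => 4750000 equality checks for the vector version
```

The set version does one hash lookup per map, so 5,000 lookups here, while the vector version needs millions of comparisons for the same result. Wrapping each filter in time should show the same gap in wall-clock terms, though exact timings will vary by machine.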

Archived:	2 years, 11 months ago
Views:	84
Last recorded:	2 years, 11 months ago