Clojure集与地图查找性能差异

Question

Clojure集与地图查找性能差异

我有一个uid列表，想检查一个uid是否是该列表的成员

实现它的自然方法是创建clojure.setuid 的集合（）并在该列表中搜索该成员

我发现映射键查找要快得多-我使用以下代码段对这两种方法进行基准测试：

(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (reduce (fn [acc v] (assoc acc v true)) {} uids))
(time (dotimes [i 1000000] (:o1 uids)))
;user=> "Elapsed time: 191.076266 msecs"
(time (dotimes [i 1000000] (:o1 uids-map)))
;user=> "Elapsed time: 38.159388 msecs"

Run Code Online (Sandbox Code Playgroud)

结果在调用之间非常一致-映射查找约占设置查找的1/5

因此，对于键查找而言，设置不是最优的还是我使用的方式错误？

另外，这些基准差异的原因是什么？

我给人的印象是，在clojure中将集合实现为类似于向量的关联数据结构-那么为什么键查找比简单映射慢得多？

Answer 1

Reu*_*ani 7

我从未接触过clojure的源代码，但是从我看到的结果来看，set实现实际上在内部使用了map：

protected APersistentSet(IPersistentMap impl){
    this.impl = impl;
}

Run Code Online (Sandbox Code Playgroud)

它还将invoke调用委派给内部地图。

在APersistentSet中：

public Object invoke(Object arg1) {
    return get(arg1);
}

// ....

public Object get(Object key){
    return impl.valAt(key);
}

Run Code Online (Sandbox Code Playgroud)

在APersistentMap中：

public Object invoke(Object arg1) {
    return valAt(arg1);
}

public Object invoke(Object arg1, Object notFound) {
    return valAt(arg1, notFound);
}

Run Code Online (Sandbox Code Playgroud)

因此，这无法解释差异。

如@cgrand 的评论中所述，当我们更快地反转参数时（并且差不多，因为我们invoke立即调用set ）。因此，我查找了关键字，invoke它可能用于(:k obj)：

final public Object invoke(Object obj, Object notFound) {
    if(obj instanceof ILookup)
        return ((ILookup)obj).valAt(this,notFound);
    return RT.get(obj, this, notFound);
}

Run Code Online (Sandbox Code Playgroud)

需要注意的重要一点是，该ILookup实现是APersistentMap通过（通过Associative）实现的，而不是在中实现的APersistentSet。您还可以在clojure中进行验证：

(instance? clojure.lang.ILookup #{}) ;; false
(instance? clojure.lang.ILookup {})  ;; true

Run Code Online (Sandbox Code Playgroud)

因此，映射经历了“快乐的道路”并最终建立了RT.get我认为是运行时的环境。

让我们看一下运行时。

最初，它尝试执行与关键字几乎相同的操作：

static public Object get(Object coll, Object key){
    if(coll instanceof ILookup)
        return ((ILookup) coll).valAt(key);
    return getFrom(coll, key);
}

Run Code Online (Sandbox Code Playgroud)

但是因为我们知道集合没有实现，所以ILookup我们知道它们去了RT.getFrom：

static Object getFrom(Object coll, Object key){
    if(coll == null)
        return null;
    else if(coll instanceof Map) {
        Map m = (Map) coll;
        return m.get(key);
    }
    else if(coll instanceof IPersistentSet) {
        IPersistentSet set = (IPersistentSet) coll;
        return set.get(key);
    }
    else if(key instanceof Number && (coll instanceof String || coll.getClass().isArray())) {
        int n = ((Number) key).intValue();
        if(n >= 0 && n < count(coll))
            return nth(coll, n);
        return null;
    }
    else if(coll instanceof ITransientSet) {
        ITransientSet set = (ITransientSet) coll;
        return set.get(key);
    }

    return null;
}

Run Code Online (Sandbox Code Playgroud)

这使我相信主要的区别是instanceof由于集未实现而导致的额外委派和调用ILookup。

作为奖励，我在集合上添加了一个测试，该测试立即实现ILookup并委托valAt给内部映射（使用proxy），从而稍微缩小了差距：

(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (into {} (for [k uids] [k k])))
(def lookupable-set (proxy [clojure.lang.APersistentSet clojure.lang.ILookup] [uids-map]
                      (valAt [k] (get uids-map k))))

;; verify
(instance? clojure.lang.APersistentSet lookupable-set) ;; true
(instance? clojure.lang.ILookup lookupable-set) ;; true

(time (dotimes [i 1000000] (:o1 uids))) ;; 134.703101 msecs
(time (dotimes [i 1000000] (:o1 lookupable-set))) ;; 63.187353 msecs  <-- faster
(time (dotimes [i 1000000] (:o1 uids-map))) ;; 35.802762 msecs <-- still fastest

Run Code Online (Sandbox Code Playgroud)

结论：性能很重要-在(#{...} k)不使用关键字的情况下调用集合的(k #{...})速度与map一样快。

但是我可能是错的:)

归档时间：	6 年，7 月前
查看次数：	151 次
最近记录：	6 年，7 月前