Tags: clojure, lazy-sequences (posted by jgr*_*gre)
I implemented a function that returns the n-grams of a given input collection as a lazy seq:
(defn gen-ngrams
  [n coll]
  (if (>= (count coll) n)
    (lazy-seq (cons (take n coll) (gen-ngrams n (rest coll))))))
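When I call this function on larger input collections, I expect execution time to grow linearly. However, the timings I observe are much worse than that: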
user> (time (count (gen-ngrams 3 (take 1000 corpus))))
"Elapsed time: 59.426 msecs"
998
user> (time (count (gen-ngrams 3 (take 10000 corpus))))
"Elapsed time: 5863.971 msecs"
9998
user> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 23584.226 msecs"
19998
user> (time (count (gen-ngrams 3 (take 30000 corpus))))
"Elapsed time: 54905.999 msecs"
29998
user> (time (count (gen-ngrams 3 (take 40000 corpus))))
"Elapsed time: 100978.962 msecs"
39998
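Dividing each timing above by the previous one makes "worse than linear" concrete: the ratios track the square of the input-size ratio, i.e. the growth looks roughly quadratic, T(N) ∝ N² (this is arithmetic on the numbers shown, not an additional measurement):

```latex
\begin{aligned}
\frac{T(10\,000)}{T(1\,000)}  &= \frac{5863.971}{59.426}     \approx 98.7, & \left(\tfrac{10\,000}{1\,000}\right)^2 &= 100 \\
\frac{T(20\,000)}{T(10\,000)} &= \frac{23584.226}{5863.971}  \approx 4.02, & 2^2 &= 4 \\
\frac{T(30\,000)}{T(20\,000)} &= \frac{54905.999}{23584.226} \approx 2.33, & 1.5^2 &= 2.25 \\
\frac{T(40\,000)}{T(30\,000)} &= \frac{100978.962}{54905.999} \approx 1.84, & \left(\tfrac{4}{3}\right)^2 &\approx 1.78
\end{aligned}
```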
Here, corpus is a Cons of string tokens.
What causes this behavior, and how can I improve performance?
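I think your problem is (count coll). coll is a seq rather than a counted collection, so count has to walk all of it on every call to gen-ngrams. For an input of N tokens that adds up to roughly N + (N-1) + ... + 1 steps, i.e. quadratic time, which matches your timings (10x more input takes about 100x longer).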
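A quick REPL check of why count is linear here: clojure.core/counted? reports whether a collection implements clojure.lang.Counted (constant-time count). The name tokens below is a hypothetical stand-in for corpus, built as a Cons in front of a lazy seq of strings, since the real corpus isn't shown:

```clojure
;; Hypothetical stand-in for `corpus`: a Cons cell in front of a lazy seq of string tokens.
(def tokens (cons "the" (map str (range 10))))

;; Neither a Cons nor the LazySeqs produced by `take` and `rest` implement
;; clojure.lang.Counted, so `count` on them walks the seq element by element.
(counted? tokens)           ; => false
(counted? (take 5 tokens))  ; => false
(counted? (rest tokens))    ; => false

;; A vector does implement Counted, so `count` on it is constant time.
(counted? (vec tokens))     ; => true
```

Because gen-ngrams calls count on (take N corpus) and then on each successive rest, every call pays for a full walk of what remains.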
The solution is to use the built-in partition function:
user=> (time (count (gen-ngrams 3 (take 20000 corpus))))
"Elapsed time: 6212.894932 msecs"
19998
user=> (time (count (partition 3 1 (take 20000 corpus))))
"Elapsed time: 12.57996 msecs"
19998
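For reference, this is how partition produces n-grams on a small input: with a step of 1, each window overlaps the previous one by n-1 items. The vector of letters is just illustrative data:

```clojure
;; (partition n step coll): windows of n items, advancing `step` items each time.
(def trigrams (partition 3 1 ["a" "b" "c" "d" "e"]))

trigrams          ; => (("a" "b" "c") ("b" "c" "d") ("c" "d" "e"))

;; A trailing window shorter than n is dropped, so N tokens yield N - n + 1 windows,
;; the same count gen-ngrams returns (e.g. 19998 for 20000 tokens above).
(count trigrams)  ; => 3
```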
If you're curious about the implementation, take a look at the source of partition: http://clojuredocs.org/clojure_core/clojure.core/partition
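If you'd rather keep your own gen-ngrams, one way to avoid the repeated full count is to count only the next window, which is the approach partition takes. The sketch below is a simplified rewrite in that spirit (step fixed at 1, n assumed to be a positive integer); it is not partition's exact source, so check the link above for the real code:

```clojure
(defn gen-ngrams
  "Lazily returns the n-grams of coll. Simplified sketch modeled on
  clojure.core/partition with step 1; assumes n is a positive integer."
  [n coll]
  (lazy-seq
    (when-let [s (seq coll)]
      ;; Realize only the next n items, so this count costs O(n),
      ;; not O(length of the remaining input).
      (let [gram (doall (take n s))]
        ;; Stop once fewer than n items remain, as partition does.
        (when (= n (count gram))
          (cons gram (gen-ngrams n (rest s))))))))

;; Usage:
;; (gen-ngrams 3 '("a" "b" "c" "d"))  ; => (("a" "b" "c") ("b" "c" "d"))
```

Wrapping the whole body in lazy-seq also makes the first call lazy, whereas the original evaluates (count coll) as soon as gen-ngrams is called.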