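I have a set of prefix/value pairs, and I want to find, in this collection, any value associated with a prefix that the current target string starts with. (It doesn't matter how the case of several matching prefixes is handled, because given the nature of my use case it should never happen.)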
A naive (working) implementation looks like this:
```clojure
(defn prefix-match [target-str pairs]
  ;; Linear scan: return the value of the first key that prefixes target-str.
  (some
   (fn [[k v]]
     (when (.startsWith ^String target-str k)
       v))
   pairs))
```
For example:

```clojure
user=> (prefix-match "foobar" {"meh" :qux, "foo" :baz})
:baz
```
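This works as expected, but it is O(n) in the length of the `pairs` sequence. (Fast insertion into `pairs` would also be desirable, but it matters less than fast lookup.)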
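My first thought was to bisect a sorted collection that supports efficient random access, but I'm not sure which Clojure data structures are best suited to the task. Any suggestions?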
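The bisection idea can be sketched with two parallel sorted vectors and `java.util.Collections/binarySearch` (Clojure vectors implement `java.util.List`). This is a minimal sketch under stated assumptions, not a recommendation: `build-index` and `bisect-prefix-match` are hypothetical names, and the pairs are assumed to be given as a map or as a sequence of two-element vectors. Lookup is O(log n), but inserting a pair costs O(n) because the vectors must be rebuilt. Because it only checks the greatest key that is `<=` the target, it can also miss a match when another stored key extends the matching prefix (for example `"foo"` and `"fooa"` with target `"foob"`).

```clojure
;; Hypothetical helpers for illustration; not from the question or any library.
(defn build-index
  "Sorts the prefix/value pairs by prefix into two parallel vectors."
  [pairs]
  (let [sorted (sort-by first pairs)]
    {:ks (mapv first sorted)
     :vs (mapv second sorted)}))

(defn bisect-prefix-match
  "Binary-searches for the greatest key <= target and returns its value
  if that key is a prefix of target, else nil."
  [{:keys [ks vs]} ^String target]
  (let [i   (java.util.Collections/binarySearch ks target)
        ;; A negative result encodes -(insertion point) - 1, so the greatest
        ;; key strictly below target sits at index (insertion point - 1).
        idx (if (neg? i) (- (- i) 2) i)]
    (when (>= idx 0)
      (let [^String k (nth ks idx)]
        (when (.startsWith target k)
          (nth vs idx))))))
```

With `(build-index {"meh" :qux, "foo" :baz})`, looking up `"foobar"` returns `:baz`, while `"abc"` and `"zzz"` both return `nil`.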
Answer by Jus*_*mer (score 19):
How about a trie?
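```clojure
;; Each key becomes a path of characters in nested maps;
;; its value is stored under :val at the end of that path.
(defn build-trie [seed & kvs]
  (reduce
   (fn [trie [k v]]
     (assoc-in trie (concat k [:val]) v))
   seed
   (partition 2 kvs)))

;; Follows target one character at a time and returns the first :val found,
;; i.e. the value of the shortest stored key that is a prefix of target.
(defn prefix-match [target trie]
  (when (seq target)
    (when-let [node (trie (first target))]
      (or (:val node)
          (recur (rest target) node)))))
```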
Usage:

```clojure
user> (def trie (build-trie {} "foo" :baz "meh" :qux))
#'user/trie
user> trie
{\m {\e {\h {:val :qux}}}, \f {\o {\o {:val :baz}}}}
user> (prefix-match "foobar" trie)
:baz
user> (prefix-match "foo" trie)
:baz
user> (prefix-match "f" trie)
nil
user> (prefix-match "abcd" trie)
nil
```
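The question also mentions insertion speed. With this nested-map trie, adding one pair is a single `assoc-in` along the key's characters, so its cost grows roughly with the key's length rather than with the number of stored pairs, and the old trie value is left unchanged because Clojure maps are persistent. `trie-insert` below is a hypothetical helper written in the same style as `build-trie`; it is not part of the original answer.

```clojure
;; Hypothetical helper, not from the original answer.
(defn trie-insert
  "Adds one prefix/value pair to a trie of nested character maps,
  storing the value under :val at the node for the key's last character."
  [trie k v]
  (assoc-in trie (concat k [:val]) v))

;; Inserting pairs one at a time builds the same structure as build-trie.
(def trie (-> {}
              (trie-insert "foo" :baz)
              (trie-insert "meh" :qux)))

;; A later insertion only adds the path for "bar"; `trie` itself is unchanged.
(def trie2 (trie-insert trie "bar" :quux))
```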
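An efficient and concise approach is to use `rsubseq`, which works on anything implementing `clojure.lang.Sorted`, including `sorted-map`. Since a sorted map is a balanced tree, both lookup and insertion are O(log n) in the number of pairs: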
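```clojure
(defn prefix-match [smap ^String target]
  ;; rsubseq yields entries with keys <= target in descending order,
  ;; so the first entry holds the greatest key not sorting after target.
  (let [[closest-match value] (first (rsubseq smap <= target))]
    (when (and closest-match (.startsWith target ^String closest-match))
      value)))
```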
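To see why this works, it helps to evaluate `rsubseq` directly: it returns the entries whose keys are `<=` the target, largest first, so only the closest candidate is ever checked. The map contents below are made up for illustration. The second call shows a limitation of checking just that one candidate: if another stored key extends the matching prefix and sorts between it and the target, it shadows the real match. The question's guarantee that at most one prefix matches does not rule this out, since here only `"foo"` is a prefix of `"foob"`.

```clojure
(def m (sorted-map "bar" :two "foo" :one))

;; Keys <= "foobar", in descending order; the first entry is the candidate.
(rsubseq m <= "foobar")
;; => (["foo" :one] ["bar" :two])

;; "fooa" sorts between "foo" and "foob", so it becomes the candidate.
;; It is not a prefix of "foob", so prefix-match would return nil here
;; even though "foo" is.
(first (rsubseq (sorted-map "foo" :one "fooa" :two) <= "foob"))
;; => ["fooa" :two]
```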
This passes the relevant tests in my suite:
```clojure
(require '[clojure.test :refer [deftest testing is]])

;; Compare with = explicitly: a second argument to `is` is only a failure
;; message, so (is (prefix-match ...) :one) would merely check truthiness.
(deftest prefix-match-success
  (testing "prefix-match returns a successful match"
    (is (= :one (prefix-match (sorted-map "foo" :one "bar" :two) "foobar")))
    (is (= :one (prefix-match (sorted-map "foo" :one "bar" :two) "foo")))))

(deftest prefix-match-fail
  (testing "prefix-match returns nil on no match"
    (is (nil? (prefix-match (sorted-map "foo" :one, "bar" :two) "bazqux")))
    (is (nil? (prefix-match (sorted-map "foo" :one, "bar" :two) "zzz")))
    (is (nil? (prefix-match (sorted-map "foo" :one, "bar" :two) "aaa")))))
```