Finding regex-like sequences of values in a Clojure vector

fra*_*ank 3 clojure

I am using the libpostal library to find full addresses (street, city, state, and postal code) in news articles. Given the input text:

There was an accident at 5 Main Street Boulder, CO 10566 - which is at the corner of Wilson.

libpostal returns a vector:

[{:label "house", :value "there was an accident at 5"}
 {:label "road", :value "main street"}
 {:label "city", :value "boulder"}
 {:label "state", :value "co"}
 {:label "postcode", :value "10566"}
 {:label "road", :value "which is at the corner of wilson."}]

I am wondering whether there is a clever way in Clojure to extract the subsequences in which the :label values occur in this order:

[road unit? level? po_box? city state postcode? country?]

where ? marks an optional value in the match.

Tay*_*ood 6

You can do this with clojure.spec. First define some specs that match the maps' :label values:

(require '[clojure.spec.alpha :as s])

;; true when map m carries the given label
(defn has-label? [m label] (= label (:label m)))
(s/def ::city #(has-label? % "city"))
(s/def ::postcode #(has-label? % "postcode"))
(s/def ::state #(has-label? % "state"))
(s/def ::house #(has-label? % "house"))
(s/def ::road #(has-label? % "road"))
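As a quick sanity check, each of these specs can be tried against a single map before composing them. This is only a minimal sketch that repeats one of the definitions above so it runs on its own:

```clojure
(require '[clojure.spec.alpha :as s])

(defn has-label? [m label] (= label (:label m)))
(s/def ::city #(has-label? % "city"))

;; A map whose :label is "city" satisfies ::city
(s/valid? ::city {:label "city" :value "boulder"})
;; => true

;; Any other label does not
(s/valid? ::city {:label "state" :value "co"})
;; => false
```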

Then define a regex spec, e.g. with s/cat + s/?:

(s/def ::valid-seq
  (s/cat :road ::road
         :city (s/? ::city) ;; ? = zero or once
         :state ::state
         :zip (s/? ::postcode)))
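The same approach should extend to the full pattern from the question. The sketch below is an assumption-laden illustration: the spec names (::unit, ::level, ::po-box, ::country, ::full-address) are made up here, and the label strings "unit", "level", "po_box" and "country" are taken from the question's pattern rather than verified against libpostal's output.

```clojure
(require '[clojure.spec.alpha :as s])

(defn has-label? [m label] (= label (:label m)))

;; One predicate spec per label in the question's pattern
;; (label strings are assumed to match what libpostal emits)
(s/def ::road     #(has-label? % "road"))
(s/def ::unit     #(has-label? % "unit"))
(s/def ::level    #(has-label? % "level"))
(s/def ::po-box   #(has-label? % "po_box"))
(s/def ::city     #(has-label? % "city"))
(s/def ::state    #(has-label? % "state"))
(s/def ::postcode #(has-label? % "postcode"))
(s/def ::country  #(has-label? % "country"))

;; [road unit? level? po_box? city state postcode? country?]
(s/def ::full-address
  (s/cat :road     ::road
         :unit     (s/? ::unit)
         :level    (s/? ::level)
         :po-box   (s/? ::po-box)
         :city     ::city
         :state    ::state
         :postcode (s/? ::postcode)
         :country  (s/? ::country)))

;; Optional elements may be absent; required ones may not
(s/valid? ::full-address [{:label "road"  :value "main street"}
                          {:label "unit"  :value "apt 2"}
                          {:label "city"  :value "boulder"}
                          {:label "state" :value "co"}])
;; => true

(s/valid? ::full-address [{:label "road"  :value "main street"}
                          {:label "state" :value "co"}])
;; => false (city is required in this pattern)
```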

Now you can conform or valid? your sequences:

(s/conform ::valid-seq [{:label "road" :value "Damen"}
                        {:label "city" :value "Chicago"}
                        {:label "state" :value "IL"}])
=>
{:road {:label "road", :value "Damen"},
 :city {:label "city", :value "Chicago"},
 :state {:label "state", :value "IL"}}
;; this is also valid, missing an optional value in the middle
(s/conform ::valid-seq [{:label "road" :value "Damen"}
                        {:label "state" :value "IL"}
                        {:label "postcode" :value "60622"}])
=>
{:road {:label "road", :value "Damen"},
 :state {:label "state", :value "IL"},
 :zip {:label "postcode", :value "60622"}}
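Note that s/conform matches the whole sequence, while the libpostal output in the question has unrelated elements before and after the address. One possible way to handle that is to slide over the vector and keep the longest run that conforms at each start position. The following is a rough sketch under that assumption; find-addresses is a hypothetical helper written for this example, not part of clojure.spec, and it is quadratic in the length of the input.

```clojure
(require '[clojure.spec.alpha :as s])

(defn has-label? [m label] (= label (:label m)))
(s/def ::road     #(has-label? % "road"))
(s/def ::city     #(has-label? % "city"))
(s/def ::state    #(has-label? % "state"))
(s/def ::postcode #(has-label? % "postcode"))

(s/def ::valid-seq
  (s/cat :road  ::road
         :city  (s/? ::city)
         :state ::state
         :zip   (s/? ::postcode)))

(defn find-addresses
  "Hypothetical helper: scans items left to right and, at each start
  index, takes the longest contiguous run that conforms to spec.
  Returns a vector of {:start :end :match} maps (end is exclusive)."
  [spec items]
  (let [v (vec items)
        n (count v)]
    (loop [start 0, found []]
      (if (>= start n)
        found
        (let [match (first
                     (for [end (range n start -1) ; longest candidate first
                           :let [c (s/conform spec (subvec v start end))]
                           :when (not (s/invalid? c))]
                       {:start start :end end :match c}))]
          (if match
            (recur (long (:end match)) (conj found match)) ; resume after the match
            (recur (inc start) found)))))))

(def libpostal-output
  [{:label "house", :value "there was an accident at 5"}
   {:label "road", :value "main street"}
   {:label "city", :value "boulder"}
   {:label "state", :value "co"}
   {:label "postcode", :value "10566"}
   {:label "road", :value "which is at the corner of wilson."}])

(map (juxt :start :end) (find-addresses ::valid-seq libpostal-output))
;; => ([1 5])
```

The leading "house" element and the trailing "road" element are skipped because no run starting at either of them conforms to ::valid-seq.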