How do I parse and compare two files?

Asked by Dan*_*Dan · score 5 · tags: clojure

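I'd appreciate suggestions or insights on how to parse and compare two files efficiently in Clojure. I have two (log) files containing employee attendance records. From them I need to find the days on which the two employees worked the same hours in the same department. Below are samples of the log files.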

Note: the two files contain different numbers of entries.

File 1:

Employee Id     Name         Time In          Time Out          Dept.
mce0518         Jon     2011-01-01 06:00  2011-01-01 14:00       ER
mce0518         Jon     2011-01-02 06:00  2011-01-01 14:00       ER
mce0518         Jon     2011-01-04 06:00  2011-01-01 13:00       ICU
mce0518         Jon     2011-01-05 06:00  2011-01-01 13:00       ICU
mce0518         Jon     2011-01-05 17:00  2011-01-01 23:00       ER

File 2:

Employee Id     Name            Time In           Time Out          Dept.
pdm1705         Jane        2011-01-01 06:00  2011-01-01 14:00       ER
pdm1705         Jane        2011-01-02 06:00  2011-01-01 14:00       ER
pdm1705         Jane        2011-01-05 06:00  2011-01-01 13:00       ER
pdm1705         Jane        2011-01-05 17:00  2011-01-01 23:00       ER
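Because the dates and times themselves contain spaces, splitting a data line on whitespace yields seven fields. A minimal sketch, assuming every data line has exactly the layout shown in the samples (no spaces inside names), that turns one line into a map; the name `parse-row` and the keyword names are hypothetical:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: parse one data line of the attendance log into a map.
;; Assumes the seven whitespace-separated fields shown in the samples:
;; id, name, date in, time in, date out, time out, department.
(defn parse-row [line]
  (let [[id emp-name date-in time-in date-out time-out dept]
        (str/split (str/trim line) #"\s+")]
    {:id       id
     :name     emp-name
     :time-in  (str date-in " " time-in)
     :time-out (str date-out " " time-out)
     :dept     dept}))

(parse-row "mce0518         Jon     2011-01-01 06:00  2011-01-01 14:00       ER")
;; => {:id "mce0518", :name "Jon", :time-in "2011-01-01 06:00",
;;     :time-out "2011-01-01 14:00", :dept "ER"}
```

With rows as maps, two rows describe the same shift when `(select-keys row [:time-in :time-out :dept])` is equal for both.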

Answer by Ham*_*aya · score 3

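If this is a one-off task, a simple approach is to split each line on whitespace and compare everything after the employee id and name (time in, time out and department):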


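(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Read a log file into a seq of rows; each row is a vector of the
;; whitespace-separated fields:
;; [id name date-in time-in date-out time-out dept]
(defn data-seq [f]
  (with-open [rdr (io/reader f)]
    (->> (line-seq rdr)
         rest                                  ; skip the header line
         (remove str/blank?)                   ; ignore empty lines
         (map #(str/split (str/trim %) #"\s+"))
         doall)))                              ; realize before rdr closes

;; Two rows describe the same shift when everything after the id and
;; name (date/time in, date/time out, department) is equal.
(defn same-time? [a b]
  (= (drop 2 a) (drop 2 b)))

;; For every row of f1, collect the ids of the f2 rows with the same shift.
(let [f1 (data-seq "f1.txt")
      f2 (data-seq "f2.txt")]
  (reduce (fn [acc row]
            (let [matches (filter #(same-time? row %) f2)]
              (if (empty? matches)
                acc
                (conj acc [(first row) (map first matches)]))))
          []
          f1))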
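The `reduce` above rescans every row of the second file for each row of the first, which is fine for small logs. For larger files, here is a sketch (not part of the original answer) that builds a hash index of the second file once, keyed on the same fields `same-time?` compares. `matching-ids` is a hypothetical name, and the rows are assumed to be the split rows that `data-seq` produces:

```clojure
;; Hypothetical variant: index file 2 by shift (every field after id and
;; name), then look up each file-1 row in the map instead of scanning file 2.
(defn matching-ids [rows1 rows2]
  (let [index (group-by #(vec (drop 2 %)) rows2)]  ; shift -> rows with it
    (reduce (fn [acc row]
              (if-let [matches (get index (vec (drop 2 row)))]
                (conj acc [(first row) (map first matches)])
                acc))
            []
            rows1)))
```

Called as `(matching-ids (data-seq "f1.txt") (data-seq "f2.txt"))`, it uses the same notion of a match as `same-time?`, so on the sample files it should produce the same vector as the `reduce` version.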

With the two sample files saved as f1.txt and f2.txt, this returns:

 [["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")]]
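That result lists the employee ids but not which days overlapped. If what you need is the shared shifts themselves, here is a sketch using `clojure.set/intersection` over the set of shifts in each file. `common-shifts` is a hypothetical name, and it takes lines of text with the header already removed, so it can be tried without files:

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

;; Hypothetical helper: the shift key of one log line is every field
;; after the employee id and name: date/time in, date/time out, dept.
(defn shift-key [line]
  (vec (drop 2 (str/split (str/trim line) #"\s+"))))

;; Shifts (including department) present in both logs, sorted by time in.
(defn common-shifts [lines1 lines2]
  (sort (set/intersection (set (map shift-key lines1))
                          (set (map shift-key lines2)))))
```

For the sample data this yields the three ER shifts starting 2011-01-01 06:00, 2011-01-02 06:00 and 2011-01-05 17:00, the same three matches reported above. Because it works on sets, duplicate shifts within one file collapse into one entry.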