What is the correct way to get side effects when using java.io/reader in Clojure?

eat*_*rus 2 clojure

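I am reading lines from a very large text file. The file contains a set of data from which I want to pick out specific line numbers. What I want to do is read a line from the file and, if it is one of the lines I want, conj it onto my result; if it is not, move on and check the next line. I don't want to keep every line I have seen in memory, so I would like to drop lines from the reader's line sequence as I read them.

I have a function like this: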

;; evaluates but doesn't modify the line sequence so continuously adds 
;; the same first line to the result. I would like this exact function 
;; but somehow have it drop the first line of lines at each iteration.
    (defn get-training-data [batch-size batch-num]
      (let [line-numbers (fn that returns vector of random numbers)]
        (with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
          (let [lines (line-seq rdr) res []]
            (for [i (range (apply max line-numbers))
                  :let [res (conj res (json/read-str (first lines)))]
                  :when (some #{i} line-numbers)]
              res)))))

I also have a function like this:

;;this works as I want it to, but only with a small file and produces a 
;;stack overflow with a large file
    (defn get-training-data1 [batch-size batch-num]
      (let [line-numbers (fn that returns a vector of random numbers)]
        (with-open [rdr (clojure.java.io/reader "resources/sample.txt")]
          (let [lines (line-seq rdr)]
            (loop [i 0 f (apply max line-numbers) res [] lines lines]
              (if (> i f)
                res
                (if (some #{i} line-numbers)
                  (recur
                   (inc i)
                   f
                   (conj res (json/read-str (first lines)))
                   (drop 1 lines))
                  (recur
                   (inc i)
                   f
                   res
                   (drop 1 lines)))))))))

While trying to test this, I came up with the following simpler case:

;;works
(let [res []]
  (for [i (range 10)
        :let [res (conj res i)]
        :when (odd? i)]
    res)) ;;([1] [3] [5] [7] [9])

;;now an attempt to get the same result but have a side effect each time, 
;;produces null pointer exception.
(let [res []]
  (for [i (range 10)
        :let [res (conj res i)]  
        :when (odd? i)]
    (doall 
     (println i)
     res)))
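
I believe that if I can figure out how to produce a side effect inside for, the first problem will be solved, because the side effect could drop the first line of the reader's line sequence.

Does anyone have any ideas?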

Art*_*ldt 5

map and filter do this job nicely and keep everything lazy, so you no longer need to hold all the lines in memory.

user> (->> (line-seq (clojure.java.io/reader "project.clj")) ;; lazy sequence of lines
           (map vector (range))                              ;; add an index
           (filter #(#{1 3 7 9} (first %)))                  ;; filter by index
           (map second))                                     ;; drop the index

("  :description \"API server for Yummly mobile app(s)\"" 
 "[com.project/example \"1.4.8-SNAPSHOT\"]" 
 "                 [org.clojure/tools.cli \"0.2.4\"]" 
 "                 [clojurewerkz/mailer \"1.0.0-alpha3\"]")
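The REPL snippet above never closes its reader. A minimal sketch of the same indexed-filter idea wrapped in with-open, as the question does, might look like the following. The name `lines-at` and its parameters are hypothetical, indexes are assumed to be zero-based, and the result is built with `into` so that every wanted line is read before with-open closes the reader.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper, not from the original answer.
;; Returns a vector of the lines of `path` whose zero-based index
;; is in `wanted`, in file order.
(defn lines-at [path wanted]
  (let [wanted (set wanted)]
    (if (empty? wanted)
      []
      (let [last-index (apply max wanted)]
        (with-open [rdr (io/reader path)]
          (->> (line-seq rdr)
               ;; stop reading once the last wanted line has been seen
               (take (inc last-index))
               ;; keep a line only when its index is wanted
               (keep-indexed (fn [i line] (when (wanted i) line)))
               ;; realize eagerly, while the reader is still open
               (into [])))))))

;; Usage with a small temporary file:
(let [f (java.io.File/createTempFile "lines" ".txt")]
  (spit f "a\nb\nc\nd\ne\n")
  (lines-at f [3 1]))
;; => ["b" "d"]
```

Only the selected lines end up in the vector; the rest can be garbage collected as the sequence is consumed, because nothing holds on to the head of `(line-seq rdr)`. In the question's setting each kept line could then be passed to json/read-str.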
Run Code Online (Sandbox Code Playgroud)
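As for the null pointer exception in the simplified test: `doall` with two arguments is read as `(doall n coll)`, so `(doall (println i) res)` passes the nil returned by println as `n`. A sketch of the same loop using `do` in the body, which evaluates its forms in order and returns the last one, is shown below. The var name `odd-vectors` is just for illustration.

```clojure
;; `for` is lazy, so the outer doall forces every element and
;; therefore every println.
(def odd-vectors
  (doall
   (for [i (range 10)
         :let [res (conj [] i)]   ; a fresh binding on each iteration
         :when (odd? i)]
     (do
       (println i)                ; side effect
       res))))                    ; value of the body

;; odd-vectors => ([1] [3] [5] [7] [9])
```

Note that `:let` creates a new `res` on every iteration rather than updating an outer one, which is why each element holds a single number. A side effect inside `for` therefore cannot advance an outer `lines` binding either.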