URL checker in Clojure?

2 clojure

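I have a URL checker that I use in Perl. I was wondering how something like this would be done in Clojure. I have a file with thousands of URLs, and I'd like the output file to contain each URL (minus the http:// or https://) followed by a simple flag: :1 for valid, :0 for invalid. Ideally I could check the sites concurrently, given that this is one of Clojure's strengths.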

Input

http://www.google.com
http://www.cnn.com
http://www.msnbc.com
http://www.abadurlisnotgood.com

Output

www.google.com:1
www.cnn.com:1
www.msnbc.com:1
www.abadurlisnotgood.com:0

Bri*_*per 6

I'm assuming "valid URL" means an HTTP 200 response. This might work. It requires clojure-contrib. Change map to pmap to try to make it parallel, as Arthur Ulfeldt said.

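;; Requires clojure-contrib (read-lines, as-file, re-sub).
(use '(clojure.contrib duck-streams
                       java-utils
                       str-utils))

(import '(java.net URL
                   URLConnection
                   HttpURLConnection
                   UnknownHostException))

;; Returns "host:1" if the URL answers with HTTP 200, "host:0" otherwise.
(defn check-url [url]
  ;; Strip a leading http:// or https:// (case-insensitive).
  (str (re-sub #"(?i)^https?:/+" "" url)
       ":"
       (try
        (let [c (cast HttpURLConnection
                      (.openConnection (URL. url)))]
          (if (= 200 (.getResponseCode c))
            1
            0))
        ;; A host that does not resolve counts as invalid.
        (catch UnknownHostException _
          0))))

;; Prints one result line per URL in the file.
;; Swap map for pmap to check the URLs in parallel.
(defn check-urls-from-file [filename]
  (doseq [line (map check-url
                    (read-lines (as-file filename)))]
    (println line)))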

With your example as input:

user> (check-urls-from-file "urls.txt")
www.google.com:1
www.cnn.com:1
www.msnbc.com:1
www.abadurlisnotgood.com:0
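For anyone on a Clojure version where the monolithic clojure-contrib is not available, here is a rough sketch of the same idea using only clojure.java.io and clojure.string, with pmap for the parallel part and the results written to an output file as the question asks. The function names (strip-scheme, url-ok?, result-line, check-urls, check-urls-to-file), the 5-second timeouts, and the file names are my own assumptions for illustration, not part of the code above. It also treats any exception, not just an unknown host, as "invalid", which may or may not be what you want.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import '(java.net URL HttpURLConnection))

(defn strip-scheme
  "Removes a leading http:// or https:// (case-insensitive)."
  [url]
  (str/replace-first url #"(?i)^https?:/+" ""))

(defn url-ok?
  "True when the URL answers with HTTP 200. Any exception (unknown host,
  timeout, malformed URL, ...) is treated as invalid."
  [url]
  (try
    (let [^HttpURLConnection c (.openConnection (URL. url))]
      ;; Timeout values are arbitrary assumptions; tune them for your list.
      (.setConnectTimeout c 5000)
      (.setReadTimeout c 5000)
      (= 200 (.getResponseCode c)))
    (catch Exception _ false)))

(defn result-line
  "Formats one output line, e.g. \"www.google.com:1\".
  ok? is the predicate that decides whether a URL is valid."
  [ok? url]
  (str (strip-scheme url) ":" (if (ok? url) 1 0)))

(defn check-urls
  "Checks the URLs in parallel with pmap and returns the result lines
  in input order. The two-argument arity takes a custom predicate."
  ([urls] (check-urls url-ok? urls))
  ([ok? urls] (pmap #(result-line ok? %) urls)))

(defn check-urls-to-file
  "Reads one URL per line from in-file and writes the results to out-file."
  [in-file out-file]
  (with-open [r (io/reader in-file)
              w (io/writer out-file)]
    ;; doseq forces the lazy pmap while both files are still open.
    (doseq [line (check-urls (remove str/blank? (line-seq r)))]
      (.write w (str line "\n")))))
```

Usage would be something like (check-urls-to-file "urls.txt" "results.txt"). Note that pmap only runs roughly as many checks at once as you have cores plus two, so for thousands of slow hosts a dedicated thread pool may do better. If you run this as a script rather than from the REPL, you will probably want to call (shutdown-agents) at the end so the JVM exits promptly.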