3 elisp
How do I write an Emacs Lisp function that finds every href in an HTML file and extracts all the links?
Input:
<html> <a href="http://www.stackoverflow.com" _target="_blank">StackOverFlow</a> <h1>Emacs Lisp</h1> <a href="http://news.ycombinator.com" _target="_blank">Hacker News</a> </html>
Output:
http://www.stackoverflow.com|StackOverFlow http://news.ycombinator.com|Hacker News
While searching I saw the re-search-forward function mentioned several times. From what I have read so far, I think it is what I need.
(defun extra-urls (file)
...
(setq buffer (...
(while
(re-search-forward "http://" nil t)
(when (match-string 0)
...
))
小智 5
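I took Heinzi's solution and built on it to get the final solution I needed. I can now take a list of files, extract all the URLs and titles, and put the results in a single output buffer.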
(defun extract-urls (fname)
  "Extract href URLs and link titles from the HTML file FNAME.
Append them to the buffer \"new-urls.csv\" as one url|title pair per line."
  (let ((urls '()))                       ; local, so no global variables leak
    (with-temp-buffer                     ; leaves no file buffer to clean up
      (insert-file-contents fname)
      (goto-char (point-min))
      ;; Group 1 is the URL, group 2 is the title.
      (while (re-search-forward
              "<a href=\"\\([^\"]+\\)\"[^>]*>\\([^<]+\\)</a>" nil t)
        (push (concat (match-string 1) "|" (match-string 2) "\n") urls)))
    (with-current-buffer (get-buffer-create "new-urls.csv") ; send results to new buffer
      (goto-char (point-max))
      (mapc #'insert (nreverse urls)))    ; nreverse restores document order
    (switch-to-buffer "new-urls.csv")))   ; finally, show the new buffer
;; Process a list of files; mapc is used because only the side effects matter
(mapc #'extract-urls '("/tmp/foo.html"
                       "/tmp/bar.html"))
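
To try the regular expression without touching any files, the same search loop can be run over a string. The following is only a minimal sketch: the name my-extract-links is hypothetical and not part of the answer above, and the pattern assumes the simple markup from the question (href as the first attribute, double-quoted, plain-text link content). It is not a general HTML parser.

```elisp
;; Hypothetical helper (not from the original answer): returns the
;; matches as a list instead of writing them to a buffer.
(defun my-extract-links (html)
  "Return a list of \"URL|TITLE\" strings for the <a href=...> links in HTML."
  (let ((links '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      ;; Group 1 is the href value, group 2 is the link text.
      (while (re-search-forward
              "<a href=\"\\([^\"]+\\)\"[^>]*>\\([^<]+\\)</a>" nil t)
        (push (concat (match-string 1) "|" (match-string 2)) links)))
    ;; `push' builds the list back to front, so restore document order.
    (nreverse links)))

;; The input from the question:
(my-extract-links
 "<html> <a href=\"http://www.stackoverflow.com\" _target=\"_blank\">StackOverFlow</a> <h1>Emacs Lisp</h1> <a href=\"http://news.ycombinator.com\" _target=\"_blank\">Hacker News</a> </html>")
;; => ("http://www.stackoverflow.com|StackOverFlow" "http://news.ycombinator.com|Hacker News")
```

Returning a list keeps the matching separate from the output step, so the result can be inserted into a buffer, written to a file, or checked in a test.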