How to recursively crawl all files on a file server

Question

use*_*188 1 linux web-crawler macos

The file server at http://xxxx.com holds thousands of files.

I tried crawling it with httrack.

It did not work. Is there an alternative tool that can recursively download all the files under a given URL?

Thanks

Answer 1

小智 5

Use wget:

wget --mirror -p --html-extension --convert-links www.example.com

Explanation of the options:

-p                  get all images, etc. needed to display HTML page.  
--mirror            turns on recursion and time-stamping, sets infinite 
                      recursion depth and keeps FTP directory listings
--html-extension    save HTML docs with .html extensions (named --adjust-extension since wget 1.12)
--convert-links     make links in downloaded HTML point to local files.
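
For a plain file server, as opposed to a website meant for offline browsing, a variant that stays below the starting directory may fit better. The following is a minimal sketch, not a tested recipe for the server in the question: the function name `mirror_dir`, the default destination `./mirror`, and the one-second wait are assumptions made for illustration. It also assumes the server exposes directory listing pages, since wget can only discover files by following links.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): download every file
# below a directory URL into a local folder.
mirror_dir() {
    url=${1:?usage: mirror_dir URL [DEST]}
    dest=${2:-./mirror}   # assumed default destination

    # --recursive --level=inf   follow links with no depth limit
    # --no-parent               never ascend above the starting directory
    # --timestamping            skip files that are not newer than local copies
    # --no-host-directories     do not create a top-level hostname folder
    # --directory-prefix        save everything under $dest
    # --wait=1                  pause one second between requests (assumed value)
    # --reject 'index.html*'    discard auto-generated listing pages after parsing
    wget --recursive --level=inf --no-parent --timestamping \
         --no-host-directories --directory-prefix="$dest" \
         --wait=1 --reject 'index.html*' \
         "$url"
}
```

Calling `mirror_dir http://xxxx.com/ ./files` would then place the downloaded tree under `./files`. The trailing slash on the URL matters for `--no-parent`, because wget treats the part before the last slash as the directory to stay inside.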

