wget does not convert links

acr*_*man 8 website wget mirroring

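I am trying to mirror a fairly large website (20,000+ pages) before an overhaul. Basically, I need a backup before we switch to the new site, in case we forget something we need (we will have about 1,000 pages at launch). The site runs on a CMS from which I can't easily extract usable data, so I am trying to make a copy with wget.

My problem is that wget does not seem to actually convert the links, despite --convert-links or -k being present in the command. I have tried several different combinations of flags, but I have not been able to get the output I need. The most recent failed attempt was: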

nohup wget --mirror -k -l10 -PafscSnapshot --html-extension -R *calendar* -o wget.log http://www.example.org &

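I have also tried including --backup-converted, and using --convert-links instead of -k (not that it should matter). I have run it with and without -P and -l, which likewise should not matter.

The resulting files still contain links such as: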

http://www.example.org/ht/d/sp/i/17770

小智 12

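This is an old post, but I am putting the answer here for future searchers.

The --convert-links step only happens after the site download has completed. My guess is that, with a site this large, you stopped the process after only a few pages had finished, so the conversion never started.

See also /sf/ask/444380261/

From the wget documentation: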
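One way to check this, sketched under some assumptions: wget normally prints a summary line beginning with "Converted" once the -k pass has run (the exact wording varies between versions), so looking for that line in the log tells you whether conversion ever happened. The helper name `conversion_done` is made up for illustration; `wget.log` is the log file from the `-o` option in the question.

```shell
# Hypothetical helper (not part of wget): succeeds if the given wget log
# shows that the link-conversion pass ran.
# Assumption: wget logs a summary line starting with "Converted" and
# mentioning "files in" after -k finishes; the wording differs by version.
conversion_done() {
    grep -q '^Converted .*files in' "$1" 2>/dev/null
}

if conversion_done wget.log; then
    echo "link conversion has run"
else
    echo "no conversion summary found; wget may still be running or was stopped early"
fi
```

If the summary line is missing, let the job run to completion (it was started with nohup and &, so it keeps going in the background) before inspecting the files.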

‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

    The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary combinations of directories.

    The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif. 

Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet address rather than presenting a broken link. The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by ‘-k’ will be performed at the end of all the downloads. 
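To see whether the conversion took effect after a completed run, one rough check is to count the files in the mirror that still contain absolute URLs for the original host. This is a minimal sketch, assuming a grep that supports -r (GNU and BSD grep do); the function name `count_absolute_links` is hypothetical, and `afscSnapshot` and `www.example.org` are taken from the command in the question.

```shell
# Hypothetical helper: print how many files under a directory still
# contain an absolute URL pointing at the given host.
# Assumption: grep supports -r (recursive); -l lists matching files only,
# -F treats the pattern as a fixed string rather than a regex.
count_absolute_links() {
    dir=$1
    host=$2
    grep -rlF -- "http://$host/" "$dir" 2>/dev/null | wc -l | tr -d ' '
}

# Prints 0 if the directory does not exist or no file matches.
count_absolute_links afscSnapshot www.example.org
```

As the documentation quoted above says, links to pages that were not downloaded (for example, those excluded by -R) stay absolute, so a non-zero count is not necessarily a failure. A count close to the total number of files, though, suggests the conversion pass never ran.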


mat*_*kie 1

Perhaps you have run into a case where wget -k converts files differently on Windows and Linux because of operating-system file name restrictions?