How do I refresh an online website mirror created with `wget --mirror`?

Question

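A month ago, I used `wget --mirror` to create a mirror of our public website, for temporary use during an upcoming scheduled maintenance window. Our main website runs HTML, PHP and MySQL, but the mirror only needs to be plain HTML: no dynamic content, PHP or database is needed.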

The following command creates a simple online mirror of our website:

wget --mirror http://www.example.org/

Note that the Wget manual says `--mirror` is "currently equivalent to `-r -N -l inf --no-remove-listing`" (the human-readable equivalent is `--recursive --timestamping --level=inf --no-remove-listing`).
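
Spelled out with the long option names quoted above, the mirror command can be written as the function below. This is a minimal sketch: the name `mirror_site` and the `WGET` override (set `WGET=echo` to print the command instead of running it) are hypothetical additions for illustration, not part of wget.

```shell
#!/usr/bin/env bash
# Long-form equivalent of `wget --mirror URL`, per the manual text quoted above.
# mirror_site is a hypothetical helper name.
# WGET defaults to wget; set WGET=echo for a dry run that only prints the arguments.
mirror_site() {
  "${WGET:-wget}" --recursive --timestamping --level=inf --no-remove-listing "$1"
}
```

Usage: `mirror_site http://www.example.org/` behaves like the original `wget --mirror http://www.example.org/`, assuming the manual's equivalence holds for your wget version.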

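Now a month has passed, and much of the website's content has changed. I would like wget to check all pages and download any that have changed. However, this doesn't work.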

My question:

Short of deleting the directory and re-running the mirror, what do I need to do to refresh the mirror of the website?

The top-level file at http://www.example.org/index.html has not changed, but many other files have.

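I thought all I needed to do was re-run `wget --mirror`, because `--mirror` implies the flags `--recursive` ("specify recursive download") and `--timestamping` ("don't re-retrieve files unless newer than local"). I expected this to check all of the pages and retrieve only the files that are newer than my local copies. Am I wrong?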
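
The per-file decision that `--timestamping` makes can be modelled roughly as below. This is a simplified sketch, not wget's actual code: it follows the Wget manual's time-stamping rules (download if there is no local copy, if the sizes differ, or if the server copy is newer). The helper name `should_retrieve` is hypothetical, and GNU coreutils (`date -d`, `stat -c`) are assumed.

```shell
#!/usr/bin/env bash
# Simplified model of wget's -N (--timestamping) check; illustration only.
# Assumes GNU coreutils: `date -d` parses the HTTP date, `stat -c` reads size/mtime.
# Usage: should_retrieve LOCAL_FILE "REMOTE_LAST_MODIFIED" [REMOTE_SIZE]
# Prints "yes" if the file would be downloaded, "no" otherwise.
should_retrieve() {
  local local_file=$1 remote_lm=$2 remote_size=${3:-}
  # No local copy: always download.
  if [ ! -e "$local_file" ]; then
    echo yes; return
  fi
  # Remote size known and different from the local size: download regardless of dates.
  if [ -n "$remote_size" ] && [ "$(stat -c %s "$local_file")" != "$remote_size" ]; then
    echo yes; return
  fi
  local remote_epoch local_epoch
  remote_epoch=$(date -d "$remote_lm" +%s)
  local_epoch=$(stat -c %Y "$local_file")
  # Only a strictly newer server copy is fetched ("Server file no newer ... not retrieving").
  if [ "$remote_epoch" -gt "$local_epoch" ]; then
    echo yes
  else
    echo no
  fi
}
```

This models only the per-file check. It says nothing about whether wget then follows the links inside a page it decided not to download, which is the behaviour the question is about.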

However, wget does not recurse through the site on the second attempt. `wget --mirror` checks http://www.example.org/index.html, notices that this page hasn't changed, and then stops.

--2010-06-29 10:14:07--  http://www.example.org/
Resolving www.example.org (www.example.org)... 10.10.6.100
Connecting to www.example.org (www.example.org)|10.10.6.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Server file no newer than local file "www.example.org/index.html" -- not retrieving.

Loading robots.txt; please ignore errors.
--2010-06-29 10:14:08--  http://www.example.org/robots.txt
Connecting to www.example.org (www.example.org)|10.10.6.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 136 [text/plain]
Saving to: “www.example.org/robots.txt”

     0K                                                       100% 6.48M=0s
2010-06-29 10:14:08 (6.48 MB/s) - "www.example.org/robots.txt" saved [136/136]

--2010-06-29 10:14:08--  http://www.example.org/news/gallery/image-01.gif
Reusing existing connection to www.example.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 40741 (40K) [image/gif]
Server file no newer than local file "www.example.org/news/gallery/image-01.gif" -- not retrieving.

FINISHED --2010-06-29 10:14:08--
Downloaded: 1 files, 136 in 0s (6.48 MB/s)

Answer 1

Ste*_*ski 5

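The following workaround seems to work for now. It forcibly deletes /index.html, which forces wget to check all of the child links again. But shouldn't wget check all child links automatically?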

rm www.example.org/index.html && wget --mirror http://www.example.org/
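
The workaround can be wrapped in a small reusable function. This is a sketch under stated assumptions: `refresh_mirror` is a hypothetical name; it assumes wget's default layout of saving into a directory named after the host (no `-nH` or `-P`) and a URL on the default port; and `WGET=echo` can be set for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the workaround: delete the local top-level page,
# then re-run the mirror so wget fetches it again and follows its links.
# Assumes the mirror lives in ./<host>/ (wget's default layout, default port).
# WGET defaults to wget; set WGET=echo for a dry run.
refresh_mirror() {
  local url=$1
  local host=${url#*://}   # strip the scheme, e.g. "www.example.org/"
  host=${host%%/*}         # keep only the host part, e.g. "www.example.org"
  # rm -f: still run the mirror even if there is no local index.html yet.
  rm -f -- "$host/index.html"
  "${WGET:-wget}" --mirror "$url"
}
```

For example, `refresh_mirror http://www.example.org/` does the same as the one-liner above, except that a missing index.html does not stop the mirror from running.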

Archived:	15 years, 4 months ago
Views:	9610
Last activity:	7 years, 4 months ago