wget - fetch only the .listing file in each subdirectory

Question

If I use the command

wget --no-remove-listing -P ...../debugdir/gnu/<dir>/ ftp:<ftp-site>/gnu/<dir>/

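I get the .listing file for that directory. But I have to walk through every subsequent subdirectory to get the whole structure. Is there a way to fetch the .listing files from all (sub)directories with a single command?

Also, I noticed that an index.html file is generated automatically after every access. Is there a way to suppress this behavior?

The thing is, I had always found Bash processing slow, but after some profiling I discovered that the biggest delay comes from fetching each .listing file from the successive subdirectories.

Example: checking the GNU tree for specific file extensions takes about 320 seconds, of which 290 seconds are spent on the wget command above.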

Answer 1

Cod*_*x24 5

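If you want to build an index of an FTP site, that is, list all the subdirectories and files on the site without actually retrieving them, you can do this: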

wget -r -x --no-remove-listing --spider ftp://ftp.example.com/

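where:

-r => recursive (i.e. visit subdirectories)
-x => force creation of mirrored subdirectories on the client
--no-remove-listing => keep the ".listing" file in each subdirectory
--spider => visit files but do not retrieve them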

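This creates a sparse directory tree on the client that mirrors the server's structure and contains only the ".listing" files, each showing the contents of its directory (the result of "ls -l"). If you want to digest that into a list of path-qualified file names (like you would get from "find . -type f"), run this at the root of that sparse directory tree: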

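# Convert CRLF line endings in the .listing files to LF
find . -type f -exec dos2unix {} \;
# Print date, size and full path of every non-directory entry (gensub needs GNU awk)
( find . -maxdepth 999 -name .listing -exec \
awk '$1 !~ /^d/ {C="date +\"%Y-%m-%d %H:%M:%S\" -d \"" $6 " " $7 " " $8 "\""; \
C | getline D; close(C); printf "%s\t%12d\t%s%s\n", D, $5, gensub(/[^/]*$/,"","g",FILENAME), $9}' \
{} \; 2>/dev/null ) | sort -k4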

This will give you output like

2000-09-27 00:00:00       261149    ./README
2000-08-31 00:00:00       727040    ./foo.txt
2000-10-02 00:00:00      1031115    ./subdir/bar.txt
2000-11-02 00:00:00      1440830    ./anotherdir/blat.txt
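
To try the extraction step without touching a real FTP server, here is a minimal sketch that builds a throwaway tree with hand-written ".listing" files and prints "size, path" for each non-directory entry. The listing contents, the `demo` directory and the `list_files` helper are made up for illustration. It also drops the date conversion and uses plain `sub()` instead of `gensub()`, so it is a simplified variant of the pipeline above, not a replacement for it.

```shell
#!/bin/sh
# Build a throwaway tree with hand-written .listing files (made-up contents).
demo=$(mktemp -d)
mkdir -p "$demo/subdir"
cat > "$demo/.listing" <<'EOF'
drwxr-xr-x   2 ftp ftp     4096 Oct 02  2000 subdir
-rw-r--r--   1 ftp ftp   261149 Sep 27  2000 README
EOF
cat > "$demo/subdir/.listing" <<'EOF'
-rw-r--r--   1 ftp ftp  1031115 Oct 02  2000 bar.txt
EOF

# list_files ROOT (hypothetical helper): for each .listing under ROOT,
# skip directory entries and print "size<TAB>path". The directory prefix
# comes from FILENAME with the trailing ".listing" stripped.
list_files() {
  ( cd "$1" && find . -name .listing -exec \
      awk '$1 !~ /^d/ { dir = FILENAME; sub("[^/]*$", "", dir)
                        printf "%d\t%s%s\n", $5, dir, $9 }' {} \; ) | sort -k2
}

result=$(list_files "$demo")
printf '%s\n' "$result"
rm -rf "$demo"
```

Like the original, this only takes field 9 as the file name, so names containing spaces would be cut off at the first space.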

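Note: the "-maxdepth 999" option is not needed in this use case. I left it in from the invocation I was testing, which had an additional constraint: limiting the depth of the tree that gets reported. For example, if you scan a site that contains complete source trees for several projects, such as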

./foo/Makefile
./foo/src/...
./foo/test/...
./bar/Makefile
./bar/src/...
./bar/test/...

then you may only want an outline of the projects and their top-level directories. In that case, you would supply an option like "-maxdepth 2".
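
As a rough illustration of what "-maxdepth 2" selects, the sketch below recreates the two-project layout locally with empty ".listing" files and shows which of them find still visits. The `demo` directory and the empty listings are stand-ins invented for the example. Note that -maxdepth is an extension found in GNU and BSD find, not part of POSIX.

```shell
#!/bin/sh
# Build a throwaway mirror of the two-project layout; every directory
# gets an empty .listing file as a stand-in for what wget would leave behind.
demo=$(mktemp -d)
for d in foo foo/src foo/test bar bar/src bar/test; do
  mkdir -p "$demo/$d"
  : > "$demo/$d/.listing"
done
: > "$demo/.listing"

# -maxdepth 2 keeps the root listing and each project's own listing,
# and skips the listings inside src/ and test/.
outline=$(cd "$demo" && find . -maxdepth 2 -name .listing | sort)
printf '%s\n' "$outline"
rm -rf "$demo"
```

The root listing names the projects, and each project's listing names its Makefile and its top-level directories, which is usually enough for an outline.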