Looping through files with spaces in the names?

Question

Ami*_*ani 187 scripting bash find text-processing filenames

I wrote the following script to diff the outputs of two directories that contain all the same files:

#!/bin/bash

for file in `find . -name "*.csv"`  
do
     echo "file = $file";
     diff $file /some/other/path/$file;
     read char;
done

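I know there are other ways to achieve this. Curiously though, this script fails when a file name contains spaces. How can I deal with this?

Example output of find: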

./zQuery - abc - Do Not Prompt for Date.csv

Answer 1

Mik*_*kel 262

Short answer (closest to your answer, but handles spaces)

OIFS="$IFS"
IFS=$'\n'
for file in `find . -type f -name "*.csv"`  
do
     echo "file = $file"
     diff "$file" "/some/other/path/$file"
     read line
done
IFS="$OIFS"

Better answer (also handles wildcards and newlines in file names)

find . -type f -name "*.csv" -print0 | while IFS= read -r -d '' file; do
    echo "file = $file"
    diff "$file" "/some/other/path/$file"
    read line </dev/tty
done

Best answer (based on Gilles' answer)

find . -type f -name '*.csv' -exec sh -c '
  file="$0"
  echo "$file"
  diff "$file" "/some/other/path/$file"
  read line </dev/tty
' exec-sh {} ';'

Or even better, to avoid running one sh per file:

find . -type f -name '*.csv' -exec sh -c '
  for file do
    echo "$file"
    diff "$file" "/some/other/path/$file"
    read line </dev/tty
  done
' exec-sh {} +
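
The difference between ';' and '+' can be seen with a small sketch like the following. It is only an illustration under stated assumptions: the scratch directory, the three file names, and the variables per_file and batched are made up, and each sh invocation prints exactly one line so that the lines can be counted.

```shell
dir=$(mktemp -d)
touch "$dir/a.csv" "$dir/b c.csv" "$dir/d.csv"

# With ';' find starts one sh per file, so one line is printed per file.
per_file=$(find "$dir" -type f -name '*.csv' \
    -exec sh -c 'echo "started for $0"' {} ';' | wc -l)

# With '+' find hands a batch of names to a single sh;
# "$#" is the number of file names that invocation received.
batched=$(find "$dir" -type f -name '*.csv' \
    -exec sh -c 'echo "started with $# names"' exec-sh {} + | wc -l)

echo "sh invocations with ';': $per_file, with '+': $batched"
rm -rf "$dir"
```

With only three short names, everything fits into one batch, so the second count is 1 while the first is 3.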

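Long answer

You have three problems:

1. By default, the shell splits the output of a command on spaces, tabs, and newlines
2. File names could contain wildcard characters, which would get expanded
3. What if there is a directory whose name ends in .csv?

1. Splitting only on newlines

To figure out what to set file to, the shell has to take the output of find and interpret it somehow, otherwise file would just be the entire output of find.

The shell reads the IFS variable, which is set to <space><tab><newline> by default.

Then it looks at each character in the output of find. As soon as it sees any character that's in IFS, it thinks that marks the end of the file name, so it sets file to whatever characters it has seen so far and runs the loop. Then it starts where it left off to get the next file name, runs the next loop, and so on, until it reaches the end of the output.

So it's effectively doing this: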

for file in "./zQuery" "-" "abc" ...
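
A minimal sketch of that splitting, using the sample file name from the question in place of real find output (the variable names output and count are only for illustration):

```shell
# Stand-in for the output of find: one file name that contains spaces.
output='./zQuery - abc - Do Not Prompt for Date.csv'

count=0
# $output is deliberately unquoted, so the shell splits it on the
# default IFS (space, tab, newline), as it does with `find ...`.
for word in $output; do
    count=$((count + 1))
    echo "word $count = $word"
done
echo "$count words from a single file name"
```

The single file name turns into 9 separate loop iterations.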

To tell it to split the input only on newlines, you need to do

IFS=$'\n'

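before your for ... find command.

This sets IFS to a single newline, so it only splits on newlines, and not on spaces and tabs as well.

If you are using sh or dash instead of ksh93, bash or zsh, you need to write IFS=$'\n' like this instead: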

IFS='
'
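
Either spelling can be checked with a short sketch such as the following. It uses the portable form and two made-up file names; output, old_ifs and count are illustrative names, not part of the original script.

```shell
# Two stand-in file names, one per line, as find would print them.
output='./a b.csv
./c.csv'

old_ifs=$IFS
IFS='
'                        # a single newline (portable spelling of $'\n')
count=0
for file in $output; do  # unquoted on purpose: split on newlines only
    count=$((count + 1))
    echo "file $count = $file"
done
IFS=$old_ifs             # restore the previous splitting behaviour
```

The name with a space in it now arrives in the loop as one piece, so the loop body runs twice rather than three times.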

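That is probably enough to get your script working, but if you're interested in handling some other corner cases properly, read on...

2. Expanding $file without wildcards

Inside the loop where you do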

diff $file /some/other/path/$file

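the shell tries to expand $file (again!).

It could contain spaces, but since we already set IFS above, that won't be a problem here.

But it could also contain wildcard characters such as * or ?, which would lead to unpredictable behavior. (Thanks to Gilles for pointing this out.)

To tell the shell not to expand wildcard characters, put the variable inside double quotes, e.g.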

diff "$file" "/some/other/path/$file"

The same problem could also bite us in

for file in `find . -name "*.csv"`

For example, if you had these three files

file1.csv
file2.csv
*.csv

(very unlikely, but still possible)

it would be as if you had run

for file in file1.csv file2.csv *.csv

which would get expanded to

for file in file1.csv file2.csv *.csv file1.csv file2.csv

causing file1.csv and file2.csv to be processed twice.
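
This can be reproduced in a scratch directory. The sketch below is an illustration only: the directory comes from mktemp, and dir, old_ifs and count are assumed names. IFS is set to a newline so that only the wildcard expansion is at play.

```shell
# Scratch directory holding the three files from the example above.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1.csv file2.csv '*.csv'

old_ifs=$IFS
IFS='
'
count=0
# Unquoted command substitution: each line from find is also treated
# as a glob pattern, so the line "./*.csv" expands to every match.
for file in $(find . -name '*.csv'); do
    count=$((count + 1))
    echo "file = $file"
done
IFS=$old_ifs
echo "$count iterations for 3 files"

cd / && rm -rf "$dir"
```

The loop body runs five times for three files, because the name *.csv is replaced by everything it matches, including itself.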

Instead, we have to do

find . -name "*.csv" -print | while IFS= read -r file; do
    echo "file = $file"
    diff "$file" "/some/other/path/$file"
    read line </dev/tty
done

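read reads lines from standard input, splits each line into words according to IFS, and stores them in the variable names that you specify.

Here, we're telling it not to split the line into words, and to store the whole line in $file.

Also note that read line has changed to read line </dev/tty.

This is because inside the loop, standard input is coming from find via the pipeline.

If we just did read, it would consume part or all of a file name, and some files would be skipped.

/dev/tty is the terminal the user is running the script from. Note that this will cause an error if the script is run via cron, but I assume this is not important in this case.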
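
A sketch of the skipping effect, with printf standing in for find so that it runs without a terminal (the four file names and the variable seen are made up for the illustration):

```shell
# printf stands in for find, so no terminal is needed for this sketch.
seen=$(
    printf '%s\n' a.csv b.csv c.csv d.csv |
    while IFS= read -r file; do
        echo "$file"
        # This read shares the loop's standard input (the pipe), so it
        # swallows the next file name instead of waiting for the user.
        read -r line
    done
)
echo "$seen"
```

Only a.csv and c.csv reach the loop body; b.csv and d.csv are eaten by the inner read.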

Now, what if a file name contains newlines?

We can handle that by changing -print to -print0 and using read -d '' at the end of the pipeline:

find . -name "*.csv" -print0 | while IFS= read -r -d '' file; do
    echo "file = $file"
    diff "$file" "/some/other/path/$file"
    read line </dev/tty
done

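This makes find put a null byte at the end of each file name. The null byte is the only character that can never appear in a file path, so this should handle all possible file names, no matter how weird.

To get the file name on the other side, we use IFS= read -r -d ''.

Where we used read above, we used the default line delimiter of newline, but now find is using null as the line delimiter. In bash, you can't pass a NUL character in an argument to a command (even a builtin one), but bash understands -d '' as meaning NUL delimited. So we use -d '' to make read use the same line delimiter as find. Note that -d $'\0', incidentally, works as well, because bash, not supporting NUL bytes, treats it as the empty string.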
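
The difference between the two delimiters can be checked with a small sketch like the one below. It assumes bash, and it uses process substitution (covered further down) only so that the counters survive the loops; dir, by_line and by_nul are illustrative names.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
# One file whose name contains a newline, plus one ordinary file.
touch "$dir/two
lines.csv" "$dir/plain.csv"

by_line=0
while IFS= read -r file; do
    by_line=$((by_line + 1))
done < <(find "$dir" -type f -name '*.csv' -print)

by_nul=0
while IFS= read -r -d '' file; do
    by_nul=$((by_nul + 1))
done < <(find "$dir" -type f -name '*.csv' -print0)

echo "newline-delimited: $by_line names, NUL-delimited: $by_nul names"
rm -rf "$dir"
```

The newline-delimited loop sees three "names" for two files, because the embedded newline cuts one name in half; the NUL-delimited loop sees exactly two.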

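To be correct, we also add -r, which says not to treat backslashes in file names specially. For example, without -r, \<newline> is removed, and \n is converted into n.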
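
A quick way to see the effect of -r is the sketch below, which feeds one made-up name containing a backslash through read both ways (name, without_r and with_r are illustrative names):

```shell
# A name containing a literal backslash followed by the letter n.
name='back\nslash.csv'

without_r=$(printf '%s\n' "$name" | { IFS= read line; printf '%s' "$line"; })
with_r=$(printf '%s\n' "$name" | { IFS= read -r line; printf '%s' "$line"; })

printf 'without -r: %s\n' "$without_r"
printf 'with -r:    %s\n' "$with_r"
```

Without -r the backslash is dropped and the name comes out as backnslash.csv; with -r it is preserved unchanged.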

A more portable way of writing this, which doesn't require bash or zsh or remembering all the above rules about null bytes (again, thanks to Gilles):

find . -name '*.csv' -exec sh -c '
  file="$0"
  echo "$file"
  diff "$file" "/some/other/path/$file"
  read line </dev/tty
' exec-sh {} ';'

3. Skipping directories whose names end in .csv

find . -name "*.csv"

will also match directories that are called something.csv.

To avoid this, add -type f to the find command.

find . -type f -name '*.csv' -exec sh -c '
  file="$0"
  echo "$file"
  diff "$file" "/some/other/path/$file"
  read line </dev/tty
' exec-sh {} ';'

As glenn jackman pointed out, in both of these examples, the commands to execute for each file are being run in a subshell, so if you change any variables inside the loop, they will be forgotten.

If you need to set variables and have them still set at the end of the loop, you can rewrite it to use process substitution like this:

i=0
while IFS= read -r -d '' file; do
    echo "file = $file"
    diff "$file" "/some/other/path/$file"
    read line </dev/tty
    i=$((i+1))
done < <(find . -type f -name '*.csv' -print0)
echo "$i files processed"

Note that if you try copying and pasting this at the command line, read line will consume the echo "$i files processed", so that command won't get run.

To avoid this, you could remove read line </dev/tty and send the result to a pager like less.
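
The subshell effect is easy to check in isolation. The sketch below assumes bash with its default settings; names is a hypothetical helper that stands in for find ... -print0, and piped and substituted are illustrative counters.

```shell
#!/usr/bin/env bash
# Stand-in for `find ... -print0`: three NUL-terminated names.
names() { printf '%s\0' 'a.csv' 'b c.csv' 'd.csv'; }

piped=0
names | while IFS= read -r -d '' file; do
    piped=$((piped + 1))               # runs in a subshell
done

substituted=0
while IFS= read -r -d '' file; do
    substituted=$((substituted + 1))   # runs in the current shell
done < <(names)

echo "pipeline: $piped, process substitution: $substituted"
```

After the pipeline the counter is still 0, because the increments happened in a subshell; after the process-substitution loop it is 3.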

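Notes

I removed the semicolons (;) inside the loop. You can put them back if you want, but they are not needed.

These days, $(command) is more common than `command`. This is mainly because it's easier to write $(command1 $(command2)) than `command1 \`command2\``.

read char doesn't really read a character. It reads a whole line, so I changed it to read line.

Putting `while` in a pipeline can cause problems with the subshell that is created (for instance, variables set in the loop block are not visible after the command completes). With bash, I would use input redirection and process substitution: `while read -r -d $'\0' file; do ...; done < <(find ... -print0)` (3 upvotes)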

Answer 2

Gil*_*il' 24

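This script fails if any file name contains spaces or shell globbing characters \[?*. The find command outputs one file name per line. Then the command substitution `find …` is evaluated by the shell as follows:

1. Execute the find command, grab its output.
2. Split the find output into separate words. Any whitespace character is a word separator.
3. For each word, if it is a globbing pattern, expand it to the list of files it matches.

For example, suppose there are three files in the current directory, called foo* bar.csv, foo 1.txt and foo 2.txt.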

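1. The find command returns ./foo* bar.csv.
2. The shell splits this string at the space, producing two words: ./foo* and bar.csv.
3. Since ./foo* contains a globbing metacharacter, it's expanded to the list of matching files: ./foo 1.txt, ./foo 2.txt and ./foo* bar.csv itself.
4. Therefore the for loop is executed for each of those matches in turn, and then for bar.csv.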
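
The walk-through can be replayed in a scratch directory. This is only a sketch: dir and count are assumed names, and the loop is the question's loop with the body reduced to a counter. Note that the pattern ./foo* matches every name beginning with foo, which includes foo* bar.csv itself.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'foo* bar.csv' 'foo 1.txt' 'foo 2.txt'

count=0
# Default IFS and globbing left on, as in the question's script.
for file in $(find . -name '*.csv'); do
    count=$((count + 1))
    printf 'file = %s\n' "$file"
done
echo "$count iterations for 1 matching file"

cd / && rm -rf "$dir"
```

One .csv file produces four iterations: three from the expansion of ./foo* and one for the leftover word bar.csv, which does not exist as a file.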

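At this stage, you can avoid most problems by toning down word splitting and turning off globbing. To tone down word splitting, set the IFS variable to a single newline character; this way, the output of find will only be split at newlines, and spaces will remain. To turn off globbing, run set -f. Then this part of the code will work as long as no file name contains a newline character.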

IFS='
'
set -f
for file in $(find . -name "*.csv"); do …

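(This isn't part of your problem, but I recommend using $(…) over `…`. They have the same meaning, but the backquote version has weird quoting rules.)

There's another problem further down: diff $file /some/other/path/$file should be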

diff "$file" "/some/other/path/$file"

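Otherwise, the value of $file is split into words and the words are treated as glob patterns, just like with the command substitution above. If you must remember one thing about shell programming, remember this: always use double quotes around variable expansions ($foo) and command substitutions ($(bar)), unless you know you want to split. (Above, we knew we wanted to split the find output into lines.)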
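
A small sketch of what quoting changes, using the sample name from the question. count_args is a hypothetical helper that stands in for diff and simply reports how many arguments it received; unquoted and quoted are illustrative names.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

file='./zQuery - abc - Do Not Prompt for Date.csv'

unquoted=$(count_args $file)     # split into words (and globbed)
quoted=$(count_args "$file")     # passed through as one argument

echo "unquoted: $unquoted arguments, quoted: $quoted argument"
```

Unquoted, the command receives 9 arguments instead of the single file name it was meant to get.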

A reliable way of calling find is to tell it to run a command for each file it finds:

find . -name '*.csv' -exec sh -c '
  echo "$0"
  diff "$0" "/some/other/path/$0"
' {} ';'

In this case, another approach is to compare the two directories, though you'd have to explicitly exclude all the “boring” files.

diff -r -x '*.txt' -x '*.ods' -x '*.pdf' … . /some/other/path

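@userunknown: Using `{}` as a substring of an argument in `find -exec` is not portable, which is why the shell is needed. I don't understand what you mean by “the shell needs to mask the parameters”; if it's about quoting, my solution is properly quoted. You're right that the `echo` part could be replaced by `-print`. `-okdir` is a fairly recent GNU find extension, and it's not available everywhere. I didn't include the wait-to-continue because I consider it extremely poor UI, and the asker can easily put `read` in the shell snippet if he wants. (2 upvotes)
“Masking” is not a common term in shell literature, so you'll have to explain what you mean if you want to be understood. My example uses `{}` only once and as a separate argument; other cases (using it twice or as a substring) are not portable. “Portable” means that it will work on all unix systems; a good guideline is the [POSIX/Single Unix specification](http://pubs.opengroup.org/onlinepubs/009695399/utilities/find.html). (2 upvotes)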

Answer 3

小智 15

I'm surprised not to see readarray mentioned. It makes this very easy when used in combination with the <<< operator:

$ touch oneword "two words"

$ readarray -t files <<<"$(ls)"

$ for file in "${files[@]}"; do echo "|$file|"; done
|oneword|
|two words|

Using the <<<"$expansion" construct also allows you to split variables containing newlines into arrays, like:

$ string=$(dmesg)
$ readarray -t lines <<<"$string"
$ echo "${lines[0]}"
[    0.000000] Initializing cgroup subsys cpuset

readarray has been in Bash for years now, so this should probably be the canonical way to do this in Bash.
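
Because ls prints one name per line, the approach above still breaks on names that contain newlines. As a hedged sketch (not part of the original answer), readarray can instead read NUL-delimited output from find. This assumes bash 4.4 or later, where readarray accepts -d; the dir and files names and the three sample files are only for illustration.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/oneword" "$dir/two words" "$dir/two
lines"

# -d '' makes readarray split on NUL bytes (needs bash 4.4 or later);
# -t strips that delimiter from each element.
readarray -d '' -t files < <(find "$dir" -type f -print0)

for file in "${files[@]}"; do
    printf '|%s|\n' "${file##*/}"   # print just the base name
done
echo "${#files[@]} files"
rm -rf "$dir"
```

The array ends up with one element per file, even for the name that spans two lines.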

Answer 4

Sté*_*las 7

I'm surprised nobody has mentioned the obvious zsh solution here yet:

for file (**/*.csv(ND.)) {
  do-something-with $file
}

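((D) to also include hidden files, (N) to avoid an error if there's no match, (.) to restrict to regular files.)

bash 4.3 and above now supports it as well, partially: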

shopt -s globstar nullglob dotglob
for file in **/*.csv; do
  [ -f "$file" ] || continue
  [ -L "$file" ] && continue
  do-something-with "$file"
done

Answer 5

use*_*own 6

Afaik find has all you need.

find . -okdir diff {} /some/other/path/{} ";"

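find itself takes care of calling the programs. -okdir will prompt you before the diff (are you sure yes/no).

No shell involved, no globbing, jokers, pi, pa, po.

As a side note: if you combine find with for/while/do/xargs, in most cases you're doing it wrong. :)

Find already iterates over a subset of files. Most people who run into problems can simply use one of the actions (-ok(dir), -exec(dir), -delete) in combination with ";" or + (the latter for parallel invocation). The main reason to do so is that you don't have to fiddle around with file parameters, masking them for the shell. Less important: you don't need new processes all the time, so less memory and more speed. Shorter program. (2 upvotes)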

Answer 6

l0b*_*0b0 6

Loop through any files (including those with any special characters) using a completely safe find (see the link for documentation):

exec 9< <( find "$absolute_dir_path" -type f -print0 )
while IFS= read -r -d '' -u 9
do
    file_path="$(readlink -fn -- "$REPLY"; echo x)"
    file_path="${file_path%x}"
    echo "START${file_path}END"
done
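
The echo x and ${file_path%x} pair in the loop above guards against command substitution stripping trailing newlines. A minimal sketch of just that idiom, using a made-up value (name, plain and guarded are illustrative names):

```shell
# A value that ends in a newline, as a file name legally can.
name='ends with newline
'

# Command substitution strips trailing newlines...
plain=$(printf '%s' "$name")

# ...unless a sentinel character follows them and is removed afterwards.
guarded=$(printf '%s' "$name"; echo x)
guarded=${guarded%x}

printf 'plain:   [%s]\n' "$plain"
printf 'guarded: [%s]\n' "$guarded"
```

Only the guarded value still ends in the newline, so only it is identical to the original string.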

Asked: 14 years, 7 months ago
Viewed: 250750 times
Last activity: 4 years, 10 months ago