How can I loop over a growing list of files in bash?

Rus*_*lan 4 bash files

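I have a running file generator in which each file gets a name that sorts alphabetically after the previous one. At first I wrote my loop as for file in /path/to/files*; do..., but I soon realized that the glob is expanded only once, before the loop starts, so files created while the loop is running are never processed.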

The way I currently do this is ugly:

while :; do
    doneFileCount=$(wc -l < /tmp/results.csv)
    i=0
    for file in *; do
        if [[ $((doneFileCount>i)) = 1 ]]; then
            i=$((i+1))
            continue
        else
            process-file "$file" # prints single line to stdout
            i=$((i+1))
        fi
    done | tee -a /tmp/results.csv
done

Is there a simple way to loop over a growing list of files without the hack above?

ilk*_*chu 7

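I think the usual approach is to have new files appear in one directory and to rename/move them to another directory once they have been processed, so that they don't match the same glob again. So something like this: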

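cd new/ || exit 1
while true; do
    for f in *; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # process-file stands in for your own command; move the file only on success
        process-file "$f" && mv -- "$f" "../processed/$f"
    done
    sleep 1   # just so that it doesn't busyloop
done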

Or, similarly, change the file extension:

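while true; do
    for f in *.new; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # rename only after process-file (your own command) succeeds
        process-file "$f" && mv -- "$f" "${f%.new}.done"
    done
    sleep 1   # just so that it doesn't busyloop
done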

On Linux, you can also use inotifywait to get notified about new files.

inotifywait -q -m -e moved_to,close_write --format "%f" . | while IFS= read -r f; do
    process-file "$f"   # stands in for your own processing command
done

In either case, you'll want to watch for files that are still being written to. A large file created in-place will not appear atomically, but your script might start processing it when it's only halfway written.

The inotify close_write event above will see files when the writing process closes them (but it also catches modified files), while the create event would see the file when it's first created (but it might still be written to). moved_to simply catches files that are moved to the directory being watched.
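If you control the generator, one way to sidestep half-written files is to write each file under a temporary name somewhere else and rename it into the watched directory only once it is complete. The following is a minimal sketch of that idea, not something taken from the question: the function name publish_atomically, the staging directory, and the file names are all hypothetical, and it assumes the staging directory is on the same filesystem as the watched one, so that mv is a plain rename rather than a copy.

```shell
#!/usr/bin/env bash
# Sketch: make a finished file appear in the watched directory in one step.
# publish_atomically and its arguments are hypothetical names for illustration.

publish_atomically() {
    local staging_dir=$1 watched_dir=$2 name=$3
    local tmp

    # Create the temporary file outside the watched directory.
    # Assumption: staging_dir is on the same filesystem as watched_dir,
    # so the final mv is a rename and not a copy.
    tmp=$(mktemp "$staging_dir/$name.XXXXXX") || return 1

    # Write the complete contents (taken from stdin here) to the temp file.
    # Note that mktemp creates the file readable by its owner only.
    cat > "$tmp" || { rm -f -- "$tmp"; return 1; }

    # Rename into place; a watcher sees this as a moved_to event.
    mv -- "$tmp" "$watched_dir/$name"
}
```

It could be used as, say, `generate-data | publish_atomically /path/to/staging /path/to/new file001`, where generate-data is a placeholder for whatever produces the contents. With this in place, the polling loops never see a partial file in the directory they glob, and the inotifywait variant picks the file up through moved_to.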