How to efficiently loop through the lines of a file in Bash?

Question

I have a file example.txt with about 3000 lines, each containing a single string. A small example of the file:

>cat example.txt
saudifh
sometestPOIFJEJ
sometextASLKJND
saudifh
sometextASLKJND
IHFEW
foo
bar

I want to find all repeated lines in this file and print them. The desired output is:

>checkRepetitions.sh
found two equal lines: index1=1 , index2=4 , value=saudifh
found two equal lines: index1=3 , index2=5 , value=sometextASLKJND

I wrote a script, checkRepetitions.sh:

#!/bin/bash
# Number of lines in the file
size=$(cat example.txt | wc -l)
for i in $(seq 1 $size); do
    i_next=$((i+1))
    # Extract line i (re-reads the file on every iteration)
    line1=$(cat example.txt | head -n$i | tail -n1)
    for j in $(seq $i_next $size); do
        # Extract line j and compare it with line i
        line2=$(cat example.txt | head -n$j | tail -n1)
        if [ "$line1" = "$line2" ]; then
            echo "found two equal lines: index1=$i , index2=$j , value=$line1"
        fi
    done
done

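However, this script is very slow: it takes more than 10 minutes to run. In Python the same task takes less than 5 seconds... I tried to keep the file in memory with lines=$(cat example.txt) and then line1=$(cat $lines | cut -d',' -f$i), but this is still very slow...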

Answer 1

Wal*_*r A 5

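If you do not want to use awk (a good tool for the job, which parses the input only once), you can pass over the lines several times. Sorting is expensive, but this solution avoids the loops you tried.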

grep -Fnxf <(uniq -d <(sort example.txt)) example.txt
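
As a quick check, the command can be tried on the sample file from the question. The sketch below is only an illustration: the wrapper name list_duplicate_lines and the temporary file are assumptions, not part of the original answer, and it needs Bash because of the process substitution.

```shell
#!/bin/bash
# list_duplicate_lines FILE
# Hypothetical wrapper (the name is an assumption) around the command above.
list_duplicate_lines() {
    grep -Fnxf <(uniq -d <(sort "$1")) "$1"
}

# Recreate the sample from the question in a temporary file (assumption).
sample=$(mktemp)
printf '%s\n' saudifh sometestPOIFJEJ sometextASLKJND saudifh \
              sometextASLKJND IHFEW foo bar > "$sample"

list_duplicate_lines "$sample"
# Prints:
# 1:saudifh
# 3:sometextASLKJND
# 4:saudifh
# 5:sometextASLKJND

rm -f "$sample"
```

Each output line is a line number followed by the repeated text, so the result still has to be grouped by value to get the index1/index2 pairs asked for in the question.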

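With uniq -d <(sort example.txt) you find all lines that occur more than once. Next, grep searches for these (option -f) as complete lines (-x), without treating them as regular expressions (-F), and shows the line number of each occurrence (-n).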
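
For comparison, the awk route mentioned at the top of this answer could look roughly like the sketch below. It is not the original author's code: the function name report_duplicates is hypothetical, and it reports each repeated line against its first occurrence only, rather than every pair as the nested loops in the question would.

```shell
#!/bin/bash
# report_duplicates FILE
# Hypothetical helper (the name is an assumption): a single pass with awk.
report_duplicates() {
    awk '
        # "first" maps the text of a line to the number of its first occurrence.
        ($0 in first) {
            printf "found two equal lines: index1=%d , index2=%d , value=%s\n", first[$0], NR, $0
            next
        }
        # First time this text is seen: remember where.
        { first[$0] = NR }
    ' "$1"
}

# Recreate the sample from the question in a temporary file (assumption).
sample=$(mktemp)
printf '%s\n' saudifh sometestPOIFJEJ sometextASLKJND saudifh \
              sometextASLKJND IHFEW foo bar > "$sample"

report_duplicates "$sample"
# Prints:
# found two equal lines: index1=1 , index2=4 , value=saudifh
# found two equal lines: index1=3 , index2=5 , value=sometextASLKJND

rm -f "$sample"
```

Because the file is read once and the lookup is an array access, no sorting or repeated head/tail calls are needed; the trade-off is that every distinct line is held in memory.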
