Find the lines with the lowest value in the third column, given grep output

Mon*_*lal 5 command-line bash scripts grep text-processing

I have a file consisting of lines like the ones shown below (along with lines for other numbers). This is part of the output of:

$ grep 1848 filename.csv

How can I find the 5 lines with the lowest values in the third column of the .csv file, given that 1848 is in the first or second column?

1848,2598,11.310694021273559
1848,2599,10.947275955606203
1848,2600,10.635270124233982
1848,2601,11.916564552040725
1848,2602,12.119810736845844
1848,2603,12.406661156256154
1848,2604,10.636275056472996
1848,2605,12.549890992708612
1848,2606,9.783802450936204
1848,2607,11.253697489670264
1848,2608,12.16385432290674
1848,2609,10.30355814063016
1848,2610,12.102525596913923
1848,2611,11.636595992818505
1848,2612,10.741178028606866
1848,2613,11.352414275107423
1848,2614,12.204860161717253
1848,2615,12.959915468475387
1848,2616,11.320652192610872

Unfortunately, 1848 sometimes also appears inside the third column, and I need to ignore those lines:

6687,8963,9.241848677632822
6687,9111,10.537325656184889
6687,9506,11.315629894841848

Cyr*_*rus 7

With GNU sort:

grep -E '(^1848|^[0-9]{4},1848)' file | sort -t, -k3n | head -n 5

(If the first column may have fewer or more than 4 digits, replace {4} with +.)

Output:

1848,2606,9.783802450936204
1848,2609,10.30355814063016
1848,2600,10.635270124233982
1848,2604,10.636275056472996
1848,2612,10.741178028606866
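The pattern above leaves the end of each matched field open, so `^1848` would also match a first column such as 18480. Below is a minimal sketch of a stricter variant, assuming a plain comma-separated file with no quoted fields. The temporary file and the rows `2600,1848,...` and `18480,...` are hypothetical test data, not taken from the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample: a few rows from the question plus rows that test the anchoring.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1848,2598,11.310694021273559
1848,2606,9.783802450936204
1848,2609,10.30355814063016
2600,1848,10.635270124233982
18480,2604,1.5
6687,9506,11.315629894841848
EOF

# The trailing commas make the pattern match only a whole first or second field.
# -k3,3n restricts the numeric sort key to the third field; LC_ALL=C keeps "." as
# the decimal point regardless of the user's locale.
result=$(grep -E '^(1848,|[0-9]+,1848,)' "$tmp" | LC_ALL=C sort -t, -k3,3n | head -n 5)
printf '%s\n' "$result"
rm -f "$tmp"
```

The `18480,...` row and the row whose third column merely ends in 1848 are both skipped, because neither has 1848 as a complete first or second field.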

  • The OP mentioned that 1848 may also appear in the second column, so maybe we need something like `grep -E '(^1848|^[0-9]{4},1848)'` (2 upvotes)

hee*_*ayl 6

Just awk:

awk -F, 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"} \
          $1==1848||$2==1848 {a[$3]=$0} END {for(i in a) print a[i]}' file.csv

  • BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"} makes every for (i in array) loop visit the array's indices in ascending numeric order (this is a GNU awk feature)

  • $1==1848||$2==1848 {a[$3]=$0} checks whether the first or second field is 1848 and, if so, stores the whole record ($0) in array a, using the third field ($3) as the index

  • In END {for(i in a) print a[i]}, we simply iterate over the keys and print the values
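One consequence of indexing the array by the third field is worth keeping in mind: two matching rows with an identical third-column value share one index, so only the later row survives. A small sketch to illustrate this, using three hypothetical rows that are not from the question:

```shell
#!/usr/bin/env bash
# Three hypothetical rows; the first two share the same third-column value.
rows='1848,1,10.5
1848,2,10.5
1848,3,9.1'

# Index by the third field, as in the command above, then count the stored entries.
kept=$(printf '%s\n' "$rows" |
  awk -F, '$1==1848||$2==1848 {a[$3]=$0} END {n=0; for (i in a) n++; print n}')
echo "rows in: 3, rows kept: $kept"
```

With the long floating-point values in the question such ties are unlikely, so this probably only matters for data with coarser values.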

To get only 5 records, append head -5 to the awk command:

awk ... | head -5

For completeness, you can obviously get only the first 5 records by adding a little break logic to the END loop, with no need for head:

awk -F, 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"} \
          $1==1848||$2==1848 {a[$3]=$0} END {j=0; for(i in a) \
           {print a[i]; j++; if(j==5) break}}' file.csv

Example:

% cat file.txt
1848,2598,11.310694021273559
1848,2599,10.947275955606203
1848,2600,10.635270124233982
1848,2601,11.916564552040725
1848,2602,12.119810736845844
1848,2603,12.406661156256154
1848,2604,10.636275056472996
1848,2605,12.549890992708612
1848,2606,9.783802450936204
1848,2607,11.253697489670264
1848,2608,12.16385432290674
1848,2609,10.30355814063016
1848,2610,12.102525596913923
1848,2611,11.636595992818505
1848,2612,10.741178028606866
1848,2613,11.352414275107423
1848,2614,12.204860161717253
1848,2615,12.959915468475387
1848,2616,11.320652192610872

% awk -F, 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"} $1==1848||$2==1848 {a[$3]=$0} END {for(i in a) print a[i]}' file.txt
1848,2606,9.783802450936204
1848,2609,10.30355814063016
1848,2600,10.635270124233982
1848,2604,10.636275056472996
1848,2612,10.741178028606866
1848,2599,10.947275955606203
1848,2607,11.253697489670264
1848,2598,11.310694021273559
1848,2616,11.320652192610872
1848,2613,11.352414275107423
1848,2611,11.636595992818505
1848,2601,11.916564552040725
1848,2610,12.102525596913923
1848,2602,12.119810736845844
1848,2608,12.16385432290674
1848,2614,12.204860161717253
1848,2603,12.406661156256154
1848,2605,12.549890992708612
1848,2615,12.959915468475387

% awk -F, 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"} $1==1848||$2==1848 {a[$3]=$0} END {j=0; for(i in a) {print a[i]; j++; if(j==5) break}}' file.txt 
1848,2606,9.783802450936204
1848,2609,10.30355814063016
1848,2600,10.635270124233982
1848,2604,10.636275056472996
1848,2612,10.741178028606866
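If GNU awk is not available (PROCINFO["sorted_in"] is specific to it), or if matching rows may share a third-column value, the field test can stay in awk while the ordering is handed to sort. This is a minimal sketch under those assumptions, not a replacement for the commands above. The temporary file and the `2000,1848,...` row are hypothetical, added only to create a tie.

```shell
#!/usr/bin/env bash
# Hypothetical sample: rows from the question, one tied row, and one row whose
# third column merely contains the digits 1848.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1848,2606,9.783802450936204
1848,2609,10.30355814063016
2000,1848,10.30355814063016
1848,2600,10.635270124233982
1848,2604,10.636275056472996
1848,2612,10.741178028606866
6687,8963,9.241848677632822
EOF

# awk compares whole fields, so 1848 inside the third column never matches.
# sort orders numerically on the third field only and keeps tied rows.
top5=$(awk -F, '$1 == 1848 || $2 == 1848' "$tmp" | LC_ALL=C sort -t, -k3,3n | head -n 5)
printf '%s\n' "$top5"
rm -f "$tmp"
```

Because no array is involved, both rows with the value 10.30355814063016 appear in the result, and the pattern-only awk rule works with any POSIX awk.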