Bash: filter rows by matching a numeric range

Question

rmf*_*rmf 5 bash sed awk text-processing

I have a file test with the fields cato and pos.

I have a file db with the fields cato, start and stop.

1   6408    8000
1   11822   16373
1   18716   23389
1   27690   34330
1   36552   39191
1   39313   44565
2   44839   50247
2   60987   65017
2   65705   71523

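My goal is to select the rows of db for which a pos value from test falls within that row's start to stop range. There is one constraint: the match must happen within the same cato group. Both files are sorted by fields 1 and 2. Incidentally, my two real files also have many other fields.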

With this example dataset, my expected result is:

1 6408 8000
2 44839 50247

I have a script that I cobbled together:

k=1
data_test=$(cat "test")
data_db=$(cat "db")
while read -r line
do
    # helps to keep count of test rows
    printf "$k \n"
    # get cato
    cato=$(echo $line | awk '{print $1}')
    # get pos
    pos=$(echo $line | awk '{print $2}')
    # get number of chars in pos (to reduce number of lines awk needs to look through later)
    pos_chr=$(echo -n $pos | wc -c)
    # get lines in db that start with cato and pos chars match start or stop
    matched=$(echo "$data_db" | grep -Ew "^$cato" | grep -Ew "[0-9]{$pos_chr}")
    #echo "$db_cat"
    # if matched is not empty
    if [ ! -z "$matched" ]; then
        # use awk to print lines in db where pos > start and pos < stop
        echo "$matched" | awk -v apos='$pos' 'BEGIN{OFS="\t"}{if(apos >= $2 && apos <= $3) print $0}'
        #check
        #echo "$matched" | awk -v apos=$pos 'BEGIN{OFS="\t"}{print apos,$0}'
    fi
    ((k=k+1))
done <<< "$data_test"

It seems that awk is not doing the comparison in the last step. Everything appears to work up to that point, and then I am not sure what goes wrong. Perhaps someone can spot the error. Is there a better way?
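
One likely cause of the failed comparison, sketched here under assumptions: the script passes the position as -v apos='$pos', and inside single quotes the shell does not expand $pos, so awk receives the literal text $pos and compares it as a string. The values of pos and row below are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical values, chosen only for illustration.
pos=7000
row='1 6408 8000'

# Single quotes: awk receives the literal text $pos, so it compares
# strings instead of numbers and the range test fails for this row.
quoted_single=$(printf '%s\n' "$row" | awk -v apos='$pos' 'apos >= $2 && apos <= $3')

# Double quotes: the shell expands $pos to 7000 before awk sees it,
# so awk compares numbers and the row is printed.
quoted_double=$(printf '%s\n' "$row" | awk -v apos="$pos" 'apos >= $2 && apos <= $3')

printf 'single quotes: [%s]\n' "$quoted_single"
printf 'double quotes: [%s]\n' "$quoted_double"
```

With these values the first line should show empty brackets and the second should show the row, which suggests that switching to double quotes in the last awk call is worth trying.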

Answer 1

Rom*_*est 4

With a single GNU awk program (arrays of arrays are available since gawk v4.0):

awk 'NR==FNR{ a[$1][$2]; next }
     $1 in a{
         # array subscripts are strings, so i+0 forces a numeric comparison
         for (i in a[$1])
             if (i+0 >= $2 && i+0 <= $3) { print $0; break }
     }' test db

Output:

1   6408    8000
2   44839   50247
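
For an awk without arrays of arrays, the same idea can be sketched with a plain array that keeps one space-separated list of positions per cato. This is only an illustrative variant, not the answer's method. The contents of test below are invented, because the question's sample for that file is not shown. The db rows are a subset of the question's data, and the temporary directory is just a convenience.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: the positions in test are invented for
# illustration; the db rows are taken from the question.
tmpdir=$(mktemp -d)
printf '%s\n' '1 7000' '1 9000' '2 45000' '2 55000' > "$tmpdir/test"
printf '%s\n' '1 6408 8000' '1 11822 16373' '2 44839 50247' '2 60987 65017' > "$tmpdir/db"

# Portable variant: one space-separated list of positions per cato,
# stored in an ordinary one-dimensional array.
result=$(awk '
    NR == FNR { list[$1] = list[$1] " " $2; next }   # first file: test
    $1 in list {                                     # second file: db
        n = split(list[$1], p, " ")
        for (j = 1; j <= n; j++)
            # adding 0 forces a numeric comparison
            if (p[j] + 0 >= $2 + 0 && p[j] + 0 <= $3 + 0) { print; break }
    }
' "$tmpdir/test" "$tmpdir/db")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because both files are sorted, a merge-style pass that advances through test and db together would avoid rescanning the list for every db row. The version above favors brevity over speed.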

Archived:	7 years, 8 months ago
Views:	802
Last recorded:	6 years, 5 months ago