How to make my awk command operate on column 2 only

Question

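I want this awk command to replace the last underscore in column 2 with a tab. Right now it replaces the last underscore in each line with a tab. Note that the columns in each line may contain different numbers of underscores. I have tried many ways to make the command operate on column 2 only. I know I am close; could someone make the final adjustment?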

Tab-delimited sample file:

OTU1 this_is_the_second_column 100 0 450 this_is_the_sixth_column 1 5 3.2
OTU2 this_is_another_column_to_parse 103 4 650 this_is_another_test_string_too 4 7 4.6

What it should look like:

OTU1 this_is_the_second column 100 0 450 this_is_the_sixth_column 1 5 3.2
OTU2 this_is_another_column_to parse 103 4 650 this_is_another_test_string_too 4 7 4.6

This is my current code:

gawk -F'\t' -v OFS='\t' 'BEGIN{FS=OFS="_"}{last=$NF;NF--;print $0"\t"last}' test1.tab > test1_reformat.tab

Any help is greatly appreciated.

Thanks

Answer 1

ste*_*ver 6

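Since you appear to have GNU awk, you could use its gensub function to capture the trailing run of non-underscore characters after the last underscore in column 2, and substitute it back after a tab: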

gawk 'BEGIN {OFS=FS="\t"} {$2 = gensub(/_([^_]*)$/, "\t\\1", "1", $2)} 1' test1.tab
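As a quick check, the gensub command above can be run against the first sample row. This is only a minimal sketch: the function name split_col2_gensub is made up for illustration, the printf line simply recreates one input row with real tabs, and it assumes gawk is installed.

```shell
# Hypothetical helper (name is illustrative): apply the gensub one-liner to stdin.
split_col2_gensub() {
    gawk 'BEGIN {OFS=FS="\t"} {$2 = gensub(/_([^_]*)$/, "\t\\1", "1", $2)} 1'
}

# Recreate the first sample row with real tabs and pass it through the helper.
# Only column 2 is rewritten; the underscores in column 6 are left alone.
printf 'OTU1\tthis_is_the_second_column\t100\t0\t450\tthis_is_the_sixth_column\t1\t5\t3.2\n' |
    split_col2_gensub
```

Because the regex ends in $ and [^_]* cannot cross an underscore, the only place it can match is at the last underscore of the field.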

Alternatively (and, I think, portably) use the match function with some string slicing:

awk 'BEGIN{OFS=FS="\t"} match($2,/_[^_]*$/) {$2 = substr($2,1,RSTART-1) "\t" substr($2,RSTART+1)} 1' test1.tab
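The match version can be tried the same way. The sketch below wraps it in a made-up function, split_col2_match, and feeds it two hand-written rows: a shortened copy of the second sample row, and a hypothetical row whose second column has no underscore at all, to show that such rows pass through unchanged.

```shell
# Hypothetical helper (name is illustrative): apply the match/substr one-liner to stdin.
split_col2_match() {
    awk 'BEGIN{OFS=FS="\t"} match($2,/_[^_]*$/) {$2 = substr($2,1,RSTART-1) "\t" substr($2,RSTART+1)} 1'
}

# Column 2 contains underscores: RSTART points at the last one, which becomes a tab.
printf 'OTU2\tthis_is_another_column_to_parse\t103\t4\n' | split_col2_match

# Column 2 has no underscore: match() returns 0, the action is skipped,
# and the trailing "1" prints the line as it came in.
printf 'OTU3\tnounderscore\t7\ta_b\n' | split_col2_match
```

The text before RSTART is kept as is and the text after it is appended behind a tab, so the underscore itself is dropped.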
