Sed and awk cause line wraparound

Coc*_*dit 4 regex unix awk sed

I have a file of the form:

FA01_01:The birch canoe slid on the smooth planks  
FA01_02:Glue the sheet to the dark blue background

I need it in this form (note the lowercase as well):

<s> the birch canoe slid on the smooth planks </s> (FA01_01)  
<s> glue the sheet to the dark blue background </s> (FA01_02)

So I tried the following sed expression:

sed 's/\(.......\):\(.*$\)/<s> \2 <\/s> (\1)/' tmp.dat

But this is what it returns:

</s> (FA01_01)anoe slid on the smooth planks  
</s> (FA01_02)eet to the dark blue background

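For whatever reason, sed seems to make the substituted text wrap around to the beginning of the line, but only for the second match. For example: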

$> sed 's/\(.......\):\(.*$\)/\1 \2/' tmp.dat
FA01_01 The birch canoe slid on the smooth planks

is correct, but

$> sed 's/\(.......\):\(.*$\)/\2 \1/' tmp.dat
FA01_01h canoe slid on the smooth planks

This even shows up in awk. To test the wraparound hypothesis:

$> awk 'BEGIN{FS=":"}{print tolower($2) "XXX"}' tmp.dat
XXX birch canoe slid on the smooth planks

$> awk 'BEGIN{FS=":"}{print tolower($1) "XXX"}' tmp.dat
fa01_01XXX

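What could cause this wraparound? Does it have something to do with the fact that the second pattern, or the saved column, runs all the way to the end of the line?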

小智 5

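The cause is that your tmp.dat is probably in DOS format, with each line ending in \r\n. The \r (carriage return) ends up at the end of the second captured group or field, and when the terminal prints it the cursor jumps back to the start of the line, so whatever is printed after it overwrites the beginning. You can try converting the file to Unix format (\n only), for example with the following command: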

dos2unix tmp.dat
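If dos2unix is not installed, something like the following may serve as a rough substitute. It is only a sketch: it builds a hypothetical copy of the question's two lines with CRLF endings in a scratch directory, counts the lines that contain a carriage return, and strips them with tr. The variable names (workdir, cr, before, after) are made up for illustration.

```shell
# Work in a scratch directory so no existing tmp.dat is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical sample: the question's two lines, written with DOS (CRLF) endings.
printf 'FA01_01:The birch canoe slid on the smooth planks\r\nFA01_02:Glue the sheet to the dark blue background\r\n' > tmp.dat

# A literal carriage return to search for.
cr=$(printf '\r')

# Count the lines that contain a carriage return before conversion.
before=$(grep -c "$cr" tmp.dat || true)

# Delete every carriage return, then replace the original file.
tr -d '\r' < tmp.dat > tmp.dat.unix && mv tmp.dat.unix tmp.dat

# Count again; grep exits non-zero when nothing matches, hence the "|| true".
after=$(grep -c "$cr" tmp.dat || true)

echo "lines with CR before: $before, after: $after"
```

A non-zero count before conversion is a strong hint that the file came from a DOS/Windows editor.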

Then run (note that \L and \E are GNU sed extensions):

sed 's/\(.......\):\(.*$\)/<s>\L \2 \E<\/s> (\1)/' tmp.dat
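As an alternative that does not depend on GNU sed or on converting the file first, the same reformatting could probably be done in one awk pass. This is a sketch under assumptions: the sample file is a hypothetical CRLF copy of the question's input, and like the awk commands in the question it splits on ":", so it assumes the sentence itself contains no colon.

```shell
# Work in a scratch directory so no existing tmp.dat is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical sample: the question's two lines, written with DOS (CRLF) endings.
printf 'FA01_01:The birch canoe slid on the smooth planks\r\nFA01_02:Glue the sheet to the dark blue background\r\n' > tmp.dat

# Split on ":", drop a trailing carriage return from the sentence,
# lowercase it, and print it in the requested <s> ... </s> (ID) layout.
result=$(awk -F: '{
    sub(/\r$/, "", $2)    # remove the trailing \r if the line has one
    printf "<s> %s </s> (%s)\n", tolower($2), $1
}' tmp.dat)

printf '%s\n' "$result"
```

On the sample above this prints the two lines in the form asked for in the question, and it behaves the same on a file that has already been converted, since the sub() call simply does nothing when no \r is present.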