Coc*_*dit 4 regex unix awk sed
I have a file of the form:
FA01_01:The birch canoe slid on the smooth planks
FA01_02:Glue the sheet to the dark blue background
I need it in the following form (note that the text is also lowercased):
<s> the birch canoe slid on the smooth planks </s> (FA01_01)
<s> glue the sheet to the dark blue background </s> (FA01_02)
So I tried the following sed expression:
sed 's/\(.......\):\(.*$\)/<s> \2 <\/s> (\1)/' tmp.dat
But this is what it returns:
</s> (FA01_01)anoe slid on the smooth planks
</s> (FA01_02)eet to the dark blue background
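For whatever reason, sed seems to make the replaced text wrap around to the beginning of the line, but only when something follows the second match. For example: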
$> sed 's/\(.......\):\(.*$\)/\1 \2/' tmp.dat
FA01_01 The birch canoe slid on the smooth planks
works as expected, but
$> sed 's/\(.......\):\(.*$\)/\2 \1/' tmp.dat
FA01_01h canoe slid on the smooth planks
The same thing even happens in awk. To test the wrap-around hypothesis:
$> awk 'BEGIN{FS=":"}{print tolower($2) "XXX"}' tmp.dat
XXX birch canoe slid on the smooth planks
but
$> awk 'BEGIN{FS=":"}{print tolower($1) "XXX"}' tmp.dat
fa01_01XXX
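What would cause this wrap-around? Does it have something to do with the second pattern, or with the fact that the saved field runs all the way to the end of the line?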
小智 5
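The reason is that your tmp.dat is probably in DOS format (lines ending in \r\n). The \r stays at the end of the second match, so anything printed after it is drawn from the start of the line by the terminal and overwrites the text already there. You can try converting the file to Linux format (\n only), for example with the following command: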
dos2unix tmp.dat
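To confirm the diagnosis before converting anything, it can help to look for the carriage returns directly. The sketch below is only an illustration: it builds a throwaway sample file with CRLF endings (a hypothetical stand-in for tmp.dat, created with mktemp), counts the lines containing \r, and dumps the first line byte by byte.

```shell
# Build a small sample with DOS (CRLF) line endings.
# "$tmp" is a hypothetical stand-in for the real tmp.dat.
tmp=$(mktemp)
printf 'FA01_01:The birch canoe slid on the smooth planks\r\n' > "$tmp"
printf 'FA01_02:Glue the sheet to the dark blue background\r\n' >> "$tmp"

# A literal carriage return, kept in a variable so the pattern
# works in any POSIX shell (no bash-only $'\r' needed).
cr=$(printf '\r')

# Count the lines that contain a carriage return.
crlf_lines=$(grep -c "$cr" "$tmp")
echo "lines with CR: $crlf_lines"

# Dump the first line byte by byte; a DOS line ending
# shows up as \r immediately before \n.
head -n 1 "$tmp" | od -c
```

On the real file, a non-zero count from grep (or a visible \r in the od output) would indicate that conversion is needed.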
Then run the following (\L and \E are GNU sed extensions):
sed 's/\(.......\):\(.*$\)/<s>\L \2 \E<\/s> (\1)/' tmp.dat
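If dos2unix or GNU sed is not available, one possible alternative is to strip the carriage returns with tr and do the reordering and lowercasing in awk. This is a sketch rather than the answer's own method: the sample file is a hypothetical stand-in for tmp.dat, and it assumes the sentence itself contains no colon, since awk splits on every ':'.

```shell
# Hypothetical sample with DOS (CRLF) line endings, standing in for tmp.dat.
sample=$(mktemp)
printf 'FA01_01:The birch canoe slid on the smooth planks\r\n' > "$sample"
printf 'FA01_02:Glue the sheet to the dark blue background\r\n' >> "$sample"

# tr deletes every carriage return; awk then splits on ':',
# lowercases the sentence ($2) and appends the ID ($1) in parentheses.
result=$(tr -d '\r' < "$sample" |
  awk -F: '{ print "<s> " tolower($2) " </s> (" $1 ")" }')

printf '%s\n' "$result"
```

Because the \r is removed before awk sees the line, the closing tag and the ID are appended after the sentence instead of overwriting its beginning.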