Vic*_*cky 5 sed awk text-processing
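I have a CSV file with more than 150 columns, using newline as the record separator. The problem is that one of the columns contains newline characters, and I would like to remove them.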
Input:
001|Baker St.
London|3|4|7
002|Penny Lane
Liverpool|88|5|7
Output:
001|Baker St. London|3|4|7
002|Penny Lane Liverpool|88|5|7
With sed, you can merge the next line into the current one for as long as the current line does not contain 4 | characters:
<file sed -e :1 -e 's/|/|/4;t' -e 'N;s/\n/ /;b1'
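One way to try the command on the sample data and check the result is to count the fields of each joined line. This is only a sketch: the file name sample.csv and the field count of 5 are assumptions taken from the example in the question, not from the real 150-column file.

```shell
# sample.csv is a hypothetical file holding the input from the question
printf '%s\n' '001|Baker St.' 'London|3|4|7' \
              '002|Penny Lane' 'Liverpool|88|5|7' > sample.csv

# Join the broken lines and keep the result in a variable
joined=$(<sample.csv sed -e :1 -e 's/|/|/4;t' -e 'N;s/\n/ /;b1')
printf '%s\n' "$joined"

# Report any line that does not have 5 fields (4 "|" separators);
# no output from awk means every record looks complete
printf '%s\n' "$joined" |
  awk -F'|' 'NF != 5 { print "line " NR ": " NF " fields" }'
```

For the real file, the 5 in the awk check would need to be replaced with the actual number of columns, and the 4 in the sed command with that number minus one.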
Some sed implementations have -i or -i '' to edit files in place (-i.back to save the original file with a .back extension), so with those implementations you can do:
sed -i -e :1 -e 's/|/|/4;t' -e 'N;s/\n/ /;b1' ./*.csv
This edits all the non-hidden csv files in the current directory.
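For sed implementations without -i, a similar effect can be approximated by going through a temporary file. The following is a minimal sketch, assuming mktemp is available; fix_in_place is a made-up helper name, not something provided by sed.

```shell
# fix_in_place is a hypothetical helper: it rewrites each file given as
# an argument by way of a temporary copy
fix_in_place() {
  for f do
    tmp=$(mktemp) || return
    # write the joined records to the temporary file, and only overwrite
    # the original if sed succeeded
    if sed -e :1 -e 's/|/|/4;t' -e 'N;s/\n/ /;b1' < "$f" > "$tmp"; then
      # cat into the existing file keeps its permissions and ownership
      cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
  done
}

# usage (modifies the files, so try it on copies first):
#   fix_in_place ./*.csv
```

Unlike -i.back, this sketch keeps no backup of the original file.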
The same with comments:
<file sed '
:1
s/|/|/4; # replace the 4th | with itself. Only useful when combined with
# the next "t" command which branches off if the previous
# substitution was successful
t
# we only reach this point if "t" above did not branch off, that is
# if the pattern space does not contain 4 "|"s
N; # append the next line to the pattern space
s/\n/ /; # replace the newline with a space
# and then loop again in case the pattern space still does not contain
# 4 "|"s:
b1'
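One case that may be worth testing before relying on this is a field broken across three or more lines. The t command reacts to any successful substitution since the last line was read, including s/\n/ /, so the loop may leave early after the first join. A possible variation, offered as a sketch rather than as the answer's method, goes back to the label with t1 instead of b1: s/\n/ / always succeeds right after N, so t1 always branches, and taking the branch also resets the flag that the first t checks. The helper name join_records and the three-line record are made up for illustration.

```shell
# join_records is a hypothetical helper name; it reads the broken CSV on
# stdin and writes the joined records to stdout
join_records() {
  sed -e :1 -e 's/|/|/4;t' -e 'N;s/\n/ /;t1'
}

# a record whose second field is broken across three lines (made-up data)
printf '%s\n' '003|Abbey' 'Road' 'London|1|2|3' | join_records
```

On the two-line records from the question, this variation is expected to behave the same as the b1 version.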
Relying on the format of the first field (assuming each record should start with a number):
awk 'NR == 1{ printf "%s", $0; next }
{ printf "%s%s", (/^[0-9]+/? ORS : ""), $0 }
END{ print "" }' file.csv
Output:
001|Baker St.London|3|4|7
002|Penny LaneLiverpool|88|5|7
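The output above joins the two parts without a separator. If a space is wanted between them, as in the expected output in the question, one possible variation prints a space before each continuation line. This is a sketch under the same assumption (every record starts with a digit); join_by_number is a made-up helper name.

```shell
# join_by_number is a hypothetical helper name; it reads the broken CSV
# on stdin and writes the joined records to stdout
join_by_number() {
  awk '
    NR == 1 { printf "%s", $0; next }
    # a line starting with a digit begins a new record, so the previous
    # one is ended with ORS; any other line is a continuation and is
    # appended after a single space
    { printf "%s%s", (/^[0-9]/ ? ORS : " "), $0 }
    # terminate the last record, unless the input was empty
    END { if (NR) print "" }
  '
}

printf '%s\n' '001|Baker St.' 'London|3|4|7' \
              '002|Penny Lane' 'Liverpool|88|5|7' | join_by_number
```

A continuation line that happens to start with a digit (a house number, for instance) would be treated as a new record, so the field-counting approach is likely safer when the column count is known.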