sti*_*tiq 4 text-processing csv
I have a malformed CSV file and need to add some quotes.
Input:
field,field2,text field with potential commas,field4,field5
field,field2,text fie,ld with pot,ential commas,field4,field5
field,field2,text field with, potential commas,field4,field5
Output:
field,field2,"text field with potential commas",field4,field5
field,field2,"text fie,ld with pot,ential commas",field4,field5
field,field2,"text field with, potential commas",field4,field5
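sed 's/,/,"/2' adds the opening quote, but how do I target the second comma from the end of each line for the closing quote?
Solutions using sed, awk, perl, etc. are welcome. The file has a few million lines, so speed matters.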
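Here is an awk approach: if a line has more than five comma-separated fields, a loop rejoins the "middle" fields, and the result is printed surrounded by quotes, followed by the last two fields. A line with exactly five fields just gets its third field quoted: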
awk -f awkscript.awk < input
where awkscript.awk contains:
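BEGIN {
    FS = ","    # split input on every comma
    OFS = ","   # rejoin output fields with commas
}
{
    if (NF > 5) {
        # Extra commas belong to the text field: rejoin fields 3..NF-2.
        # Starting from $3 keeps a leading comma when $3 itself is empty.
        middle = $3
        for (i = 4; i <= NF-2; i++)
            middle = middle "," $i
        print $1, $2, "\"" middle "\"", $(NF-1), $NF
    } else {
        print $1, $2, "\"" $3 "\"", $4, $5
    }
}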
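For comparison, the sed attempt from the question can probably be completed with a second substitution anchored at the end of the line. This is only a sketch, not a benchmarked solution: it assumes every line has at least five fields, that stray commas occur only in the third field, and that no field is already quoted. The function name quote_middle is made up for illustration.

```shell
# Hypothetical helper: wrap the third field of each stdin line in quotes.
# Assumes >= 5 fields per line and stray commas only inside the third field.
quote_middle() {
    # 1st expression: put the opening quote after the 2nd comma.
    # 2nd expression: put the closing quote before the last two
    #                 comma-prefixed fields (",field4,field5" at end of line).
    sed -e 's/,/,"/2' -e 's/,[^,]*,[^,]*$/"&/'
}

# Usage example with two lines from the question.
printf '%s\n' \
    'field,field2,text field with potential commas,field4,field5' \
    'field,field2,text fie,ld with pot,ential commas,field4,field5' |
    quote_middle
```

Both expressions use only basic regular expressions, so no -E flag is needed. Whether this is faster than the awk script on a multi-million-line file would have to be measured on the real data.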