tak*_*a15 4 text-processing csv
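I have an Excel file that I converted to CSV. After conversion it looks like the sample below (note that the real CSV has more than 100 columns; this is a trimmed-down version):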
,Product," ",Citty," ",Price
,Name," ",Location," ",Per Unit
,banana," ",CA," ",5.7
,apple," ",FL," ",2.3
I need to write a script that "merges" the first and second rows together, column by column, based on comma position:
,Product Name," "" ",Citty Location," "" ",Price Per Unit
,banana," ",CA," ",5.7
,apple," ",FL," ",2.3
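I have looked at other questions here and on Stack Overflow, but the answers don't seem to address this odd column-by-column situation in the first two rows of the file.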
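A minimal sketch of just the merge step, written in POSIX awk so it does not depend on gawk. It assumes the header occupies exactly rows 1 and 2 and that no cell contains an embedded comma (it splits naively on ","). The function name merge_header_rows is hypothetical. Unlike the desired output above, a quoted spacer cell such as " " is kept once rather than doubled, and a space is inserted only when both halves are non-blank.

```shell
# merge_header_rows (hypothetical name): join header row 1 and row 2
# cell by cell, then pass all later rows through unchanged.
# Assumes: header = rows 1-2, no embedded commas inside cells.
merge_header_rows() {
  awk -F, -v OFS=, '
    # A cell is "blank" if it is empty or a quoted run of spaces/tabs.
    function blank(s) { return s ~ /^"?[ \t]*"?$/ }
    NR == 1 { n = NF; for (i = 1; i <= NF; i++) top[i] = $i; next }
    NR == 2 {
      m = (NF > n) ? NF : n
      for (i = 1; i <= m; i++) {
        if (blank(top[i])) continue          # keep the row-2 cell as-is
        $i = blank($i) ? top[i] : top[i] " " $i
      }
      print                                  # $0 rebuilt with OFS=","
      next
    }
    { print }                                # data rows untouched
  '
}
```

Usage: merge_header_rows < inp > merged.csv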
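As an additional, unrelated task, I would also like to get rid of the empty columns in the CSV and fix the spelling mistakes, so that it looks like this: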
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
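(Currently the CSV has a tab enclosed in quotes between every pair of actual data columns. The exception is the first column, which is empty and followed by a comma.)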
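I will receive this CSV many times, with spelling mistakes, so I want the script to fix them programmatically. Also note that the columns may not always be in the order shown above, so the script needs to check every column name for mistakes dynamically.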
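Because the columns can move around, one hedged approach is to fix typos by word lookup rather than by position: keep a table of known misspellings and check every word of every header cell against it. This sketch assumes a made-up "wrong=right,wrong=right" list format and a hypothetical function name fix_header_typos; matching is exact and case-sensitive, and only the header rows (2 by default, overridable with a second argument) are touched.

```shell
# fix_header_typos (hypothetical name): correct known misspellings in the
# header rows, word by word, wherever the column happens to be.
# $1: typo list in an assumed "wrong=right,wrong=right" format
# $2: number of header rows (default 2)
fix_header_typos() {
  awk -F, -v OFS=, -v typos="$1" -v hdr="${2:-2}" '
    BEGIN {
      n = split(typos, pairs, ",")
      for (k = 1; k <= n; k++) {
        split(pairs[k], kv, "=")
        fix[kv[1]] = kv[2]                   # fix["Citty"] = "City", ...
      }
    }
    NR <= hdr {
      for (i = 1; i <= NF; i++) {
        w = split($i, words, " ")            # whitespace-separated words
        changed = 0
        out = ""
        for (j = 1; j <= w; j++) {
          if (words[j] in fix) { words[j] = fix[words[j]]; changed = 1 }
          out = out (j > 1 ? " " : "") words[j]
        }
        if (changed) $i = out                # rewrite only corrected cells
      }
    }
    { print }
  '
}
```

Usage (the misspelling "Prise" is only an illustration): fix_header_typos 'Citty=City,Prise=Price' < inp. Keeping the list in one string means new misspellings from future files can be added without touching the awk code.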
Try this:
$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
The same code is easier to read when split across several lines:
$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
For line 1: fix the Citty -> City typo with gensub(), then split the corrected line on commas into the elements of array a.
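For line 2: starting at column 2, join the corresponding column of line 1 with this column, separated by a space, and repeat for every second column. Then strip the trailing comma and print.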
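For every line after line 2: replace a leading comma, and every "<spaces>", sequence (a quoted blank spacer cell plus its comma), with the empty string, then print the result. Note that " *" matches spaces only; if the spacer cells really contain a tab, as the question says, use "[ \t]*" in that pattern instead.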
Tested OK on GNU Awk 4.0.2. Note that gensub() is a GNU Awk extension, so this will not run as-is under mawk or BSD awk.
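Because gensub() is gawk-only, here is a sketch of the same three steps using only POSIX sub()/gsub(), which should also run under mawk and BSD awk (checked only against the sample data). Like the answer, it assumes the real columns sit at even field numbers and that no cell contains an embedded comma; it also widens the spacer pattern to [ \t]* so quoted tabs are removed too. The function name clean_csv is hypothetical.

```shell
# clean_csv (hypothetical name): portable version of the three steps,
# using sub()/gsub() instead of the gawk-only gensub().
# Assumes: header = rows 1-2, real columns at fields 2, 4, 6, ...
clean_csv() {
  awk -F, '
    NR == 1 {
      gsub(/Citty/, "City")          # edits $0, so fields are re-split
      for (i = 1; i <= NF; i++) top[i] = $i
      next
    }
    NR == 2 {
      line = ""
      for (i = 2; i <= NF; i += 2)   # every second column, from 2
        line = line (i > 2 ? "," : "") top[i] " " $i
      print line                     # no trailing comma to strip
      next
    }
    {
      sub(/^,/, "")                  # drop the empty first column
      gsub(/"[ \t]*",/, "")          # drop quoted space/tab spacers
      print
    }
  '
}
```

Usage: clean_csv < inp > clean.csv. Building the header with a leading separator check avoids having to strip a trailing comma afterwards.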