How to merge the first two lines of a csv column by column?

tak*_*a15 4 text-processing csv

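I have an excel file that I converted to csv. After conversion it looks like the following example (note that the csv has 100+ columns; this is a minimized version):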

,Product,"  ",Citty,"   ",Price
,Name," ",Location,"    ",Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

I need to write a script that will "merge" the first and second lines together, based on comma position:

,Product Name," ""  ",Citty Location,"  ""  ",Price Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

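I've looked at other questions here and on Stack Overflow, but the answers don't seem to cover this odd column-by-column situation with just the first two lines of the file.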


As an additional, unrelated task, I'd also like to get rid of the empty columns in the csv and fix the typos, so that it looks like this:

Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3

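(The csv currently has a tab, enclosed in quotes, between every real data column, except for the first column, which is just empty and followed by a comma.)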

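I will be receiving csv files with these typos many times, so I want to fix the errors programmatically in the script. Also note that the columns may not always be in the order shown above, so the script needs to check each column name for errors dynamically.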

ste*_*eve 5

Try this:

$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

The same code is more readable when split over several lines:

$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

If it's the first line, fix the Citty -> City typo and split the line into array elements within a.

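If it's the second line, then starting at the 2nd column, join the corresponding column from the first line with this column. Repeat for every other column (increments of 2), then strip the trailing comma and print the result.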

After the second line, replace any leading comma and any "<spaces>", sequence with the empty string, then print the result.
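
The same three steps can also be written without gensub, which is a GNU Awk extension. The following is only a minimal sketch, assuming the layout of the sample: data in the even-numbered fields, quoted whitespace-only separator columns in between, and no commas inside field values. merge_header is a hypothetical helper name, and the separator pattern is widened to accept tabs because the question mentions a quoted tab.

```shell
# Sketch: the three steps using only POSIX awk functions (no gensub).
# merge_header is a hypothetical name, not part of the original answer.
merge_header() {
  awk -F, '
    NR == 1 {                      # line 1: fix the typo, then remember the fields
      gsub(/Citty/, "City")
      split($0, a, FS)
      next
    }
    NR == 2 {                      # line 2: join a[b] and $b for every even column
      out = ""
      for (b = 2; b <= NF; b += 2)
        out = out (b > 2 ? "," : "") a[b] " " $b
      print out
      next
    }
    {                              # data lines: drop the leading comma and the
      sub(/^,/, "")                # quoted whitespace-only separator columns
      gsub(/"[ \t]*",/, "")
      print
    }
  ' "$@"
}
```

Calling merge_header inp on the sample file should print the same three lines as the gensub version.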

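Tested OK on GNU Awk 4.0.2. Note that gensub is a GNU Awk extension, so other awk implementations may not run this as written.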
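
The command above only corrects Citty. Since the question expects more typos and a changing column order, one possible extension keeps the corrections in a small lookup table and applies them to every word of every header name, wherever that column sits. This is a sketch, not a tested production script: fix_header_typos is a hypothetical name, the Prise entry is made up for illustration, and the function is meant for the already cleaned-up output (plain comma-separated fields, no quoted separator columns).

```shell
# Sketch: apply a table of header corrections to whichever column needs them.
# fix_header_typos and the second table entry are hypothetical examples.
fix_header_typos() {
  awk -F, '
    BEGIN {
      OFS = ","
      fix["Citty"] = "City"        # typo from the question
      fix["Prise"] = "Price"       # made-up entry, only to show the table growing
    }
    NR == 1 {                      # header line only
      for (i = 1; i <= NF; i++) {
        n = split($i, w, " ")      # look at each word of the column name
        name = ""
        for (j = 1; j <= n; j++)
          name = name (j > 1 ? " " : "") ((w[j] in fix) ? fix[w[j]] : w[j])
        $i = name
      }
    }
    { print }                      # data lines pass through unchanged
  ' "$@"
}
```

With no file argument the function reads standard input, so it can sit at the end of a pipeline after the header merge. Because the lookup is by word rather than by position, a reordered file needs no changes to the script, only new table entries for new typos.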

Try it online!