MCS*_*MCS 12 csv bash awk gawk
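How do you parse a CSV file using gawk? Simply setting FS="," is not enough, because a quoted field that contains a comma is treated as multiple fields.
An example where FS="," does not work:
File contents: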
one,two,"three, four",five
"six, seven",eight,"nine"
gawk script:
BEGIN { FS="," }
{
    for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)
    printf "---------------------------\n"
}
Bad output:
field #1: one
field #2: two
field #3: "three
field #4: four"
field #5: five
---------------------------
field #1: "six
field #2: seven"
field #3: eight
field #4: "nine"
---------------------------
Desired output:
field #1: one
field #2: two
field #3: "three, four"
field #4: five
---------------------------
field #1: "six, seven"
field #2: eight
field #3: "nine"
---------------------------
Jon*_*ler 11
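The short answer is "I wouldn't use gawk to parse CSV if the CSV contains awkward data", where "awkward" means things like commas in the CSV field data.
The next question is "What other processing are you going to be doing?", since that will influence which alternative you use.
I'd probably use Perl and the Text::CSV or Text::CSV_XS modules to read and process the data. Remember, Perl was originally written in part as an awk and sed killer; hence the a2p and s2p programs, still distributed with Perl, which convert awk and sed scripts (respectively) into Perl.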
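If staying in awk is still preferred for simple files, one option is to skip FS altogether and walk each line character by character, splitting only on commas that fall outside double quotes. The following is a minimal sketch along those lines, not part of the answer above: the names `parse_csv` and `csv_split` are hypothetical, quotes are kept in the field text to match the desired output in the question, and fields with embedded newlines are not handled. It should run in any POSIX awk as well as in gawk; gawk 4.0 and later also provide the FPAT variable, which defines fields by their content rather than by a separator.

```sh
#!/bin/sh
# parse_csv (hypothetical helper): reads CSV lines on stdin and prints
# one field per line, followed by a separator line after each record.
parse_csv() {
    awk '
    # csv_split (hypothetical name): split one CSV line into fields[],
    # treating commas inside double quotes as ordinary characters.
    # Returns the number of fields found. Quotes stay in the field text.
    function csv_split(line, fields,    n, i, c, inq, cur) {
        n = 0; inq = 0; cur = ""
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c == "\"") {
                inq = !inq              # toggle the in-quotes state
                cur = cur c
            } else if (c == "," && !inq) {
                fields[++n] = cur       # unquoted comma ends the field
                cur = ""
            } else {
                cur = cur c
            }
        }
        fields[++n] = cur               # last field has no trailing comma
        return n
    }
    {
        n = csv_split($0, f)
        for (i = 1; i <= n; i++) printf "field #%d: %s\n", i, f[i]
        printf "---------------------------\n"
    }'
}

# Usage with the two sample lines from the question:
printf '%s\n' 'one,two,"three, four",five' '"six, seven",eight,"nine"' | parse_csv
```

Because a doubled quote ("") inside a quoted field toggles the state twice, such fields stay intact, although the doubled quotes are not unescaped. For anything beyond this, a real CSV parser such as the Text::CSV module mentioned above remains the safer choice.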