Using sed or awk, is it possible to visually align the columns in a CSV file?
For example, from:
a,b,c,some stuff,"some, other, stuff",d,2023-03-10 18:37:00
y,x,z,t,cool,thing,2022-04-12 21:44:00
to:
a, b, c, some stuff,"some, other, stuff", d, 2023-03-10 18:37:00<EOL>
x, y, z, t, cool, thing, 2022-04-12 21:44:00<EOL>
Some of the fields are double-quoted and contain text with commas.
I tried column from bsdmainutils, but it apparently cannot handle this kind of data.
A CSV file of this type:
a, b, c, some stuff,"some, other, stuff", d, 2023-03-10 18:37:00<EOL>
x, y, z, t, cool, thing, 2022-04-12 21:44:00<EOL>
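Because you are modifying the fields, that is no longer really the same data file. What was originally "t" will now be parsed as "         t" because of the width of "some stuff" above (unless you use a regex to parse the non-standard delimiter ,[variable space]).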
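As a small illustration of that point, here is a sketch using Python's standard csv module (Python is an assumption here; the answer itself uses Ruby). It parses an unpadded and a padded version of the same record and shows that the padding becomes part of the field value. The skipinitialspace option is Python's dialect setting for ignoring whitespace after a delimiter, which plays the role of the "non-standard delimiter" parsing mentioned above.

```python
import csv
import io

# Two versions of the same record: the second pads "t" to the width
# of "some stuff" (10 characters), as a visual alignment would.
original = "some stuff,t\n"
padded = "some stuff,         t\n"


def parse(text, **options):
    """Parse CSV text with the standard csv module and return its rows."""
    return list(csv.reader(io.StringIO(text), **options))


# With default settings the padding is kept as part of the field value.
default_rows = parse(padded)

# skipinitialspace=True drops whitespace that follows a delimiter.
stripped_rows = parse(padded, skipinitialspace=True)

print(parse(original))
print(default_rows)
print(stripped_rows)
```

Whether the padded file still counts as "the same data" therefore depends on the reader being configured to discard that whitespace.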
You can force quoting on all fields to get a CSV file that shows these new, padded fields more clearly. Here is a Ruby one-liner that does that:
ruby -r csv -e '
cols={}
data=CSV.parse($<.read)
# record the width of the longest field in each column
data.transpose.each_with_index{|sa,i|
  cols[i]=sa.max_by{|e| e.length}; cols[i]=cols[i].length
}
# right-justify every field to its column width and quote all fields
puts CSV.generate(force_quotes:true){|csv|
  data.each{|row|
    csv<<row.map.with_index{|e, i| e.rjust(cols[i] ) }
  }
}
' file
Prints:
"a","b","c","some stuff","some, other, stuff","    d","2023-03-10 18:37:00"
"y","x","z","         t","              cool","thing","2022-04-12 21:44:00"
Or, if you really want a mix of quoted and unquoted fields, you can do this:
ruby -r csv -e '
lcl_csv_opt={:row_sep=>nil}
data=CSV.parse($<.read)
cols=data.transpose.map.with_index{|sa,i|
  x=sa.max_by{|e| [e].to_csv(**lcl_csv_opt).length}
  [i,"#{[x].to_csv(**lcl_csv_opt)}"]
}.to_h
puts CSV.generate(){|csv|
  data.each{|row|
    csv<<row.map.with_index{|e, i|
      [e].to_csv(**lcl_csv_opt)==cols[i] ? e : e.rjust(cols[i].length )
    }
  }
}
' file
Prints:
a,b,c,some stuff,"some, other, stuff", d,2023-03-10 18:37:00
y,x,z, t, cool,thing,2022-04-12 21:44:00
It also handles pesky escaped quotes inside fields. Given:
$ cat file
a,b,c,some stuff,"some, other, stuff",d,2023-03-10 18:37:00
y,x,z,t,cool,"""thing"", quoted",2022-04-12 21:44:00
The second version prints:
a,b,c,some stuff,"some, other, stuff", d,2023-03-10 18:37:00
y,x,z, t, cool,"""thing"", quoted",2022-04-12 21:44:00
"There are some double-quoted fields that contain text with commas."
Then forget about simple text parsing. Just get something that can parse complex CSV, and let it do the pretty-printing.
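To make the problem with simple text parsing concrete, here is a minimal sketch in Python (the variable names are illustrative only). It compares a plain split on commas with a real CSV parser on the first sample line from the question.

```python
import csv
import io

line = 'a,b,c,some stuff,"some, other, stuff",d,2023-03-10 18:37:00'

# Naive approach: split on every comma, ignoring the quoting rules.
naive_fields = line.split(",")

# CSV-aware approach: commas inside double quotes stay in the field.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The naive split breaks the quoted field into three pieces, so any column widths computed from it would be wrong.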
Miller is the go-to tool. You can specify "pretty print" as the output format:
mlr --icsv --opprint cat example.csv
You can also just use Python's built-in csv module:
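import csv

rows = []
maxwidths = []
# newline="" lets the csv module handle line endings itself
with open("foo.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=",", quotechar='"')
    for row in reader:
        # track the widest entry seen so far in each column
        for column_idx, entry in enumerate(row):
            if column_idx >= len(maxwidths):
                maxwidths += [len(entry)]
            else:
                maxwidths[column_idx] = max(maxwidths[column_idx], len(entry))
        rows += [row]

# left-justify each entry to its column width; note that the quotes
# around fields containing commas are not written back out
for row in rows:
    print(", ".join([f"{col:<{width}}" for col, width in zip(row, maxwidths)]))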
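If the aligned output should itself remain parseable CSV, a variant along the lines of the second Ruby version is possible in Python as well. The following is only a sketch under some assumptions: the helper names to_csv_field and align_rows are made up for illustration, all rows are assumed to have the same number of fields, and padding is added as leading spaces inside the field value so that csv.writer still decides which fields need quotes.

```python
import csv
import io


def to_csv_field(value):
    """Return one value as csv.writer would emit it inside a longer row."""
    if value == "":
        # A lone empty field is written as "", but inside a row it is empty.
        return ""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow([value])
    return buf.getvalue()


def align_rows(rows):
    """Right-justify fields so the serialized columns line up."""
    # Column width = longest serialized form (quotes and escapes included).
    widths = [max(len(to_csv_field(v)) for v in col) for col in zip(*rows)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        padded = [
            " " * (w - len(to_csv_field(v))) + v for v, w in zip(row, widths)
        ]
        writer.writerow(padded)
    return out.getvalue()


sample = (
    'a,b,c,some stuff,"some, other, stuff",d,2023-03-10 18:37:00\n'
    'y,x,z,t,cool,"""thing"", quoted",2022-04-12 21:44:00\n'
)
rows = list(csv.reader(io.StringIO(sample)))
aligned = align_rows(rows)
print(aligned, end="")
```

As with the Ruby versions, the padding becomes part of the data, so a reader has to strip leading spaces to recover the original values.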