use*_*612 · score 4 · tags: csv, shell, awk, replace
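I received a CSV file whose field separator is the pipe character (`|`). The file has a predefined number of fields (say N). I can find the value of N by reading the CSV file's header, which we can assume is correct.

Some fields erroneously contain newlines, which makes a line appear shorter than required (i.e., it has M fields, with M < N).

I need to write an sh script (not bash) to fix these lines.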
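A quick way to see which lines are affected, and later to verify a repaired file, is to compare each line's field count with N taken from the header. With `-F '|'`, awk's `NF` on a non-empty header line equals the number of separators plus one, so there is no need to count pipes separately. The helper name `check_field_counts` is hypothetical; this is a minimal sketch using only POSIX sh and awk.

```shell
#!/bin/sh
# check_field_counts (hypothetical helper): take N from the header line and
# report every later line whose pipe-separated field count differs from N.
# Exits with status 1 if any such line exists, 0 otherwise.
# Reads the named files, or standard input if none are given.
check_field_counts() {
    awk -F '|' '
    NR == 1 { n = NF; next }          # the header defines N
    NF != n {
        printf "line %d: NF=%d, expected %d\n", NR, NF, n
        bad = 1
    }
    END { exit (bad ? 1 : 0) }        # bad is set when any line mismatched
    ' "$@"
}
```

On the sample input in this question it reports only `line 3: NF=1, expected 3`. Line 2 (`John|Smith|This is a field with a`) is not reported: a break in the last field leaves the first fragment with exactly N fields, and only the trailing fragment is short. Run on a repaired file, the same check also catches a replacement character that introduced an extra field.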
I tried writing the following script to fix the file:
```shell
if [ $# -ne 1 ]
then
    echo "Usage: $0 <filename>"
    exit 1
fi
# get first line
first_line=$(head -n 1 "$1")
# get number of fields
num_separators=$(echo "$first_line" | tr -d -c '|' | awk '{print length}')
cat "$1" | awk -v numFields=$(( num_separators + 1 )) -F '|' '
{
    totRecords = NF/numFields
    # loop over lines
    for (record=0; record < totRecords; record++) {
        output = ""
        # loop over fields
        for (i=0; i<numFields; i++) {
            j = (numFields*record)+i+1
            # replace newline with question mark
            sub("\n", "?", $j)
            output = output (i > 0 ? "|" : "") $j
        }
        print output
    }
}
'
```
However, the newlines are still there. How can I fix this?

Sample input:

```
FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a
newline
Foo|Bar|Baz
```
Expected output:

```
FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a * newline
Foo|Bar|Baz
```

\* I don't care about the replacement: it could be a space, a question mark, or anything else except a newline or a pipe (which would create a new field).
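Why the attempt leaves the newlines in place: awk first splits its input into records on `RS`, which defaults to a newline, and only then splits each record into fields on `FS`. Every physical line is therefore a separate record, so no `$j` ever contains `"\n"` and `sub("\n", "?", $j)` has nothing to replace. For the same reason `totRecords = NF/numFields` is computed per line: it is 1 for a complete line and a fraction below 1 for a fragment, so each fragment is still printed as a separate output line. The hypothetical helper `show_records` below is a minimal sketch that prints what awk actually sees for each record.

```shell
#!/bin/sh
# show_records (hypothetical helper): print what awk sees per record.
# With the default RS (a newline) every physical line is its own record,
# so neither the record nor any of its fields can contain a newline.
# Reads the named files, or standard input if none are given.
show_records() {
    awk -F '|' '{
        nl = index($0, "\n")      # position of a newline in the record, 0 if none
        printf "record %d: NF=%d, newline_at=%d\n", NR, NF, nl
    }' "$@"
}
```

On the sample input, record 3 is just `newline` with `NF=1`, and `newline_at` is 0 for every record: the line break is consumed before any field is examined, so a fix has to join records rather than edit fields.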
Answer:

```
$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==1 { reqdNF = NF; printf "%s", $0; next }
{ printf "%s%s", (NF < reqdNF ? " " : ORS), $0 }
END { print "" }
$ awk -f tst.awk file.csv
FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a newline
Foo|Bar|Baz
```
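Because the question asks for an sh script rather than a separate awk file, the same awk program can be inlined in a POSIX sh function that reuses the question's usage check. This is a minimal sketch: the function name `join_short_lines` is hypothetical, and the awk program is `tst.awk` unchanged apart from added comments.

```shell
#!/bin/sh
# join_short_lines (hypothetical name): the tst.awk program wrapped in a
# POSIX sh function, with the usage check from the question.
join_short_lines() {
    if [ $# -ne 1 ]; then
        echo "Usage: join_short_lines <filename>" >&2
        return 1
    fi
    awk '
    BEGIN { FS = OFS = "|" }
    # Header: remember N and print it without a trailing newline.
    NR == 1 { reqdNF = NF; printf "%s", $0; next }
    # A short line continues the previous record (joined with a space);
    # any other line starts a new record (ORS first).
    { printf "%s%s", (NF < reqdNF ? " " : ORS), $0 }
    # Terminate the last record.
    END { print "" }
    ' "$1"
}
```

Appending a call such as `join_short_lines "$1"` after the definition turns it into the stand-alone sh script the question describes.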
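If that isn't what you want, then edit your question to provide more truly representative sample input and the associated expected output.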
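The rule above attaches every line with fewer than N fields to the line before it, which fits breaks in the last field, as in the sample. If a newline can also fall inside an earlier field, the first fragment is the short one: for `John|Smi` followed by `th|notes` (N = 3), both fragments would be glued onto the preceding line. A different technique, sketched below under the assumptions that the header is intact and N is at least 2, is greedy field counting: keep appending lines to a buffer while the joined text still has at most N fields. Joining merges the buffer's last field with the new line's first field, so the count grows by NF - 1. The name `rejoin_records` is hypothetical, and this is still a heuristic: a short fragment that starts a new record right after a complete one is joined to that complete record, and no field-counting rule can tell those two cases apart.

```shell
#!/bin/sh
# rejoin_records (hypothetical name): greedily rejoin pipe-delimited records
# split by embedded newlines, replacing each embedded newline with a space.
# POSIX sh + POSIX awk; assumes the header is intact and N >= 2.
# Reads the named files, or standard input if none are given.
rejoin_records() {
    awk -F '|' '
    NR == 1 { n = NF; print; next }      # header: N = its field count
    {
        k = (NF > 0 ? NF : 1)            # an empty line is one empty fragment
        if (have && cnt + k - 1 <= n) {
            # Joining merges the last buffered field with the first field
            # of this line, so the field count grows by k - 1.
            buf = buf " " $0             # the embedded newline becomes a space
            cnt += k - 1
        } else {
            if (have) print buf          # buffer cannot absorb this line: emit it
            buf = $0
            cnt = k
            have = 1
        }
    }
    END { if (have) print buf }          # emit the last record
    ' "$@"
}
```

With the sample input this produces the expected output; with `John|Smi` / `th|notes` it produces `John|Smi th|notes`, and a field containing several newlines is joined into one line.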