Sea*_*nge 1 csv r double-quotes read.table
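I have a large CSV dataset that is proving difficult to import into R.
Here is a sample of the dataset that contains all of the relevant problems: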
col 1,col 2,col 3,col 4
txt 1,txt ' 2,"This is a big
field with carriage returns, all enclosed in double
quotes",txt 4
txt1,txt2,txt3,txt4
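As you can see, there are stray single quotes inside fields, double quotes enclosing large blocks of text that contain commas, and newlines inside fields (any such field is enclosed in double quotes). Fields that contain neither a comma nor a newline are not enclosed in double quotes.
I tried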
read.table(file, sep = ",", quote = '"', header = TRUE)
but I get the error
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 31 did not have 95 elements
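I'm not sure exactly what the problem is, but I'm fairly sure it is related to the conditional double-quote text qualifier, the newlines, or both.
Any suggestions for adjusting the code, or for how to troubleshoot? Any help is appreciated!
Use fread from the data.table package; it works fine with the default arguments: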
DF = data.table::fread(data.table = FALSE, "col 1,col 2,col 3,col 4
txt 1,txt ' 2,\"This is a big
field with carriage returns, all enclosed in double
quotes\",txt 4
txt1,txt2,txt3,txt4")
which gives
col 1 col 2 col 3 col 4
1 txt 1 txt ' 2 This is a big\n \n field with carriage returns, all enclosed in double\n \n quotes txt 4
2 txt1 txt2 txt3 txt4
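I suspect this can also be done by passing the right arguments to read.table, but that may not be worth the effort if you can install data.table or some other package that handles this case better.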
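For comparison, here is a minimal base R sketch of the same idea, assuming the sample from the question has been saved to a file. The names `sample_lines` and `csv_path` are illustrative and not from the question, and `check.names = FALSE` is only there to keep the header names with spaces intact. It relies on `read.csv`, whose defaults already include `sep = ","`, `quote = "\""`, and `comment.char = ""`. Whether this also copes with the real 95-column file is untested.

```r
# Recreate the sample from the question in a temporary file (illustrative path).
sample_lines <- c(
  "col 1,col 2,col 3,col 4",
  "txt 1,txt ' 2,\"This is a big",
  "field with carriage returns, all enclosed in double",
  "quotes\",txt 4",
  "txt1,txt2,txt3,txt4"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(sample_lines, csv_path)

# Only the double quote is treated as a quoting character, so the lone
# single quote in "txt ' 2" is read as ordinary text.
df <- read.csv(csv_path, quote = "\"", comment.char = "",
               check.names = FALSE, stringsAsFactors = FALSE)

dim(df)      # rows and columns actually parsed
df[1, 3]     # the multi-line field, with embedded newlines
```

If the real file still fails with settings like these, the cause is probably something the sample does not show, such as an unbalanced double quote somewhere in the data.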
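I can make it work on this toy example, but I'm not at all convinced this is the right approach. My experience with real-world CSV files is that there is often some other glitch that defeats such efforts.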
xs <- scan( what="", sep=",", quote="\"")
# then paste in your text:
1: col 1,col 2,col 3,col 4
5: txt 1,txt ' 2,"This is a big
5:
5: field with carriage returns, all enclosed in double
5:
5: quotes",txt 4
9: txt1,txt2,txt3,txt4
13:
Read 12 items
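The interactive session above can also be reproduced non-interactively by passing the sample through the `text` argument of `scan`. This is only a sketch on the toy data; `csv_text` and `records` are illustrative names, and reshaping into four columns assumes every record really has four fields.

```r
# The sample from the question as a single string (illustrative name).
csv_text <- "col 1,col 2,col 3,col 4
txt 1,txt ' 2,\"This is a big
field with carriage returns, all enclosed in double
quotes\",txt 4
txt1,txt2,txt3,txt4"

# Same settings as the interactive call: character fields, comma separator,
# and only the double quote acting as a quoting character.
xs <- scan(text = csv_text, what = "", sep = ",", quote = "\"", quiet = TRUE)
length(xs)

# The first four items are the header; the rest are reshaped row by row.
records <- matrix(xs[-(1:4)], ncol = 4, byrow = TRUE,
                  dimnames = list(NULL, xs[1:4]))
records
```

Because `scan` returns a flat character vector, a record with a missing or extra field would silently shift everything after it, which is one reason this approach is fragile on real files.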
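(Upvoted Frank's data.table success.)
Since read.table is really a wrapper around the scan function, I tried the same settings with it, and eventually understood that I needed to escape the interior single quote in the second line (the escape is needed here because the text is passed as an R string delimited by single quotes):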
read.table( text='col 1,col 2,col 3,col 4
txt 1,txt \' 2,"This is a big
field with carriage returns, all enclosed in double
quotes",txt 4
txt1,txt2,txt3,txt4
', header=TRUE, sep=",", quote="\"")
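On the troubleshooting side, one option is `count.fields` from base R, which reports how many fields each line has under a given `sep` and `quote` setting. The sketch below runs it on the toy sample; `csv_path`, `n_fields`, `expected`, and `bad_lines` are illustrative names, and for the real file `expected` would presumably be 95, going by the error message.

```r
# Recreate the sample from the question in a temporary file (illustrative path).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "col 1,col 2,col 3,col 4",
  "txt 1,txt ' 2,\"This is a big",
  "field with carriage returns, all enclosed in double",
  "quotes\",txt 4",
  "txt1,txt2,txt3,txt4"
), csv_path)

# Count fields per line with the same separator and quote settings
# that are passed to read.table.
n_fields <- count.fields(csv_path, sep = ",", quote = "\"", comment.char = "")
table(n_fields, useNA = "ifany")

# Lines whose count differs from the expected number of columns are the
# first places to inspect. NA entries mark lines that end inside a quoted
# field, so they are skipped here.
expected <- 4
bad_lines <- which(!is.na(n_fields) & n_fields != expected)
bad_lines
```

Running the same call with a different `quote` value, for example one that also includes the single quote, and comparing the counts can help show which quoting character is breaking the parse.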