Error importing CSV with read.table in R 3.3.2 - possible quoting issue

Sea*_*nge 1 csv r double-quotes read.table

I have a large CSV dataset that is proving difficult to import into R.

Here is a sample of the dataset that contains all of the relevant issues:

col 1,col 2,col 3,col 4
txt 1,txt ' 2,"This is a big

field with carriage returns, all enclosed in double

quotes",txt 4
txt1,txt2,txt3,txt4

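As you can see, there are single quotes inside fields, double quotes enclosing large blocks of text that contain commas, and newlines inside fields (all of which are enclosed in double quotes). However, fields that contain neither commas nor newlines are not double-quoted.

I tried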

read.table(file, sep = ",", quote = '"', header = TRUE)

but I get the error

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  line 31 did not have 95 elements

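I'm not sure exactly what the problem is, but I'm fairly sure it is related to the conditional double-quote text qualifier, the newlines, or both.

Any suggestions for adjusting the code or for troubleshooting? Any help is appreciated!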

Fra*_*ank 7

Use fread from the data.table package, which works fine with its default arguments:

DF = data.table::fread(data.table = FALSE, "col 1,col 2,col 3,col 4
txt 1,txt ' 2,\"This is a big

field with carriage returns, all enclosed in double

quotes\",txt 4
txt1,txt2,txt3,txt4")

  col 1   col 2                                                                                          col 3 col 4
1 txt 1 txt ' 2 This is a big\n    \n    field with carriage returns, all enclosed in double\n    \n    quotes txt 4
2  txt1    txt2                                                                                           txt3  txt4

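I suspect it can be done by passing the appropriate arguments to read.table, but it is probably not worth the effort, assuming you can install data.table or some other package that handles this better.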
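For anyone who cannot install data.table, here is a minimal base-R sketch of the "appropriate arguments" route. It assumes the sample from the question has been saved to a file; the temporary path `csv_path` and the variable names are made up for illustration. It relies on read.csv, which is read.table with sep = ",", quote = "\"", fill = TRUE and comment.char = "" as defaults. It has only been reasoned through against the toy sample, so it may still fail on the real file.

```r
# Recreate the sample from the question in a temporary file (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "col 1,col 2,col 3,col 4",
  "txt 1,txt ' 2,\"This is a big",
  "",
  "field with carriage returns, all enclosed in double",
  "",
  "quotes\",txt 4",
  "txt1,txt2,txt3,txt4"
), csv_path)

# read.csv() only treats the double quote as a quoting character by default,
# so the apostrophe in "txt ' 2" is read as ordinary text.
# check.names = FALSE keeps the header names with their spaces.
df <- read.csv(csv_path, stringsAsFactors = FALSE, check.names = FALSE)

dim(df)    # number of rows and columns actually parsed
names(df)  # column names taken from the header line
```

If the real file still errors out, the likely suspects are things the toy sample does not show, such as unbalanced quotes somewhere further down.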

  • After more than 20 answers across multiple threads, I've learned that fread is great (7 upvotes)

42-*_*42- 5

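I can do it on this toy example, but I'm not at all convinced this is the right approach. My experience with real-world CSV files is that there are often other glitches that defeat such efforts.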

xs <- scan( what="", sep=",", quote="\"")
# then paste in your text:

1: col 1,col 2,col 3,col 4
5: txt 1,txt ' 2,"This is a big
5: 
5: field with carriage returns, all enclosed in double
5: 
5: quotes",txt 4
9: txt1,txt2,txt3,txt4
13: 
Read 12 items

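(Upvoted Frank's data.table success.)

Since read.table is really just a wrapper around the scan function, I tried these settings and eventually understood that I needed to escape the inner single quote on the second line (the escape is needed only because the text= string literal is itself delimited by single quotes; a file read from disk needs no such change):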

read.table( text='col 1,col 2,col 3,col 4
txt 1,txt \' 2,"This is a big

field with carriage returns, all enclosed in double

quotes",txt 4
txt1,txt2,txt3,txt4
', header=TRUE, sep=",", quote="\"")
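On the troubleshooting part of the question, one option is count.fields from the utils package, which reports how many fields each line has under a given sep and quote setting. The sketch below runs it on the toy sample written to a temporary file; `csv_path`, `n_fields`, `expected` and `bad` are illustrative names, and taking the header's field count as the expected one is an assumption.

```r
# Same sample as in the question, written to a temporary file (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "col 1,col 2,col 3,col 4",
  "txt 1,txt ' 2,\"This is a big",
  "",
  "field with carriage returns, all enclosed in double",
  "",
  "quotes\",txt 4",
  "txt1,txt2,txt3,txt4"
), csv_path)

# Count fields line by line, treating only the double quote as a quote character.
# Lines that end inside an open quoted field are reported as NA; the line that
# closes the quote carries the field count for the whole record.
n_fields <- count.fields(csv_path, sep = ",", quote = "\"", comment.char = "")

# Assumption: the header line has the correct number of columns.
expected <- n_fields[1]

# Positions in n_fields where a completed record has an unexpected field count.
# Blank lines outside quotes are skipped by default, so on a real file these
# positions may be offset from the physical line numbers.
bad <- which(!is.na(n_fields) & n_fields != expected)
bad
```

On the real file, any positions reported in `bad` are a reasonable place to start looking for a stray or unbalanced quote.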