How to read \" double-quote escaped values with read.table in R

Ale*_*ker 12 csv r escaping double-quotes

I am unable to read a file in R that contains the line below.

__CODE__

Any ideas? How can I make read.table understand that \" is an escaped quote?

Cheers, Alexander

Tom*_*mmy 6

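As far as I can tell, read.table/read.csv cannot handle backslash-escaped quotes.

...But I think I have an (ugly) workaround, inspired by @nullglob:

  • First, read the file without any quote character. (This will not handle embedded commas, as @Ben Bolker pointed out.)
  • Then go through the string columns and remove the quotes:

The test file looks like this (I added a non-string column for good measure):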

13,"foo","Fab D\"atri","bar"
21,"foo2","Fab D\"atri2","bar2"

Here is the code:

# Generate test file
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
             "21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\"" ), "foo.txt")

# Read ignoring quotes
tbl <- read.table("foo.txt", as.is=TRUE, quote='', sep=',', header=FALSE, row.names=NULL)

# Go through the character columns and clean up
for (i in seq_len(NCOL(tbl))) {
    if (is.character(tbl[[i]])) {
        x <- tbl[[i]]
        x <- substr(x, 2, nchar(x)-1) # Remove surrounding quotes
        tbl[[i]] <- gsub('\\\\"', '"', x) # Unescape quotes
    }
}

The output is correct:

> tbl
  V1   V2          V3   V4
1 13  foo  Fab D"atri  bar
2 21 foo2 Fab D"atri2 bar2
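
A possible alternative that sidesteps the embedded-comma limitation, offered as a sketch rather than a general solution: read.csv treats a doubled quote ("") inside a quoted field as a literal quote, so the backslash-escaped quotes can be rewritten to doubled quotes before parsing. The temporary file, the extra comma in the second row, and the names `path`, `raw`, `fixed` and `tbl2` are made up for illustration. The sketch also assumes the file never contains an escaped backslash directly before a quote (`\\"`), which this simple substitution would get wrong.

```r
# Hypothetical test file; the second row has an embedded comma in column 3
path <- tempfile(fileext = ".txt")
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
             "21,\"foo2\",\"Fab, D\\\"atri2\",\"bar2\""), path)

# Read the raw lines and turn every \" into "" (the CSV-style escape).
# fixed = TRUE makes gsub treat the pattern as a literal string, not a regex.
raw   <- readLines(path)
fixed <- gsub('\\"', '""', raw, fixed = TRUE)

# Parse the rewritten text with normal quoting enabled;
# read.csv collapses each "" inside a quoted field back into a single quote
tbl2 <- read.csv(text = fixed, header = FALSE, stringsAsFactors = FALSE)
print(tbl2)
```

Because quoting stays enabled during parsing, the surrounding quotes are stripped by read.csv itself and no per-column cleanup loop is needed.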