Ric*_*rta 18 crash r write.table
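The following consistently crashes my R session.
I tested on two machines, Ubuntu and Mac OS X, with similar results on both.
Brief description: calling write.table on a data.frame with a factor column of all NAs.
The original data set is rather large. I managed to isolate the offending column and then created a similar vector, named PROBLEM_DATA below, which causes the same crash.
Interestingly, sometimes R crashes outright, and other times it only throws the following error: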
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote), :
'getCharCE' must be called on a CHARSXP
Offending data and call:
PROBLEM_DATA <- structure(114:116, .Label = c("String1", "String2", "String3", "String4", "String5", "String6",
"String7", "String8", "String9", "String10", "String11", "String12", "String13", "String14", "String15"), class = "factor")
# This will cause a crash
write.table(PROBLEM_DATA, file=path.expand("~/test.csv"))
# This will also crash
write.table(PROBLEM_DATA, file=path.expand("~/test.csv"), fileEncoding="UTF-8")
R version 2.15.3 (2013-03-01)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gdata_2.12.0 ggplot2_0.9.3 stringr_0.6.1 RMySQL_0.9-3 DBI_0.2-5
[6] data.table_1.8.8
loaded via a namespace (and not attached):
[1] MASS_7.3-23 RColorBrewer_1.0-5 colorspace_1.2-0 dichromat_1.2-4
[5] digest_0.5.2 grid_2.15.3 gtable_0.1.1 gtools_2.7.0
[9] labeling_0.1 munsell_0.4 plyr_1.7.1 proto_0.3-9.2
[13] reshape2_1.2.1 scales_0.2.3 tools_2.15.3
R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
This is a nicely reproducible bug that should be reported to R-devel or via bug.report(). FWIW, on
> sessionInfo()
R version 3.0.0 Patched (2013-04-03 r62485)
Platform: x86_64-unknown-linux-gnu (64-bit)
if on Linux I configure R with CFLAGS="-g -O0", I can
R -d gdb
(gdb) break Rf_error
(gdb) run
then paste your lines, eventually arriving at
> write.table(PROBLEM_DATA, file=path.expand("~/test.csv"))
Breakpoint 1, Rf_error (format=0x7ffff7a8f0f0 "'%s' must be called on a CHARSXP") at /home/mtmorgan/src/R-3-0-branch/src/main/errors.c:753
753 RCNTXT *c = R_GlobalContext;
(gdb) up 3
#3 0x00007ffff1b9bfb3 in EncodeElement2 (x=0x31ccf50, indx=113, quote=TRUE, qmethod=TRUE, buff=0x7fffffffbdc0, cdec=46 '.')
at /home/mtmorgan/src/R-3-0-branch/src/library/utils/src/io.c:938
938 p0 = translateChar(STRING_ELT(x, indx));
(gdb) call Rf_PrintValue(x)
[1] "String1" "String2" "String3" "String4" "String5" "String6"
[7] "String7" "String8" "String9" "String10" "String11" "String12"
[13] "String13" "String14" "String15"
(gdb) p indx
$1 = 113
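which shows that R is trying to print the 114th element of the factor's levels (indx is zero-based, so 113 is the 114th entry). Clearly things have gone wrong, because the factor has integer values beyond the length of its levels.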
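The condition identified above can be checked from R before calling write.table. The following is a minimal sketch, not a fix in R itself: it rebuilds a factor like the one in the question (using `levels =` directly rather than `.Label`) and flags any integer code that does not index into the levels. The variable names `bad`, `codes`, and `out_of_range` are illustrative only.

```r
# Rebuild a malformed factor like the one in the question:
# integer codes 114:116, but only 15 levels.
bad <- structure(114:116,
                 levels = paste0("String", 1:15),
                 class = "factor")

codes <- as.integer(bad)   # underlying integer codes, attributes dropped
n_levels <- nlevels(bad)   # number of levels (15 here)

# A well-formed code is either NA or lies in 1..n_levels.
out_of_range <- !is.na(codes) & (codes < 1L | codes > n_levels)

if (any(out_of_range)) {
  message("factor has ", sum(out_of_range),
          " code(s) outside 1..", n_levels)
}
```

Note that the sketch only touches the integer codes and the levels attribute; it avoids printing or converting the malformed factor, since those operations may themselves fail on such an object.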
Not an answer, but a long comment:
PROBLEM_DATA <- structure(c(1:5,114:116), .Label = c("String1", "String2", "String3",'string4','str5','str6','str7'),class='factor')
Rgames> as.numeric(PROBLEM_DATA)
[1] 1 2 3 4 5 114 115 116
Rgames> as.numeric(as.character(PROBLEM_DATA))
[1] NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion
Rgames> levels(PROBLEM_DATA)
[1] "String1" "String2" "String3" "string4" "str5" "str6" "str7"
Rgames> write.table(PROBLEM_DATA, file=path.expand("~/ctest.csv"))
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote), :
'getCharCE' must be called on a CHARSXP
ctest.csv contains the following (each row is a single cell, as far as Excel is concerned):
x
1 "String1"
2 "String2"
3 "String3"
4 "string4"
5 "str5"
6
So you can see that things go south when there is a "gap" in the underlying numbering of the levels. Hopefully this provides a clue for someone who knows more about factors than I do.
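As a possible workaround until the underlying data is corrected, the codes that match no level can be turned into NA before writing. This is only a sketch under assumptions: `sanitize_factor` is a hypothetical helper, not a base R function, and treating unmatched codes as missing is a choice that may not suit every data set. The example factor mirrors the one in the comment above and writes to a temporary file rather than `~/ctest.csv`.

```r
# Hypothetical helper (not part of base R): replace codes that do not
# correspond to any level with NA and return a well-formed factor.
sanitize_factor <- function(f) {
  codes <- as.integer(f)
  lev <- levels(f)
  invalid <- !is.na(codes) & (codes < 1L | codes > length(lev))
  codes[invalid] <- NA_integer_
  factor(lev[codes], levels = lev)
}

# Same shape as the factor in the comment: codes 1:5 are valid,
# 114:116 point past the 7 available levels.
mixed <- structure(c(1:5, 114:116),
                   levels = c("String1", "String2", "String3",
                              "string4", "str5", "str6", "str7"),
                   class = "factor")

clean <- sanitize_factor(mixed)

# Writing the repaired factor; a temporary file is used here.
out <- tempfile(fileext = ".csv")
write.table(clean, file = out)
```

The helper keeps the original levels unchanged, so downstream code that relies on the level set still sees the same seven levels; only the unmatched entries become NA.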