Making non-ASCII data suitable for CRAN

Ric*_*ton 19 portability r cran

I have some data containing non-ASCII characters that I want to include as an rda file in an R package. When I run R CMD check on the package, I get a warning:

Warning: found non-ASCII strings

This prevents the package from being accepted on CRAN.

There is a similar question about removing non-ASCII characters from data files, but I want to keep the non-ASCII characters.

You can get the CSV data here. I am reading it into R and re-saving it as an rda file with this code:

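english_monarchs <- read.csv(
  wherever_you_downloaded_the_file_to,
  fileEncoding     = "UTF-8",
  na.strings       = "",
  stringsAsFactors = TRUE  # needed since R 4.0.0 so that name is a factor
)
# save() needs file = ...; a bare string is treated as an object name.
save(english_monarchs, file = "english_monarchs.rda")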

name is the column of the dataset that contains non-ASCII values.

head(levels(english_monarchs$name))
## [1] "Adda"                                "Æðelbehrt"                          
## [3] "Æðelberht I"                         "Æðelberht II and Eardwulf"          
## [5] "Æðelberht II, Ælfric and Eadberht I" "Æðelberht III"

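Based on the (not very clear) guidance in the "Encoding issues" section of Writing R Extensions, I think I should encode the factor levels as UTF-8, but the obvious approach doesn't work: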

Encoding(levels(english_monarchs$name)) <- "utf8"  #each encoding still "unknown"
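One likely reason the assignment above has no effect: `Encoding<-` only recognises the values "latin1", "UTF-8", "bytes" and "unknown", and silently treats anything else, including "utf8", as "unknown". The sketch below illustrates this on a small string built from raw bytes (the variable names are made up for the example and are not part of the dataset):

```r
# Two non-ASCII characters, built from raw bytes so the example does not
# depend on how this file itself is encoded: 0xC3 0x86 0xC3 0xB0 is "Æð" in UTF-8.
x <- rawToChar(as.raw(c(0xc3, 0x86, 0xc3, 0xb0)))

# "utf8" is not one of the recognised values, so the mark stays "unknown".
Encoding(x) <- "utf8"
enc_after_bad <- Encoding(x)     # "unknown"

# The recognised spelling is "UTF-8" (upper case, with a hyphen).
Encoding(x) <- "UTF-8"
enc_after_good <- Encoding(x)    # "UTF-8"

print(c(enc_after_bad, enc_after_good))
```

Note that `Encoding<-` only relabels the string and never changes its bytes, so marking as "UTF-8" is only appropriate when the bytes really are UTF-8. If they are latin1 bytes, they need to be declared as such and converted.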

How do I make the data portable enough to be accepted on CRAN?

Ric*_*ton 13

What worked for me was declaring the encoding as "latin1", then using iconv to convert to UTF-8.

Encoding(levels(english_monarchs$name)) <- "latin1"
levels(english_monarchs$name) <- iconv(
  levels(english_monarchs$name), 
  "latin1", 
  "UTF-8"
)
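To check that a conversion like this did what was intended, one option is to inspect the encoding marks and validate the result with `validUTF8()`. The sketch below uses a small hypothetical stand-in vector (`old_levels`) rather than the real factor levels, and assumes the original bytes are latin1, as in the answer above:

```r
# Hypothetical stand-in for levels(english_monarchs$name).
# 0xC6 0xF0 are the latin1 bytes for "Æð"; the rest spells "elberht I".
latin1_name <- rawToChar(c(as.raw(c(0xc6, 0xf0)), charToRaw("elberht I")))
old_levels <- c("Adda", latin1_name)

# Same two steps as above: declare the bytes as latin1, then convert.
Encoding(old_levels) <- "latin1"
new_levels <- iconv(old_levels, "latin1", "UTF-8")

# Pure-ASCII strings are never marked, so "Adda" stays "unknown".
enc <- Encoding(new_levels)   # "unknown" "UTF-8"
ok  <- validUTF8(new_levels)  # TRUE TRUE

print(enc)
print(ok)
```

If `validUTF8()` returns FALSE for any element, or a non-ASCII element is still marked "unknown", the assumed source encoding is probably wrong for that element.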