Write UTF-8 files from R

Sve*_*rre 19 windows unicode r utf-8

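R seems to handle Unicode characters well internally, but I am not able to write out a data frame that contains such UTF-8 Unicode characters. Is there any way to force this?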

data.frame(c("hīersumian","ǣmettigan"))->test
write.table(test,"test.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")

The output text file reads:

hiersumian <U+01E3>mettigan

I am using R version 3.0.2 in a Windows environment (Windows 7).

EDIT


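It has been suggested in the answers that R is writing the file correctly in UTF-8 and that the problem lies with the software I use to view the file. Here is some code in which I do everything in R. I read in a text file encoded in UTF-8, and R reads it correctly. R then writes the file out in UTF-8 and reads it back in, and now the correct Unicode characters are gone.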

read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
myinputfile[1,1]
write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
myoutputfile[1,1]

Console output:

> read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
> myinputfile[1,1]
[1] hīersumian
Levels: hīersumian ǣmettigan
> write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
> read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
> myoutputfile[1,1]
[1] <U+FEFF>hiersumian
Levels: <U+01E3>mettigan <U+FEFF>hiersumian
> 
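`U+FEFF` is the byte order mark (BOM) code point, so one plausible reading of the output above is that a BOM at the start of `myinputfile.txt` was read in as part of the first field. A minimal sketch for checking a file at the byte level; `has_utf8_bom` is a hypothetical helper (not part of base R) and the temporary file only stands in for the real input:

```r
# Hypothetical helper (not part of base R): TRUE if the file starts with
# the three bytes EF BB BF, i.e. the UTF-8 encoding of U+FEFF.
has_utf8_bom <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 3L)
  identical(head_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Build a small stand-in file that starts with a BOM (ASCII content only,
# so the demo does not depend on the locale).
bom_file <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("abc\ndef\n")), bom_file)

has_bom <- has_utf8_bom(bom_file)

# "UTF-8-BOM" asks the connection to drop a leading BOM while reading.
clean <- read.table(bom_file, fileEncoding = "UTF-8-BOM",
                    stringsAsFactors = FALSE)
first_value <- clean[1, 1]
```

If the real input file does start with a BOM, reading it with `fileEncoding = "UTF-8-BOM"` instead of `encoding = "UTF-8"` might keep `U+FEFF` out of the data in the first place.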

Raf*_*ael 10

The purpose of this "answer" is rather to clarify that something odd is going on behind the scenes:

"hīersumian" does not even seem to make it into the data frame. In every case the "ī" symbol is converted to "i".

options("encoding" = "native.enc")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
#             a
# 1 hiersumian 

options("encoding" = "UTF-8")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
#             a
# 1 hiersumian 

options("encoding" = "UTF-16")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
#             a
# 1 hiersumian 
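The printed data frame is not conclusive on its own, since a Windows console may transliterate characters it cannot display. A sketch for inspecting what is actually stored, assuming the literal is entered with a `\u` escape so that it does not depend on how the script or console input is decoded; the variable names are illustrative:

```r
# "\u012b" is U+012B (i with macron); the escape avoids source-encoding issues.
t1 <- data.frame(a = "h\u012bersumian", stringsAsFactors = FALSE)

stored <- t1[1, "a"]

# Declared encoding mark of the stored string.
enc <- Encoding(stored)

# Unicode code points of each character: 299 (0x12B) in second position means
# the macron i is intact, 105 (0x69) would mean it was replaced by a plain "i".
code_points <- utf8ToInt(enc2utf8(stored))
second_char <- code_points[2]
```

Comparing `second_char` against 299 and 105 separates a display problem from a genuine loss of the character.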

The following sequence successfully writes "ǣmettigan" to the text file:

t2 <- data.frame(a = c("ǣmettigan"), stringsAsFactors=F)

getOption("encoding")
# [1] "native.enc"

Encoding(t2[,"a"]) <- "UTF-16"

write.table(t2,"test.txt",row.names=F,col.names=F,quote=F)


It does not work with "encoding" set to "UTF-8" or "UTF-16", and specifying "fileEncoding" leads either to a defect or to no output.
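One possible reading of why the `Encoding(...) <- "UTF-16"` step helps: `Encoding<-` only recognises the marks "latin1", "UTF-8" and "bytes" and treats any other value as "unknown", so assigning "UTF-16" converts nothing and merely drops the UTF-8 mark. That may be why the bytes then reach the file unchanged. A small sketch to check the first part of this, again using a `\u` escape for the literal:

```r
x <- "\u01e3mettigan"          # U+01E3 is the ash (ae) with macron
bytes_before <- charToRaw(x)
mark_before  <- Encoding(x)

Encoding(x) <- "UTF-16"        # not a mark that Encoding<- recognises
mark_after  <- Encoding(x)
bytes_after <- charToRaw(x)

# The bytes are untouched; only the declared encoding mark changed.
same_bytes <- identical(bytes_before, bytes_after)
```

Whether `write.table` then passes those bytes through untouched on a given Windows setup is an assumption here, not something this sketch verifies.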

Somewhat disappointing, as until now I had managed to fix every Unicode issue somehow.
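A workaround that is often suggested for this kind of problem is to bypass re-encoding on output: convert the strings to UTF-8 explicitly and write them with `useBytes = TRUE`. A minimal sketch, assuming a single character column and a temporary output path (both illustrative); it has not been tested on the R 3.0.2 / Windows 7 setup from the question:

```r
words <- c("h\u012bersumian", "\u01e3mettigan")
test <- data.frame(a = words, stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".txt")

# Open in binary mode so bytes and line endings are written verbatim, and
# use useBytes = TRUE so writeLines does not translate to the native encoding.
con <- file(out_file, open = "wb")
writeLines(enc2utf8(test$a), con, useBytes = TRUE)
close(con)

# Check the result at the byte level rather than by printing it.
written  <- readBin(out_file, what = "raw", n = file.info(out_file)$size)
expected <- unlist(lapply(enc2utf8(words),
                          function(w) c(charToRaw(w), charToRaw("\n"))))
round_trip_ok <- identical(written, expected)
```

This writes one value per line, so a data frame with several columns would first need its rows pasted together with the desired separator.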