I tried the following code:
j <- "*Politics:* Disgraced peer Jeffrey Archer is set to make \xa31m from his Belmarsh "
nchar(j)
# Error in nchar(j) : invalid multibyte string 1
As you can see, nchar() fails on this string. How can I fix this?

If you know the encoding the string is actually in, you can use iconv() to convert it to UTF-8, which nchar() handles correctly:
j <- "*Politics:* Disgraced peer Jeffrey Archer is set to make \xa31m from his Belmarsh "
iconv(j, "ISO-8859-1", "UTF-8")
#[1] "*Politics:* Disgraced peer Jeffrey Archer is set to make £1m from his Belmarsh "
nchar(iconv(j, "ISO-8859-1", "UTF-8"))
#[1] 79
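I wrote your text to a file and checked its encoding with geany, which is how I arrived at ISO-8859-1.

An alternative that does not require you to work out the encoding is to use type = "bytes" instead of manually converting to UTF-8: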
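If you would rather stay in R, a rough sketch along these lines can at least confirm that the string is not valid UTF-8 and show the offending byte. It does not identify the encoding by itself; the ISO-8859-1 guess rests on knowing that byte 0xA3 is the pound sign in that encoding. The names `bytes` and `bad` are only for illustration, and validUTF8() needs R 3.3.0 or later.

```r
j <- "*Politics:* Disgraced peer Jeffrey Archer is set to make \xa31m from his Belmarsh "

# FALSE: the string contains a byte sequence that is not valid UTF-8
validUTF8(j)

# Look at the raw bytes and locate anything outside the ASCII range
bytes <- charToRaw(j)
bad <- which(as.integer(bytes) > 127L)
bad         # position(s) of the non-ASCII byte(s)
bytes[bad]  # a3, which is the pound sign in ISO-8859-1
```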
nchar(j, type = "bytes")
#[1] 79
I recommend reading the help file for nchar (?nchar), because there are subtle differences between the default type ("chars") and type = "bytes".
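To make that difference concrete, here is a small sketch using the same string. In ISO-8859-1 every character is one byte, so the two counts happen to agree, but once the string is converted to UTF-8 the pound sign takes two bytes and the counts diverge. `u` is just an illustrative name.

```r
j <- "*Politics:* Disgraced peer Jeffrey Archer is set to make \xa31m from his Belmarsh "
u <- iconv(j, "ISO-8859-1", "UTF-8")

nchar(j, type = "bytes")  # 79: one byte per character in ISO-8859-1
nchar(u, type = "chars")  # 79: number of characters is unchanged
nchar(u, type = "bytes")  # 80: the pound sign is two bytes in UTF-8
```

So type = "bytes" is fine as a quick workaround, but it only matches the character count when every character in the string is a single byte.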