Fio*_*ang 1 r character-encoding non-english
I have a text file that contains several languages. How can I read it in R with the read.delim function?
Encoding("file.tsv")
#[1] "unknown"

source_data = read.delim(file, header= F, fileEncoding= "windows-1252",
                         sep = "\t", quote = "")
source_D[360]
#[1] "\xc3\xb0\xc2\xbf\xc3\xb0\xc2\xbe\xc3\xb0\xc2\xb8\xc3\xb1\xc3\xb0\xc2\xba \xc3\xb0\xc2\xbd\xc3\xb0\xc2\xb0 \xc3\xb1\xc3\xb1\xe2\x80\x9a\xc3\xb0\xc2\xbe\xc3\xb0\xc2\xbc \xc3\xb1\xc3\xb0\xc2\xb0\xc3\xb0\xc2\xb9\xc3\xb1\xe2\x80\x9a\xc3\xb0\xc2\xb5"

But in Notepad, source_D[360] is displayed as 'поиск на этом сайте'.
小智 5
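Tidyverse approach:
Use the locale option of read_delim. (readr functions use _ instead of . in their names and are usually faster and smarter about reading data.) More details: https://r4ds.had.co.nz/data-import.html#parsing-a-vector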
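library(readr)
# read_delim() takes col_names and delim; it has no header or sep arguments.
# Set encoding to whatever the file is actually saved in.
source_data = read_delim(file, col_names = FALSE,
                         locale = locale(encoding = "windows-1252"),
                         delim = "\t", quote = "")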
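The garbled output in the question looks like UTF-8 bytes decoded as windows-1252, so the file may in fact be UTF-8. The sketch below is only an illustration under that assumption: it writes a small hypothetical sample file (`tmp`, not the asker's file) as UTF-8 and reads it back with a matching encoding, once with readr and once with base R.

```r
library(readr)

# Hypothetical sample data: the Cyrillic phrase from the question,
# written with \u escapes so this script itself stays ASCII.
expected <- "\u043f\u043e\u0438\u0441\u043a \u043d\u0430 \u044d\u0442\u043e\u043c \u0441\u0430\u0439\u0442\u0435"

# Write the phrase to a temporary file as raw UTF-8 bytes.
tmp <- tempfile(fileext = ".tsv")
writeLines(enc2utf8(expected), tmp, useBytes = TRUE)

# readr: declare the file's encoding through locale().
good <- read_delim(tmp, delim = "\t", col_names = FALSE, quote = "",
                   locale = locale(encoding = "UTF-8"))

# Base R: the same idea with fileEncoding.
base <- read.delim(tmp, header = FALSE, sep = "\t", quote = "",
                   fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Both should give back the original phrase.
identical(good$X1[1], expected)
base$V1[1] == expected
```

If the real file turns out to use another encoding, only the `encoding` / `fileEncoding` value needs to change.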