The original title, "How to replace strings", was changed to "Fixing encoding", because that is the question actually answered here.
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] countrycode_0.17 dplyr_0.2
loaded via a namespace (and not attached):
[1] assertthat_0.1 magrittr_1.0.1 parallel_3.1.1 Rcpp_0.11.3 tools_3.1.1
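I have a data frame that picked up some errors while being imported from a web source. I want to replace the affected strings with the ones I know to be correct. I am learning R and dplyr, so knowing how to do this would probably help me with larger data-cleaning problems.
The errors are in rows 20 and 31 of the data: we see "Universität" instead of "Universität" and "Linköping" instead of "Linköping".
I know I could look up the row and column and replace the values one by one, but with a larger data frame or dataset I would not be able to find every instance.
Since I know what the difference is, I would like to search for the word itself and replace it. Just that word. I know it is part of a longer string, but I still want to change only that part of the string. Can I do that?
Any help is appreciated. I would also be grateful if you could show the possible approaches and solutions to this problem, both simple and complex, as that will help me learn faster. Thanks.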
name country
1 TU Dortmund Germany
2 TU Dortmund Germany
3 Maastricht University Netherlands
4 University of the Fraser Valley Canada
5 Queen's University Canada
6 Aarhus University Denmark
7 University Of Alberta Canada
8 Deakin University Australia
9 Macquarie University Australia
10 National University Of Ireland, Galway Ireland
11 Vienna University Austria
12 National University of Singapore Singapore
13 Erasmus University Netherlands
14 Radboud Universiteit Nijmegen Netherlands
15 Vrije Universiteit Amsterdam Netherlands
16 University of Otago New Zealand
17 National College Of Ireland Ireland
18 University College Cork Ireland
19 Irish Management Institute Ireland
20 Universität Konstanz Germany
21 Otto Von Guericke University Magdeburg Germany
22 University of Technology Sydney Australia
23 Dublin City University Ireland
24 Institute Of Technology Blanchardstown Ireland
25 Kth Royal Institute Of Technology Sweden
26 Aalto University Finland
27 Dalarna University Sweden
28 University Of Helsinki Finland
29 Aarhus University Denmark
30 University College Dublin Ireland
31 Linköping University Sweden
32 Aalborg University Denmark
33 Dublin Institute Of Technology Ireland
34 York University Canada
35 Maastricht University Netherlands
36 Utrecht University Netherlands
You can correct this in a couple of ways.
Read the file using the correct encoding (UTF-8):
read.csv2(file("filename.csv", encoding="UTF-8"))
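To try the read step end to end, here is a minimal sketch. It is not the exact call above: it assumes a hypothetical sample file written to a temporary path, and it uses the `encoding` argument of `read.csv2`, which only marks the strings it reads as UTF-8 rather than re-encoding the connection. `stringsAsFactors = FALSE` is set explicitly because older R versions, such as the 3.1.1 shown in the question, convert strings to factors by default.

```r
# Hypothetical sample data, written to a temporary file as raw UTF-8 bytes.
# \u escapes keep this script independent of its own source encoding.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeLines(
  enc2utf8(c("name;country",
             "Universit\u00e4t Konstanz;Germany",
             "Link\u00f6ping University;Sweden")),
  con, useBytes = TRUE
)
close(con)

# encoding = "UTF-8" marks the strings that are read as UTF-8;
# read.csv2 expects ';' as the field separator.
df <- read.csv2(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)
print(df)

unlink(tmp)  # remove the temporary file
```

If the file is really in another encoding (for example Latin-1), declaring it as UTF-8 will not help; the declared encoding has to match the bytes in the file.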
After reading the file, apply a function that converts the columns to UTF-8 encoding:
library(stringi)
# from = "" means the default encoding; df[] keeps the data frame structure
df[] <- lapply(df, function(x) stri_encode(x, "", "UTF-8"))
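If the data frame is already loaded and re-reading the file is not convenient, the damage can often be reversed in base R. The sketch below assumes the garbling came from UTF-8 bytes being decoded as Windows-1252 (consistent with the 1252 locale in the question, but still an assumption). `repair_mojibake` is a hypothetical helper name, not a function from any package.

```r
# Garbled strings as they appear in the question, written with \u escapes
# so the script does not depend on its own source encoding.
broken <- c("Universit\u00c3\u00a4t Konstanz",
            "Link\u00c3\u00b6ping University")

repair_mojibake <- function(x) {
  # Step 1: map the wrongly decoded characters back to the original bytes.
  bytes <- iconv(x, from = "UTF-8", to = "CP1252")
  # Step 2: declare that those bytes are UTF-8, which is what they were.
  Encoding(bytes) <- "UTF-8"
  bytes
}

fixed <- repair_mojibake(broken)
print(fixed)
```

`iconv` returns `NA` for any string containing a character that cannot be represented in CP1252, so check for new `NA` values before overwriting a column with the result.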
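The question also asks whether only the affected part of a string can be replaced. Here is a minimal sketch with base R's `gsub`, assuming you already know each wrong/right pair. The sample data frame and the `wrong` and `right` vectors are illustrative, not taken from the original data.

```r
# Illustrative data frame containing two garbled names and one clean name.
df <- data.frame(
  name = c("Universit\u00c3\u00a4t Konstanz",
           "Link\u00c3\u00b6ping University",
           "Aalto University"),
  country = c("Germany", "Sweden", "Finland"),
  stringsAsFactors = FALSE
)

# Known substitutions: wrong[i] is replaced by right[i].
wrong <- c("Universit\u00c3\u00a4t", "Link\u00c3\u00b6ping")
right <- c("Universit\u00e4t", "Link\u00f6ping")

# Replace each known substring wherever it occurs in the column;
# the rest of each string is left untouched.
for (i in seq_along(wrong)) {
  df$name <- gsub(wrong[i], right[i], df$name, fixed = TRUE)
}
print(df)
```

`fixed = TRUE` treats each pattern as a literal string rather than a regular expression. The same `gsub` call can be used inside `dplyr::mutate()`. This approach only fixes the pairs you list, so correcting the encoding itself is usually the more complete fix for a large dataset.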