Transliterating German words with the iconv function in R

man*_*nro 3 locale r diacritics transliteration iconv

I am trying to use the iconv function in R to transliterate German words correctly (for example, Möbel → Moebel).

I wrote the following code (trying both English and German locales):

    iconv("Möbel", "latin1", "ASCII//TRANSLIT")
    [1] "Mobel"

    iconv("Möbel", "UTF-8", "ASCII//TRANSLIT")
    [1] NA

    iconv("Möbel", "UTF-8", "ASCII//TRANSLIT", sub = "")
    [1] "Mbel"

    iconv("Möbel", "Windows-1252", "ASCII//TRANSLIT")
    [1] "Mobel"

However, none of these produce the expected result. Here is the output of some further tests:

    # cat + library(ds4psy)
    iconv(cat("M", Umlaut["o"], "bel", sep = ""), "latin1", "ASCII//TRANSLIT")
    Möbelcharacter(0)

    # paste/paste0 + library(ds4psy)
    iconv(paste("M", Umlaut["o"], "bel", sep = ""), "latin1", "ASCII//TRANSLIT")
    [1] "MA?bel"
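A side note on the cat() test: the character(0) in its output is consistent with cat() printing its arguments and returning NULL, so iconv() receives nothing to convert. The following is a minimal sketch of that difference between cat() and paste0(). It uses plain ASCII letters so that it does not depend on the session encoding, and the variable names are only illustrative.

```r
# cat() is called for its side effect (printing); its return value is NULL.
printed <- cat("M", "o", "bel", "\n", sep = "")
is.null(printed)

# paste0() returns the assembled string, which can be passed on to iconv().
word <- paste0("M", "o", "bel")
is.character(word)
length(word)
```

Because of this, only the paste()/paste0() variant actually hands a string to iconv().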

For completeness, I also tried the stri_trans_general function from stringi:

    stri_trans_general("Möbel", "latin-ascii")
    [1] "Mobel"

But, as you can see, this does not work either: it yields Mobel rather than the expected Moebel.

What I do not understand is why the iconv function apparently works correctly in PHP but not in R:

    <?php
        //some German
        $utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';
        setlocale(LC_ALL, 'de_DE');

        $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

        //gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
        echo $trans_sentence . PHP_EOL;
    ?>

Why do I see a difference in behaviour between the R version of iconv and the PHP version? What am I doing wrong in my R code?

Chr*_*ann 5

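If you do not have to use iconv, there is another way to achieve your goal.

You can define a set of German characters to be transliterated together with a set of replacements, and pass these pairs as input to str_replace_all.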

Data:

    gg <- c("Göthe", "gerädert", "Hürde", "weiß")

First, define your set:

    set <- setNames(c("oe", "ae", "ue", "ss"),
                    c("ö", "ä", "ü", "ß"))

Then replace:

    library(stringr)
    str_replace_all(gg, set)
    [1] "Goethe"    "geraedert" "Huerde"    "weiss"
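If you would rather avoid the stringr dependency, the same lookup-table idea can be sketched in base R. This is only a minimal sketch and not part of the approach above as written: the helper name transliterate_de is hypothetical, the uppercase umlauts are an added assumption, and the characters are written as \u escapes so that the script does not depend on the encoding of the source file.

```r
# Hypothetical helper (not from the answer above): base-R variant of the
# lookup-table approach, using gsub() instead of stringr.
transliterate_de <- function(x) {
  # German characters as \u escapes: a, o, u umlauts (lower and upper case)
  # followed by the sharp s.
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to   <- c("ae", "oe", "ue", "Ae", "Oe", "Ue", "ss")
  for (i in seq_along(from)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex.
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

# Same sample data as above, written with escapes.
gg <- c("G\u00f6the", "ger\u00e4dert", "H\u00fcrde", "wei\u00df")
transliterate_de(gg)
```

On the sample data this should give the same four words as the str_replace_all call. Characters outside the table are left untouched, so other diacritics would still need separate handling.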