When sorting alphabetically, the letter "y" comes after "i"

zez*_*ere 45 locale r alphabetical-sort

When I call sort(x), where x is a character vector, the letter "y" jumps to the middle, right after the letter "i":

> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z"

> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[21] "t" "u" "v" "w" "x" "z"

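The reason may be that I am located in Lithuania and this is the "Lithuanian" ordering of letters, but I need the normal ordering. How do I change the sorting method back to normal from within R code?

I am using R 2.15.2 on Win7.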

Rei*_*son 39

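You need to change the locale that R is running in. Either do that for your entire Windows installation (which seems suboptimal) or within the R session via: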

Sys.setlocale("LC_COLLATE", "C")

You can use any other valid locale string in place of "C" there, but this should get you back to the sort order you want for letters.

Read ?locales for more.
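If changing the session locale is not an option, one alternative worth knowing about is the radix method of sort(), which according to ?sort always collates as in the "C" locale regardless of LC_COLLATE. The sketch below assumes R 3.3.0 or later (it will not work on the R 2.15.2 mentioned in the question), and the example vectors are made up for illustration. This is a different technique from Sys.setlocale(), not a replacement for it.

```r
# Assumption: R >= 3.3.0, where sort() accepts method = "radix".
# The radix method ignores LC_COLLATE and uses C-locale ordering.
x <- c("j", "y", "i", "z", "a")   # hypothetical sample data
sorted_radix <- sort(x, method = "radix")
print(sorted_radix)

# Caveat of C-locale ordering: all uppercase letters sort before lowercase.
mixed <- c("b", "A", "a", "B")    # hypothetical sample data
sorted_mixed <- sort(mixed, method = "radix")
print(sorted_mixed)
```

The second call shows why "C" ordering is not always what people mean by "normal": it compares by character code, so case is not ignored.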

I suppose it is worth noting the sister function Sys.getlocale(), which queries the current setting of a locale parameter. Hence you could do

(locCol <- Sys.getlocale("LC_COLLATE"))
Sys.setlocale("LC_COLLATE", "lt_LT")
sort(letters)
Sys.setlocale("LC_COLLATE", locCol)
sort(letters)
Sys.getlocale("LC_COLLATE")

## giving:
> (locCol <- Sys.getlocale("LC_COLLATE"))
[1] "en_GB.UTF-8"
> Sys.setlocale("LC_COLLATE", "lt_LT")
[1] "lt_LT"
> sort(letters)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n"
[16] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "z"
> Sys.setlocale("LC_COLLATE", locCol)
[1] "en_GB.UTF-8"
> sort(letters)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
[16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> Sys.getlocale("LC_COLLATE")
[1] "en_GB.UTF-8"

Of course, as @Hadley's answer shows, this is much more succinct with with_collate() once devtools is installed.
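If you would rather not depend on an extra package, the save/set/restore pattern above can be wrapped in a small base R function. This is only a minimal sketch: sort_with_collate is a hypothetical helper name, not part of base R or devtools, and it relies on on.exit() to put the original collation back even if sort() fails.

```r
# Hypothetical helper (not part of any package): sort x under a
# temporary LC_COLLATE setting, then restore the previous one.
sort_with_collate <- function(x, collate = "C") {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the original collation when the function exits, even on error.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", collate)
  sort(x)
}

before <- Sys.getlocale("LC_COLLATE")
result <- sort_with_collate(letters)
after  <- Sys.getlocale("LC_COLLATE")

print(result)
print(identical(before, after))  # the session collation is unchanged
```

Passing a locale name that the system does not recognise makes Sys.setlocale() warn and leave the collation as it was, so the set of valid values for collate is still system-specific.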

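  • Do you happen to know where one can find a list of available locales (e.g. `"lt_LT"` for Lithuanian, and so on)? (3 upvotes)
  • @JoshO'Brien http://www.localeplanet.com/icu/ lists the ICU names, which seem to be mostly valid for R on Mac (only the "xx_YY" forms, not the "xx" forms), and I assume on Linux too. Unfortunately, it is system-specific, and Windows is completely different. (2 upvotes)
  • @JoshO'Brien Windows locales are listed at http://msdn.microsoft.com/en-us/goglobal/bb895996.aspx (courtesy of JJ Allaire) (2 upvotes)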

had*_*ley 34

If you want to do this temporarily, devtools provides the with_collate function:

library(devtools)
with_collate("C", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
# [20] "t" "u" "v" "w" "x" "y" "z"
with_collate("lt_LT", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r"
# [20] "s" "t" "u" "v" "w" "x" "z"
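As far as I know, in more recent releases the with_* helpers are maintained in the separate withr package, so the same call can be made without loading devtools. The sketch below assumes withr is installed and guards against the case where it is not.

```r
# Assumption: the withr package is installed; if not, the example is skipped.
if (requireNamespace("withr", quietly = TRUE)) {
  # Sort under a temporary "C" collation; the previous setting is
  # restored automatically once the call returns.
  res <- withr::with_collate("C", sort(letters))
  print(res)
} else {
  message("withr is not installed; skipping this example")
}
```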