Sorting strings with accented characters in Python

0xc*_*ced 5 python sorting collation diacritics


Possible Duplicate:
Python not sorting unicode properly. Strcoll doesn't help.


I'm trying to sort some words alphabetically. Here's how I'm doing it:

```python
#!/opt/local/bin/python2.7
# -*- coding: utf-8 -*-

import locale

# Make sure the locale is in French
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
print "locale: " + str(locale.getlocale())

# The words are in alphabetical order
words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]

for word in sorted(words, cmp=locale.strcoll):
    print word.decode("string-escape")
```

I expected the words to be printed in the order in which they are defined, but this is what I got:

```
locale: ('fr_FR', 'UTF8')
liche
lichen
licher
lichoter
lichée
lichénoïde
```

The é character is treated as greater than z.
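A quick way to check this observation (a minimal sketch in Python 3, not the question's Python 2.7): Python's default string ordering compares code points, and é (U+00E9) comes after z (U+007A). Sorting the same list by code point, or by raw UTF-8 bytes, reproduces exactly the output above. That suggests, though it does not prove, that strcoll was effectively doing a plain byte comparison in this environment instead of applying French collation rules.

```python
words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]

# Python 3 str comparison works code point by code point
print(hex(ord("z")), hex(ord("é")))  # 0x7a 0xe9

# Plain code-point order, with no locale involved
codepoint_order = sorted(words)

# Raw UTF-8 byte order: é is encoded as C3 A9, above every ASCII byte
byte_order = [b.decode("utf-8") for b in sorted(w.encode("utf-8") for w in words)]

print(codepoint_order)
# ['liche', 'lichen', 'licher', 'lichoter', 'lichée', 'lichénoïde']
print(codepoint_order == byte_order)  # True
```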


It seems I have misunderstood how locale.strcoll compares strings. Which comparator function should I use to sort the words alphabetically?
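For reference, here is a minimal Python 3 sketch of the same locale-based approach. Python 3 dropped the cmp argument of sorted(), so the usual replacement is key=locale.strxfrm (functools.cmp_to_key(locale.strcoll) also works). The helper name locale_sorted is hypothetical. The resulting order is whatever the platform's C library defines for the locale, so it can be just as wrong as the output above if that collation is missing or incomplete. The sketch also silently keeps the current collation when fr_FR.UTF-8 is not installed.

```python
import locale

def locale_sorted(words, loc_name="fr_FR.UTF-8"):
    """Sort words with the C library's collation rules for loc_name (hypothetical helper)."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, loc_name)
        except locale.Error:
            pass  # locale not installed: keep the current collation
        # strxfrm maps each string to a key whose plain ordering matches
        # strcoll; this replaces Python 2's cmp=locale.strcoll
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the collation setting that was active before the call
        locale.setlocale(locale.LC_COLLATE, previous)

words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]
print(locale_sorted(words))
```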


0xc*_*ced 2

I ended up stripping the diacritics and comparing the stripped versions of the strings, so that I don't have to add a PyICU dependency.
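One way to implement this in Python 3 (a sketch, not necessarily the code the answer's author used): decompose each word with Unicode NFD normalization, drop the combining marks, and sort on the stripped form, using the original string as a tie-breaker. The function names strip_diacritics and sort_ignoring_diacritics are hypothetical. This only handles accents that decompose into a base letter plus combining marks. It ignores case and ligatures such as œ, and it does not reproduce every rule of French dictionary order, but it gives the expected order for the words in the question.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def sort_ignoring_diacritics(words):
    # Compare the stripped form first; break ties with the original string
    return sorted(words, key=lambda w: (strip_diacritics(w), w))

words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]
print(sort_ignoring_diacritics(words))
# ['liche', 'lichée', 'lichen', 'lichénoïde', 'licher', 'lichoter']
```

The tie-breaker keeps the sort deterministic when two words differ only by accents, although the accented variant is then ordered by code point, which may not match a French dictionary.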