Make Python sort/compare strings the same way as GNU sort

gel*_*ida 6 python sorting bash shell

After some initial testing, it seems that Python uses the same sort order as Linux sort (GNU sort) with the C collation order, i.e. when the locale is set to "C".
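
A small illustration of why this happens (not part of the original post). Python's built-in `str` comparison orders strings by Unicode code point and ignores the locale. UTF-8 preserves code-point order under byte-wise comparison, so this matches GNU sort under `LC_ALL=C`, which compares raw bytes.

```python
# Python compares str objects code point by code point, ignoring the locale.
words = ["Abd", "éfg", "aBd", "aBd", "zzz", "ZZZ", "efg", "abd", "fff"]

# 'A'..'Z' (U+0041..U+005A) < 'a'..'z' (U+0061..U+007A) < 'é' (U+00E9)
by_code_point = sorted(words)

# Sorting the UTF-8 encoded bytes (what a byte-wise sort sees) gives the same order
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_code_point)  # ['Abd', 'ZZZ', 'aBd', 'aBd', 'abd', 'efg', 'fff', 'zzz', 'éfg']
print(by_code_point == by_utf8_bytes)  # True
print("éfg" < "fff")  # False: U+00E9 > U+0066 ('f')
```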

However, I would like to write Python code that sorts and compares strings the same way GNU sort does, depending on the locale.

A small example that illustrates the problem:

```python
import os
import subprocess

words = [
    "Abd",
    "éfg",
    "aBd",
    "aBd",
    "zzz",
    "ZZZ",
    "efg",
    "abd",
    "fff",
    ]

# Write the words to a file, one per line, as input for GNU sort
with open("tosort", "w", encoding="utf-8") as fout:
    for word in words:
        fout.write(word + "\n")

# GNU sort under en_US.UTF-8 (child processes inherit os.environ)
os.environ["LC_ALL"] = "en_US.UTF-8"
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE)
sort_en_utf = proc.stdout.read().decode('utf-8').split()

# GNU sort under the C locale (plain byte order)
os.environ["LC_ALL"] = "C"
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE)
sort_c = proc.stdout.read().decode('utf-8').split()

# Python's sorted(): changing os.environ does not affect Python's own comparison
os.environ["LC_ALL"] = "en_US.UTF-8"
sort_py = sorted(words)

for row in zip(sort_en_utf, sort_c, sort_py):
    print(" ".join(row))
```

Running the code above gives the following output:

```
abd Abd Abd
aBd ZZZ ZZZ
aBd aBd aBd
Abd aBd aBd
efg abd abd
éfg efg efg
fff fff fff
zzz zzz zzz
ZZZ éfg éfg
```

Column 1 is the sort/compare order I would like to get in my Python code when the locale is "en_US.UTF-8". Columns 2 and 3 show that Python sorts the same way as Linux sort when the locale is set to "C".
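
A side note on the test harness: setting `os.environ["LC_ALL"]` changes the locale for every child process started afterwards, but it has no effect on how Python itself compares strings. The sketch below passes the locale only to the `sort` call and feeds the words on standard input instead of through a file. It assumes a `sort` binary (GNU coreutils) on `PATH`, and `gnu_sort` is a hypothetical helper name.

```python
import os
import subprocess

def gnu_sort(lines, locale_name):
    """Hypothetical helper: sort `lines` with the external `sort` command
    under LC_ALL=locale_name, set only in the child's environment."""
    env = dict(os.environ, LC_ALL=locale_name)  # os.environ itself is untouched
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        stdout=subprocess.PIPE,
        env=env,
        encoding="utf-8",
        check=True,
    )
    # splitlines() keeps lines that contain spaces intact, unlike split()
    return result.stdout.splitlines()

words = ["Abd", "éfg", "aBd", "aBd", "zzz", "ZZZ", "efg", "abd", "fff"]

# In the C locale, sort compares bytes, so it agrees with Python's sorted()
print(gnu_sort(words, "C") == sorted(words))  # True
print(gnu_sort(words, "en_US.UTF-8"))
```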

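So I would also like to know whether there is a way to make "éfg" < "fff" evaluate to True. I don't insist on using comparison operators; calling a function is fine too. But the sort/compare result should take the current locale into account.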

gel*_*ida 2

Well, somehow I overlooked this:

Python's sorting HOWTO (https://docs.python.org/3.5/howto/sorting.html) mentions in its last section, "Odds and Ends", the function locale.strxfrm() (see https://docs.python.org/3.5/library/locale.html#locale.strxfrm) as a key function for sorting, and locale.strcoll() as a comparison function.
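
A minimal sketch of the two approaches that HOWTO describes. It does not assume that `en_US.UTF-8` is installed: the hypothetical `set_collation` helper falls back to `"C"` when it is not. The sketch switches only `LC_COLLATE`, the category that governs `strcoll()` and `strxfrm()`, rather than `LC_ALL`. Note that `setlocale()` affects the whole process.

```python
import functools
import locale

def set_collation(name):
    """Hypothetical helper: switch LC_COLLATE to `name`, or to "C" if that
    locale is not installed. Returns the name actually in effect."""
    try:
        return locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return locale.setlocale(locale.LC_COLLATE, "C")

words = ["Abd", "éfg", "aBd", "aBd", "zzz", "ZZZ", "efg", "abd", "fff"]
active = set_collation("en_US.UTF-8")

# 1) Key function: strxfrm() computes one sort key per element
by_key = sorted(words, key=locale.strxfrm)

# 2) Comparison function: strcoll() returns <0, 0 or >0 for each comparison;
#    functools.cmp_to_key adapts it for sorted()
by_cmp = sorted(words, key=functools.cmp_to_key(locale.strcoll))

print(active, by_key)
print(by_key == by_cmp)
```

The key-function form is usually preferred because `strxfrm()` runs once per element, while `strcoll()` runs once per comparison.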

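So the modified code below is almost fine, except that the comparison function does not return True/False directly; in my context, however, that is acceptable.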

```python
import locale
import os
import subprocess

words = [
    "Abd",
    "éfg",
    "aBd",
    "aBd",
    "zzz",
    "ZZZ",
    "efg",
    "abd",
    "fff",
    "sra",
    "ssa",
    "ssb",
    "stb",
    "ßaa",
    ]

val1 = "ßaa"
val2 = "ssb"

with open("tosort", "w", encoding="utf-8") as fout:
    for word in words:
        fout.write(word + "\n")

# Reference results from GNU sort in both locales
os.environ["LC_ALL"] = "en_US.UTF-8"
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE)
sort_en_utf = proc.stdout.read().decode('utf-8').split()

os.environ["LC_ALL"] = "C"
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE)
sort_c = proc.stdout.read().decode('utf-8').split()

# Locale-aware sorting in Python: strxfrm as the sort key.
# strcoll returns a negative, zero or positive number, not a bool.
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
sort_py1 = sorted(words, key=locale.strxfrm)
print("%r < %r = %s , but locale.strcoll(%r, %r) = %s for %s"
      % (val1, val2, val1 < val2, val1, val2,
         locale.strcoll(val1, val2), locale.getlocale())
      )

locale.setlocale(locale.LC_ALL, "C")
sort_py2 = sorted(words, key=locale.strxfrm)
print("%r < %r = %s , but locale.strcoll(%r, %r) = %s for %s"
      % (val1, val2, val1 < val2, val1, val2,
         locale.strcoll(val1, val2), locale.getlocale())
      )

for row in zip(sort_en_utf, sort_py1, sort_c, sort_py2):
    print(" ".join(row))
```

The output is:

```
'ßaa' < 'ssb' = False , but locale.strcoll('ßaa', 'ssb') = -1 for ('en_US', 'UTF-8')
'ßaa' < 'ssb' = False , but locale.strcoll('ßaa', 'ssb') = 1 for (None, None)
abd abd Abd Abd
aBd aBd ZZZ ZZZ
aBd aBd aBd aBd
Abd Abd aBd aBd
efg efg abd abd
éfg éfg efg efg
fff fff fff fff
sra sra sra sra
ssa ssa ssa ssa
ßaa ßaa ssb ssb
ssb ssb stb stb
stb stb zzz zzz
zzz zzz ßaa ßaa
ZZZ ZZZ éfg éfg
```
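
On the remark that the comparison function does not return True/False directly: `locale.strcoll()` mirrors C's `strcoll()` and returns a negative number, zero or a positive number. Below is a minimal sketch of a boolean wrapper, plus an optional class that makes `<` and the other operators locale-aware. `locale_lt` and `CollatedStr` are hypothetical names, and both use whatever `LC_COLLATE` setting is active at the moment they are called.

```python
import functools
import locale

def locale_lt(a: str, b: str) -> bool:
    """Hypothetical helper: True if `a` collates before `b` in the current
    LC_COLLATE locale."""
    return locale.strcoll(a, b) < 0

@functools.total_ordering
class CollatedStr:
    """Hypothetical wrapper whose comparison operators follow LC_COLLATE.
    Defining __eq__ makes instances unhashable, which is fine for sorting."""
    __slots__ = ("value",)

    def __init__(self, value: str):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, CollatedStr):
            return NotImplemented
        return locale.strcoll(self.value, other.value) == 0

    def __lt__(self, other):
        if not isinstance(other, CollatedStr):
            return NotImplemented
        return locale.strcoll(self.value, other.value) < 0

    def __repr__(self):
        return f"CollatedStr({self.value!r})"

try:
    locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
except locale.Error:
    pass  # locale not installed: keep the current collation

print(locale_lt("éfg", "fff"))                  # True if en_US.UTF-8 is available
print(CollatedStr("éfg") < CollatedStr("fff"))  # same result as locale_lt
print(sorted(["fff", "éfg", "abd"], key=CollatedStr))
```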