Inconsistent sorting with sort()

Question

I have the following function to count the words in a string and extract the top "n":

Function:

from collections import Counter

def count_words(s, n):
    """Return the n most frequently occurring words in s."""

    # Split words into list
    wordlist = s.split()

    # Count words
    counts = Counter(wordlist)

    # Get top n words
    top_n = counts.most_common(n)

    # Sort by count (descending); break ties alphabetically by word
    top_n.sort(key=lambda x: (-x[1], x[0]))

    return top_n

So it sorts by number of occurrences and, in case of a tie, alphabetically. The following example:

print(count_words("cat bat mat cat cat mat mat mat bat bat cat", 3))

works (shows [('cat', 4), ('mat', 4), ('bat', 3)])

print(count_words("betty bought a bit of butter but the butter was bitter", 3))

does not work (shows [('butter', 2), ('a', 1), ('bitter', 1)] but should have betty instead of bitter, since they are tied and be... comes before bi...)

print(count_words("betty bought a bit of butter but the butter was bitter", 6))

works (shows [('butter', 2), ('a', 1), ('betty', 1), ('bitter', 1), ('but', 1), ('of', 1)] with betty before bitter, as expected)

What could be causing this (word length, perhaps?), and how can I fix it?

Answer 1

Bak*_*riu 10

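The problem is not the sort call but the most_common call. Counter is implemented as a hash table, so the order it uses is arbitrary. When you ask for most_common(n), it returns the n most common words, and if there are ties it just arbitrarily decides which ones to return! The sort that follows only reorders the n items already selected, so it cannot bring back a word that was dropped.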
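To see how much room there is for an arbitrary choice, the sketch below counts how many words are tied at the cutoff for the question's second example. The variable names (ranked, cutoff, tied, slots) are illustrative only. Nine words share a count of 1 but only two places remain after 'butter', so any truncation done before the alphabetical tie-break can keep the wrong ones. By the same reasoning, the n=6 output in the question only looks right: it omits 'bit' and 'bought', which sort before 'but' and 'of'.

```python
from collections import Counter

text = "betty bought a bit of butter but the butter was bitter"
counts = Counter(text.split())

n = 3
# Full ranking: count descending, then word ascending.
ranked = sorted(counts.items(), key=lambda x: (-x[1], x[0]))

# Count of the last word that still makes the top n.
cutoff = ranked[n - 1][1]

# Every word sharing that count competes for the remaining places.
tied = [word for word, c in ranked if c == cutoff]
slots = n - sum(1 for _, c in ranked if c > cutoff)

print(tied)   # the nine words that occur exactly once
print(slots)  # 2 places left after 'butter'
```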

The simplest way to fix this is to avoid most_common and sort the list of items directly:

top_n = sorted(counts.items(), key=lambda x: (-x[1], x[0]))[:n]
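Dropped into the question's function, that line might look like the sketch below. The second function, count_words_heap, is a hypothetical variant not taken from the answer: it assumes heapq.nsmallest with the same key is an acceptable substitute when n is small relative to the number of distinct words, since it selects the first n entries without sorting the whole list.

```python
from collections import Counter
import heapq


def count_words(s, n):
    """Return the n most frequent words in s, ties broken alphabetically."""
    counts = Counter(s.split())
    # Rank every word first, then cut to n, so ties are resolved before truncation.
    return sorted(counts.items(), key=lambda x: (-x[1], x[0]))[:n]


def count_words_heap(s, n):
    """Hypothetical variant: same ordering, selected with heapq.nsmallest."""
    counts = Counter(s.split())
    return heapq.nsmallest(n, counts.items(), key=lambda x: (-x[1], x[0]))


sentence = "betty bought a bit of butter but the butter was bitter"
print(count_words(sentence, 3))  # [('butter', 2), ('a', 1), ('betty', 1)]
```

Because every word appears once in counts.items(), the sort key is unique per item, so both functions return the same list regardless of the Counter's internal order.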
