Computing frequencies in a nested list

Question

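I'm trying to compute word frequencies with a dictionary over a nested list. Each nested list is a sentence split into its words. I also want to remove proper nouns and lowercase the words at the start of each sentence. Is it even possible to identify the proper nouns?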

x = [["Hey", "Kyle", "are", "you", "doing"], ["I", "am", "doing", "fine"], ["Kyle", "what", "time", "is", "it"]]

from collections import Counter
def computeFrequencies(x):
    count = Counter()
    for listofWords in x:
        for word in x:
            count[word] += 1
    return count

It returns an error: unhashable type: 'list'

I want to return this, without the Counter() wrapper around the dictionary:

{"hey": 1, "how": 1, "are": 1, "you": 1, "doing": 2, "i": 1, "am": 1, "fine": 1, "what": 1, "time": 1, "is": 1, "it": 1}

Answer 1

the*_*eye 7

Since your data is nested, you can flatten it with chain.from_iterable like this:

from itertools import chain
from collections import Counter
print(Counter(chain.from_iterable(x)))
# Output, assuming x is written with a comma between "time" and "is":
# Counter({'Kyle': 2, 'doing': 2, 'Hey': 1, 'are': 1, 'you': 1, 'I': 1, 'am': 1, 'fine': 1, 'what': 1, 'time': 1, 'is': 1, 'it': 1})

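To see what the flattening step does on its own, here is a minimal Python 3 sketch. The list x is the question's data with the missing commas and closing bracket filled in, which is an assumption about what the asker intended; the names words and counts are illustrative only.

```python
from collections import Counter
from itertools import chain

# The question's data, assuming the missing commas and closing bracket
# were typos.
x = [["Hey", "Kyle", "are", "you", "doing"],
     ["I", "am", "doing", "fine"],
     ["Kyle", "what", "time", "is", "it"]]

# chain.from_iterable lazily yields the items of each inner list in turn,
# so the nested list is consumed as one flat stream of words.
words = list(chain.from_iterable(x))
print(words[:6])  # ['Hey', 'Kyle', 'are', 'you', 'doing', 'I']

# Counter accepts any iterable of hashable items, here plain strings.
counts = Counter(words)
print(counts["doing"], counts["Kyle"])  # 2 2
```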

If you want to use a generator expression, you can do this:

from collections import Counter
print(Counter(item for items in x for item in items))

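The generator expression is shorthand for two nested loops, and writing them out shows where the question's function goes wrong: its inner loop walks the outer list a second time, so each word is really a whole sentence (a list), and a list is unhashable and cannot be used as a key. Below is a sketch of a corrected version. It keeps the question's function name, while the parameter name sentences and the variable message are my own.

```python
from collections import Counter

def computeFrequencies(sentences):
    """Count words across a list of tokenized sentences."""
    count = Counter()
    for list_of_words in sentences:   # outer loop: one sentence at a time
        for word in list_of_words:    # inner loop: the words of that sentence
            count[word] += 1
    return count

# The question's data, assuming the missing commas and bracket were typos.
x = [["Hey", "Kyle", "are", "you", "doing"],
     ["I", "am", "doing", "fine"],
     ["Kyle", "what", "time", "is", "it"]]

freq = computeFrequencies(x)
print(freq["doing"])  # 2

# Using a list as a key reproduces the kind of error seen in the question.
message = None
try:
    Counter()[["Hey", "Kyle"]] += 1
except TypeError as exc:
    message = str(exc)
print(message)  # the message mentions that 'list' is unhashable
```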

If you want to do this without Counter, you can use a plain dictionary like this:

my_counter = {}
for line in x:
    for word in line:
        my_counter[word] = my_counter.get(word, 0) + 1
print(my_counter)

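The plain-dictionary loop can be extended toward the output the question asks for, with lowercase keys and no proper nouns. This is only a sketch under a strong assumption: telling proper nouns apart from ordinary capitalized words generally needs something like a part-of-speech tagger, so here the names to skip are supplied by hand. The function name count_words and its proper_nouns parameter are hypothetical, not part of the original answer.

```python
def count_words(sentences, proper_nouns=frozenset()):
    """Return a plain dict of lowercase word counts, skipping proper_nouns."""
    counts = {}
    for sentence in sentences:
        for word in sentence:
            if word in proper_nouns:   # hand-supplied names, not detected
                continue
            key = word.lower()         # "Hey" and "hey" share one entry
            counts[key] = counts.get(key, 0) + 1
    return counts

# The question's data, assuming the missing commas and bracket were typos.
x = [["Hey", "Kyle", "are", "you", "doing"],
     ["I", "am", "doing", "fine"],
     ["Kyle", "what", "time", "is", "it"]]

result = count_words(x, proper_nouns={"Kyle"})
print(result)
# {'hey': 1, 'are': 1, 'you': 1, 'doing': 2, 'i': 1, 'am': 1, 'fine': 1,
#  'what': 1, 'time': 1, 'is': 1, 'it': 1}
```

Lowercasing every word also turns the pronoun "I" into "i", which is what the question's desired output shows.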

You can also use collections.defaultdict like this:

from collections import defaultdict
my_counter = defaultdict(int)
for line in x:
    for word in line:
        my_counter[word] += 1

print(my_counter)

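Well, if you just want to convert the Counter object to a dict object (which I don't think is necessary at all, since a Counter actually is a dictionary: you can access its keys and values, iterate over it, and delete or update entries just as with a normal dictionary object), you can use bsoist's suggestion: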

print(dict(Counter(chain.from_iterable(x))))

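As a quick check of the claim that a Counter can be treated as a dictionary, the following sketch uses one through the ordinary dict interface and then converts it. The sample words and variable names are illustrative.

```python
from collections import Counter

counts = Counter(["doing", "fine", "doing"])

print(isinstance(counts, dict))  # True: Counter is a dict subclass
print(counts["doing"])           # 2, ordinary key lookup

del counts["fine"]               # delete an entry, as with a dict
counts.update({"am": 1})         # Counter.update adds to existing counts

for word, n in counts.items():   # iterate like a dict
    print(word, n)

plain = dict(counts)             # the conversion from the answer
print(type(plain) is dict)       # True
```

One difference that may justify converting: looking up a missing key on a Counter gives 0, while a plain dict raises KeyError.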
