Posts by use*_*220

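Extracting the most frequent words from a corpus with Python

Maybe this is a silly question, but I am having trouble extracting the ten most frequent words from a corpus with Python. This is what I have so far. (By the way, I am using NLTK to read a corpus with two subcategories, each containing 10 .txt files.)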

import re
import string
from nltk.corpus import stopwords
stoplist = stopwords.words('dutch')

from collections import defaultdict
from operator import itemgetter

def toptenwords(mycorpus):
    words = mycorpus.words()
    no_capitals = set([word.lower() for word in words]) 
    filtered = [word for word in no_capitals if word not in stoplist]
    no_punct = [s.translate(None, string.punctuation) for s in filtered] 
    wordcounter = {}
    for word in no_punct:
        if word in wordcounter:
            wordcounter[word] += 1
        else:
            wordcounter[word] = 1
    sorting = sorted(wordcounter.iteritems(), key = itemgetter, reverse = True)
    return …
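
A few things in the function above would keep it from returning real frequencies: `set(...)` collapses duplicates, so every count ends up as 1; `s.translate(None, string.punctuation)` and `iteritems()` only work on Python 2; and `itemgetter` is passed without an index, so the sort key is not the count. Below is a minimal Python 3 sketch of one way around this, using `collections.Counter`. The function name `top_words` and its parameters are chosen for illustration and are not from the original post.

```python
import string
from collections import Counter


def top_words(words, stoplist=(), n=10):
    """Return the n most frequent words as (word, count) pairs."""
    stop = set(stoplist)
    # Translation table that deletes every ASCII punctuation character.
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for word in words:
        # Lowercase and strip punctuation, but keep duplicates so they are counted.
        token = word.lower().translate(strip_punct)
        # Skip tokens that were pure punctuation, and skip stop words.
        if token and token not in stop:
            counts[token] += 1
    return counts.most_common(n)
```

With the setup from the question, this would presumably be called as `top_words(mycorpus.words(), stoplist)`, where `stoplist` is the Dutch stop word list loaded at the top of the script.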


python dictionary frequency word-count

use*_*220

2013 12-09

2 votes

3 answers

5712 views

How to create a dictionary from a text file with Python

My file looks like this:

aaien 12 13 39
aan 10
aanbad 12 13 14 57 58 38
aanbaden 12 13 14 57 58 38
aanbeden 12 13 14 57 58 38
aanbid  12 13 14 57 58 39
aanbidden 12 13 14 57 58 39
aanbidt 12 13 14 57 58 39
aanblik 27 28
aanbreken 39
...


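I want to create a dictionary in which the key is the word (for example 'aaien') and the value is the list of numbers next to it. So it should look like this: {'aaien':['12,13,39'],'aan':['10']}

This code doesn't seem to work.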

document = open('LIWC_words.txt', 'r')
liwcwords = document.read()
dictliwc = {}
for line in liwcwords:
    k, v = line.strip().split(' ')
    answer[k.strip()] = v.strip() …
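
As far as I can tell, the snippet above fails for three reasons: `document.read()` returns one big string, so the `for` loop walks over single characters rather than lines; `split(' ')` yields more than two pieces for most lines, so unpacking into `k, v` raises a `ValueError`; and the result is stored in `answer` instead of `dictliwc`. A minimal sketch of one way to do it follows. The name `parse_liwc_lines` is made up for illustration, and it assumes the wanted value is a list of separate number strings such as `['12', '13', '39']`.

```python
def parse_liwc_lines(lines):
    """Map each word to the list of number strings that follow it."""
    result = {}
    for line in lines:
        # split() with no argument splits on any run of whitespace,
        # so double spaces and the trailing newline are handled.
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        word, numbers = parts[0], parts[1:]
        result[word] = numbers
    return result


sample = ["aaien 12 13 39", "aan 10"]
print(parse_liwc_lines(sample))
# prints {'aaien': ['12', '13', '39'], 'aan': ['10']}
```

For the real file, iterating over the open file object gives one line at a time, e.g. `with open('LIWC_words.txt') as document: dictliwc = parse_liwc_lines(document)`.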


python dictionary distinct-values

use*_*220

2018 07-12

2 votes

1 answer

20k views