Creating a vocabulary dictionary for text mining

Question

I have the following code:

train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
    "We can see the shining sun, the bright sun.")

Now I am trying to compute the term frequencies like this:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()

Next I want to print the vocabulary, so I do this:

vectorizer.fit_transform(train_set)
print vectorizer.vocabulary

The output I get is None, although I was expecting something like this:

{'blue': 0, 'sun': 1, 'bright': 2, 'sky': 3}

Any idea what is going wrong?

Answer 1

Jos*_*hez 5

I think you can try this:

print vectorizer.vocabulary_
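
For context, here is a minimal sketch of why the trailing underscore matters, assuming a recent scikit-learn release and Python 3 (hence print() as a function rather than the Python 2 print statement used above). In scikit-learn, vocabulary is the constructor argument, which stays None unless you pass one in, while vocabulary_ holds the mapping learned during fitting.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_set = ("The sky is blue.", "The sun is bright.")

vectorizer = CountVectorizer()
vectorizer.fit_transform(train_set)

# Constructor argument: nothing was passed in, so it is still None.
print(vectorizer.vocabulary)

# Fitted attribute: maps each learned term to its column index.
print(vectorizer.vocabulary_)
```

Note that with default settings the learned vocabulary should also contain "the" and "is", so it will not match the four-word dictionary expected in the question. Passing stop_words='english' to CountVectorizer is one way to drop such words, though the index assigned to each term may still differ from the ones listed there.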
