How to create a word cloud from a corpus in Python?

alv*_*vas 40 python corpus nltk word-cloud gensim

In "Creating a subset of words from a corpus in R", the answerer could easily convert a term-document matrix into a word cloud.

Is there a similar function in a Python library that takes a raw text file, an NLTK corpus, or a Gensim MmCorpus and turns it into a word cloud?

The result would look something like this: [image]
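The R approach boils down to summing each term's row of the term-document matrix and plotting those totals. A minimal sketch of the same step in plain Python, assuming the matrix is a list of rows (one row per term, one column per document) with a parallel list of term labels; `term_frequencies` is a hypothetical helper, not part of any library:

```python
def term_frequencies(tdm, terms):
    """Total each term's count across all documents.

    tdm:   term-document matrix as a list of rows, one row per term and
           one column per document (hypothetical input layout)
    terms: term labels in the same order as the rows
    """
    if len(tdm) != len(terms):
        raise ValueError("expected exactly one matrix row per term")
    # Row sum = how often the term occurs in the whole corpus
    return {term: sum(row) for term, row in zip(terms, tdm)}


if __name__ == "__main__":
    terms = ["base", "all", "belong"]
    tdm = [[3, 1], [1, 1], [1, 0]]  # 3 terms x 2 documents, made-up counts
    print(term_frequencies(tdm, terms))  # {'base': 4, 'all': 2, 'belong': 1}
```

The resulting dict can then be passed to `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package used in the answers below; an NLTK `FreqDist` built from a corpus's `.words()` is also a dict of counts and should work the same way.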

Hea*_*ail 14

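from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

stopwords = set(STOPWORDS)  # built-in English stopword list shipped with wordcloud

def show_wordcloud(data, title=None):
    # Accept one string or an iterable of strings (e.g. a pandas Series of reviews).
    # str() on a long Series only yields a truncated preview, so join the items instead.
    text = data if isinstance(data, str) else " ".join(map(str, data))
    wordcloud = WordCloud(
        background_color='white',
        stopwords=stopwords,
        max_words=200,
        max_font_size=40,
        scale=3,
        random_state=1,  # fixed seed so the layout is reproducible between runs
    ).generate(text)

    fig = plt.figure(1, figsize=(12, 12))
    plt.axis('off')
    if title:
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=0.95)  # figure fraction (<= 1); leaves room for the title

    plt.imshow(wordcloud)
    plt.show()

# Samsung_Reviews_Negative / Samsung_Reviews_positive are the answerer's own
# pandas DataFrames, each with a 'Reviews' column of text
show_wordcloud(Samsung_Reviews_Negative['Reviews'])
show_wordcloud(Samsung_Reviews_positive['Reviews'])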
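`show_wordcloud` works on raw text. If you start from a Gensim MmCorpus instead, the counts are already computed, so you can skip tokenizing and aggregate them directly. A sketch, assuming the corpus yields each document as a list of `(token_id, count)` pairs and that you still have the mapping from token id to word (for example the Gensim `Dictionary` used to build it); `bow_corpus_to_frequencies` is a hypothetical helper:

```python
from collections import Counter


def bow_corpus_to_frequencies(corpus, id2word):
    """Sum each token's count over a bag-of-words corpus.

    corpus:  iterable of documents, each a list of (token_id, count) pairs
             (assumed MmCorpus-style format)
    id2word: mapping from token_id to the token string
    """
    totals = Counter()
    for doc in corpus:
        for token_id, count in doc:
            totals[id2word[token_id]] += count
    return dict(totals)


if __name__ == "__main__":
    corpus = [[(0, 3), (1, 1)], [(0, 1), (2, 2)]]  # toy stand-in for an MmCorpus
    id2word = {0: "base", 1: "all", 2: "your"}
    print(bow_corpus_to_frequencies(corpus, id2word))  # {'base': 4, 'all': 1, 'your': 2}
```

The dict can then go to `WordCloud(...).generate_from_frequencies(freqs)` in place of `.generate(text)`; note that the `stopwords` setting is not applied on that path, so filter the dict yourself if needed.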


小智 10

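If you need to display these word clouds on a website or in a web app, you can convert your data to JSON or CSV and load it into a JavaScript visualization library such as d3 (see "Word clouds on d3").

If not, Marcin's answer is a good way to do what you describe.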
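For the web route, the Python side only has to produce word counts in a format the front end can read. A minimal sketch using only the standard library; the `{"text": ..., "size": ...}` record shape is an assumption modeled on common d3-cloud examples, so rename the fields to whatever your JavaScript code expects, and `to_cloud_json` is a hypothetical helper:

```python
import json
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset(), top_n=100):
    """Count lowercase word tokens in text, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)


def to_cloud_json(text, stopwords=frozenset(), top_n=100):
    """Serialize the top words as a JSON list of {"text", "size"} records."""
    return json.dumps(
        [{"text": w, "size": c} for w, c in word_frequencies(text, stopwords, top_n)]
    )


if __name__ == "__main__":
    sample = "all your base are belong to us all of your base base base"
    print(to_cloud_json(sample, stopwords={"to", "of"}, top_n=3))
```

Write the string to a `.json` file (or serve it from an endpoint) and let d3 handle layout and rendering in the browser.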


Myo*_*age 9

An example of amueller's wordcloud code in action:

At the command line / terminal:

pip install wordcloud

Then run the Python script:

## Simple WordCloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

text = 'all your base are belong to us all of your base base base'

def generate_wordcloud(text):
    # To use wordcloud's built-in stopword list instead, pass
    # stopwords=STOPWORDS (or STOPWORDS | {'to', 'of'}) below
    wordcloud = WordCloud(font_path='/Library/Fonts/Verdana.ttf',  # macOS font path; adjust, or omit to use the bundled default font
                          relative_scaling=1.0,  # font size proportional to word frequency
                          stopwords={'to', 'of'}  # set of words to drop
                          ).generate(text)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

generate_wordcloud(text)
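To see what `relative_scaling=1.0` asks for: per the wordcloud documentation, at 1.0 a word twice as frequent is drawn twice as large. The sketch below is a simplified model of that rule only, not the library's algorithm (the real layout also shrinks fonts when words do not fit and tokenizes differently, e.g. collocations); `proportional_font_sizes` and its `max_font_size=40` default are hypothetical:

```python
from collections import Counter


def proportional_font_sizes(text, stopwords=frozenset(), max_font_size=40):
    """Simplified model: size = max_font_size * freq / max_freq."""
    counts = Counter(w for w in text.lower().split() if w not in stopwords)
    if not counts:
        return {}
    top = max(counts.values())
    return {w: round(max_font_size * c / top) for w, c in counts.items()}


if __name__ == "__main__":
    text = "all your base are belong to us all of your base base base"
    sizes = proportional_font_sizes(text, stopwords={"to", "of"})
    print(sizes["base"], sizes["all"], sizes["us"])  # 40 20 10
```

With the sample text, "base" (4 occurrences) gets twice the size of "all" and "your" (2 each), which is why it dominates the resulting cloud.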