'ImageDraw' object has no attribute 'textbbox'

San*_*oon 6 python plot matplotlib word-cloud pandas

I'm working on a simple text-mining project. When I try to create a word cloud, I get the following error:

AttributeError: 'ImageDraw' object has no attribute 'textbbox'

I have a dataset of news articles and their categories. To create the word cloud, I first preprocess the text:


import pandas as pd
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from textblob import Word
from wordcloud import WordCloud 

newsData = pd.read_csv("data.txt", sep= '\t', header=None, 
                       names=["Description", "Category", "Tags"],on_bad_lines='skip', 
                       engine='python' , encoding='utf-8')
#print(newsData.head())

newsData['Description'] =  newsData['Description'].apply(lambda x:  " ".join(x.lower() for x in x.split()))
newsData['Category'] =  newsData['Category'].apply(lambda x:  " ".join(x.lower() for x in x.split()))
newsData['Tags'] =  newsData['Tags'].apply(lambda x:  " ".join(x.lower() for x in x.split()))

# stopword filtering
stop = stopwords.words('english')
newsData['Description'] =  newsData['Description'].apply(lambda x: " ".join (x for x in x.split() if x not in stop))
#stemming

st = PorterStemmer()
newsData['Description'] =  newsData['Description'].apply(lambda x: " ".join ([st.stem(word) for word in x.split()]))
newsData['Category'] =  newsData['Category'].apply(lambda x: " ".join ([st.stem(word) for word in x.split()]))
newsData['Tags'] =  newsData['Tags'].apply(lambda x: " ".join ([st.stem(word) for word in x.split()]))

#lemmatize

newsData['Description'] =  newsData['Description'].apply(lambda x: " ".join ([Word(word).lemmatize() for word in x.split()]))
newsData['Category'] =  newsData['Category'].apply(lambda x: " ".join ([Word(word).lemmatize() for word in x.split()]))
newsData['Tags'] =  newsData['Tags'].apply(lambda x: " ".join ([Word(word).lemmatize() for word in x.split()]))
#print(newsData.head())


culture = newsData[newsData['Category'] == 'culture'].sample(n=200)
health = newsData[newsData['Category'] == 'health'].sample(n=200)
dataSample = pd.concat([culture, health],axis=0)

culturesmpl = culture[culture['Category'] == 'culture'].sample(n=200)
healthspml = health[health['Category'] == 'health'].sample(n=200)
#print(dataSample.head())

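# join with a space so the last word of one description does not fuse with the first word of the next
cultureSTR = culturesmpl.Description.str.cat(sep=' ')
healthSTR = healthspml.Description.str.cat(sep=' ')
#print(cultureSTR)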

Then I try to create the word cloud with the WordCloud library:

wordcloud_culture = WordCloud(collocations=False, background_color='white').generate(cultureSTR)

# Plot
plt.imshow(wordcloud_culture, interpolation='bilinear')
plt.axis('off')
plt.show()

But when I run this code, I get the error:

  File ~/anaconda3/lib/python3.9/site-packages/wordcloud/wordcloud.py:508 in generate_from_frequencies
    box_size = draw.textbbox((0, 0), word, font=transposed_font, anchor="lt")

AttributeError: 'ImageDraw' object has no attribute 'textbbox'

Do you know how I can fix this?

Spl*_*lic 8

History

The ImageDraw.textsize() method was deprecated in PIL (Pillow) 9.2.0 and removed entirely in version 10.0.0, released on July 1, 2023.

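The ImageDraw.textbbox() method was introduced in version 8.0.0 as a more robust replacement. Because textbbox() does not exist before 8.0.0, the AttributeError in the question means the installed Pillow is older than that, while the installed wordcloud already calls textbbox() (see the traceback). Upgrading Pillow, for example with pip install --upgrade pillow, should fix it.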
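A minimal sketch for checking whether the installed Pillow is new enough to provide textbbox(), assuming Pillow is importable as PIL. The helper pillow_has_textbbox is a hypothetical name; it only compares the major version against 8, the release that added textbbox().

```python
def pillow_has_textbbox(version: str) -> bool:
    """Return True if a Pillow version string is 8.0.0 or newer.

    ImageDraw.textbbox() was added in Pillow 8.0.0, so comparing the
    major version number is enough for this check.
    """
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 8


if __name__ == "__main__":
    try:
        import PIL  # Pillow installs itself under the PIL package name
    except ImportError:
        print("Pillow is not installed")
    else:
        if pillow_has_textbbox(PIL.__version__):
            print(f"Pillow {PIL.__version__} provides ImageDraw.textbbox()")
        else:
            print(f"Pillow {PIL.__version__} is too old; try: pip install --upgrade pillow")
```

If it reports a version below 8, upgrade Pillow in the same environment that runs the script. The traceback in the question points to an Anaconda installation (~/anaconda3), where conda update pillow is one option.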

Example

If you just want to replace a single line of code, and you previously had

text_width, text_height = ImageDraw.Draw(image).textsize(your_text, font=your_font)

...then you can use this instead:

_, _, text_width, text_height = ImageDraw.Draw(image).textbbox((0, 0), your_text, font=your_font)
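If the same code has to run on both old and new Pillow releases, a small compatibility shim can prefer textbbox() and fall back to textsize() only when textbbox() is missing. This is a minimal sketch; measure_text is a hypothetical helper name, and it keeps the same (right, bottom) convention as the one-liner.

```python
def measure_text(draw, text, font=None):
    """Return (width, height) of `text`, assuming `draw` is a PIL ImageDraw object.

    Uses textbbox() when available (Pillow >= 8.0.0) and falls back to the
    older textsize(), which exists only before Pillow 10.0.0.
    """
    if hasattr(draw, "textbbox"):
        # (left, top, right, bottom) with the text anchored at the origin;
        # keep right and bottom, like the one-line replacement.
        _, _, width, height = draw.textbbox((0, 0), text, font=font)
        return width, height
    # Old Pillow without textbbox(): textsize() already returns (width, height).
    return draw.textsize(text, font=font)
```

With Pillow installed it would be called as measure_text(ImageDraw.Draw(image), your_text, font=your_font). Checking for the method with hasattr(), rather than parsing a version string, tests exactly the capability the code depends on.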

Explanation

textsize() returns the nominal width and height of the text as a tuple: (width, height). textbbox() returns the x and y extents of the bounding box as a tuple: (left, top, right, bottom).

Starting the line with _, _, is one way to discard the first two elements of the returned tuple.

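Passing (0, 0) as the first argument to textbbox() anchors the text at the origin, so right and bottom give the text's extent measured from (0, 0), which is close to what textsize() used to report. If you need the tight size of the box itself, use right - left and bottom - top instead.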
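A small sketch of the two ways to read a textbbox() result, assuming the text was anchored at (0, 0). bbox_sizes is a hypothetical helper, and the numbers in the check below are illustrative, not measured from a real font.

```python
def bbox_sizes(bbox):
    """Interpret a textbbox() result of the form (left, top, right, bottom).

    Returns two (width, height) pairs:
    - the extent measured from the origin, (right, bottom), which is what the
      `_, _, text_width, text_height = ...textbbox((0, 0), ...)` idiom keeps;
    - the tight size of the box itself, (right - left, bottom - top).
    """
    left, top, right, bottom = bbox
    from_origin = (right, bottom)
    tight = (right - left, bottom - top)
    return from_origin, tight
```

In real code the argument would be draw.textbbox((0, 0), your_text, font=your_font). The two pairs differ whenever left or top is non-zero, which is common for top because of the space above lowercase glyphs.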

Avoid relying on outdated libraries, and it is worth looking into why this change was made and why textbbox() is the more robust approach!