AttributeError: 'TfidfVectorizer' object has no attribute 'get_feature_names_out'

Mhe*_*ero 12 python scikit-learn

Why do I keep getting this error? I have tried other code as well, but the error appears as soon as I call get_feature_names_out.

Here is my code:

from sklearn.datasets import fetch_20newsgroups  # public import path instead of the private module
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB  # fast to train and achieves a decent F-score
from sklearn import metrics
import numpy as np

def show_top10(classifier, vectorizer, categories):
    feature_names = vectorizer.get_feature_names_out()
    for i, category in enumerate(categories):
        top10 = np.argsort(classifier.coef_[i])[-10:]
        print("%s: %s" % (category, " ".join(feature_names[top10])))

newsgroups_train = fetch_20newsgroups(subset='train')
print(list(newsgroups_train.target_names))

cats = ['alt.atheism', 'sci.space', 'rec.sport.baseball', 'rec.sport.hockey']
newsgroups_train = fetch_20newsgroups(subset='train', categories=cats)
print(list(newsgroups_train.target_names))
print(newsgroups_train.filenames.shape)

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(newsgroups_train.data)
print(vectors.shape)

afg*_*ani 31

When sklearn.__version__ <= 0.24.x, use the following method:

get_feature_names() 

When sklearn.__version__ >= 1.0.x, use the following method:

get_feature_names_out() 

Reference:

  1. https://github.com/scikit-learn/scikit-learn/blob/0.24.X/sklearn/feature_extraction/text.py
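If the same script has to run on both older and newer scikit-learn installs, one option is a small wrapper that picks whichever accessor the vectorizer actually has. This is only a minimal sketch, not part of scikit-learn: the helper name `get_vocabulary` is hypothetical, and it assumes nothing beyond the object exposing one of the two methods named above.

```python
def get_vocabulary(vectorizer):
    """Return the fitted vectorizer's feature names as a plain list.

    Hypothetical helper: prefers get_feature_names_out() and falls back
    to the older get_feature_names() when the newer method is missing.
    """
    accessor = getattr(vectorizer, "get_feature_names_out", None)
    if accessor is None:
        # Older releases only provide get_feature_names().
        accessor = vectorizer.get_feature_names
    return list(accessor())
```

In the question's `show_top10`, this would replace the direct call, for example `feature_names = np.asarray(get_vocabulary(vectorizer))`. The `np.asarray` wrapper matters there because `feature_names[top10]` indexes with an array of positions, which a plain list does not support.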


Arn*_*rne 14

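This is probably because you are using an older version of scikit-learn than the one this code was written for.

get_feature_names_out has been a method of the class sklearn.feature_extraction.text.TfidfVectorizer since scikit-learn 1.0. Before that, there was a similar method called get_feature_names.

So you should either update your scikit-learn package or use the old method (not recommended).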
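To see which case applies, the installed version can be compared against 1.0. The following is a rough sketch under the assumption that the version string starts with a plain `major.minor` prefix; the function name `has_get_feature_names_out` is made up for illustration.

```python
import re


def has_get_feature_names_out(version: str) -> bool:
    """Return True if a scikit-learn version string is 1.0 or newer.

    Hypothetical helper: only the leading "major.minor" part is inspected,
    so suffixes such as ".2" or ".dev0" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 0)
```

With scikit-learn installed, it could be called as `has_get_feature_names_out(sklearn.__version__)` after `import sklearn`. If that returns False, upgrading (typically `pip install -U scikit-learn`) makes the new method available.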