Posts by had*_*ard

AttributeError: 'int' object has no attribute 'lower' in TFIDF and CountVectorizer

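I am trying to predict the category of input messages, and I am working with Persian text. I use Tfidf and Naive Bayes to classify the input data. Here is my code: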

import pandas as pd
df=pd.read_excel('dataset.xlsx')
col=['label','body']
df=df[col]
df.columns=['label','body']
df['class_type'] = df['label'].factorize()[0]
class_type_df=df[['label','class_type']].drop_duplicates().sort_values('class_type')
class_type_id = dict(class_type_df.values)
id_to_class_type = dict(class_type_df[['class_type', 'label']].values)
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer()
features=tfidf.fit_transform(df.body).toarray()
classtype=df.class_type
print(features.shape)
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB 
X_train,X_test,y_train,y_test=train_test_split(df['body'],df['label'],random_state=0)
cv=CountVectorizer()
X_train_counts=cv.fit_transform(X_train)
tfidf_transformer=TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
clf = MultinomialNB().fit(X_train_tfidf, y_train)
print(clf.predict(cv.transform(["???? ? ???? ????? ?????? ?? ????"])))


However, when I run the code above, instead of returning the "ads" class that I expect in the output, it raises the following exception:

Traceback (most recent call last):
  File ".../multiclass-main.py", line 27, in <module>
    X_train_counts=cv.fit_transform(X_train)
  File "...\sklearn\feature_extraction\text.py", line 1012, in fit_transform
  …
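A likely cause, offered only as a sketch: CountVectorizer lowercases each document by calling its lower() method, so the error in the title would appear if at least one cell of the 'body' column was read from the Excel file as a number rather than as text. The data below is a made-up stand-in for dataset.xlsx (its real contents are not shown in the question), and casting the column with astype(str) is one possible workaround, not a confirmed fix for the original data.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the 'body' column of dataset.xlsx:
# one cell holds an integer instead of text (an assumption, not the real data).
body = pd.Series(["free offer today", 12345, "meeting at noon"])

cv = CountVectorizer()
error_message = None
try:
    # The default preprocessor lowercases each document,
    # which fails on the integer cell.
    cv.fit_transform(body)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Possible workaround: cast every cell to str before vectorizing.
clean_body = body.astype(str)
counts = cv.fit_transform(clean_body)
print(counts.shape)
```

If that assumption holds for the real dataset, applying the same cast to df['body'] before train_test_split should let the fit_transform call on line 27 run. Empty cells are a separate case, since astype(str) turns them into the literal string "nan", so dropping or filling them first may be preferable.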

python machine-learning tf-idf scikit-learn

had*_*ard

2019 01-01

Score: 4
Answers: 1
Views: 3214
