我对机器学习很陌生,所以这个问题可能听起来很愚蠢。我正在学习有关文本分类的教程,但遇到了一个我不知道如何解决的错误。
这是我拥有的代码(基本上就是教程中找到的代码)
import pandas as pd
filepath_dict = {'yelp': 'data/yelp_labelled.txt',
'amazon': 'data/amazon_cells_labelled.txt',
'imdb': 'data/imdb_labelled.txt'}
df_list = []
for source, filepath in filepath_dict.items():
df = pd.read_csv(filepath, names=['sentence', 'label'], sep='\t')
df['source'] = source
df_list.append(df)
df = pd.concat(df_list)
print(df.iloc[0:4])
from sklearn.feature_extraction.text import CountVectorizer
df_yelp = df[df['source'] == 'yelp']
sentences = df_yelp['sentence'].values
y = df_yelp['label'].values
from sklearn.model_selection import train_test_split
sentences_train, sentences_test, y_train, y_test = train_test_split(sentences, y, test_size=0.25, random_state=1000)
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer.fit(sentences_train)
X_train = vectorizer.transform(sentences_train)
X_test …Run Code Online (Sandbox Code Playgroud)