Posts by fat*_*mau

Sparse matrix length is ambiguous

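I'm new to machine learning, so this question may sound silly. I'm following a tutorial on text classification, but I ran into an error that I don't know how to solve.

Here is the code I have (essentially the code from the tutorial):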

import pandas as pd

filepath_dict = {'yelp':   'data/yelp_labelled.txt',
              'amazon': 'data/amazon_cells_labelled.txt',
              'imdb':   'data/imdb_labelled.txt'}

df_list = []
for source, filepath in filepath_dict.items():
    df = pd.read_csv(filepath, names=['sentence', 'label'], sep='\t')
    df['source'] = source
    df_list.append(df)

df = pd.concat(df_list)
print(df.iloc[0:4])


from sklearn.feature_extraction.text import CountVectorizer

df_yelp = df[df['source'] == 'yelp']

sentences = df_yelp['sentence'].values
y = df_yelp['label'].values

from sklearn.model_selection import train_test_split
sentences_train, sentences_test, y_train, y_test = train_test_split(sentences, y, test_size=0.25, random_state=1000)


from sklearn.feature_extraction.text import CountVectorizer


vectorizer = CountVectorizer()
vectorizer.fit(sentences_train)

X_train = vectorizer.transform(sentences_train)
X_test …
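The post is cut off before the line that raises the error, so the exact trigger is not shown. As a minimal sketch, assuming the message in the title comes from some code calling `len()` on the scipy sparse matrix that `CountVectorizer.transform` returns, the behavior can be reproduced with a few in-memory sentences. The sentences and the variable names `len_error` and `X_train_dense` are hypothetical stand-ins, not part of the original code.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in sentences; the original reads them from data/yelp_labelled.txt.
sentences_train = ["the food was great", "the service was slow", "great place"]

vectorizer = CountVectorizer()
vectorizer.fit(sentences_train)
X_train = vectorizer.transform(sentences_train)  # scipy sparse matrix, not a NumPy array

# len() is not defined for scipy sparse matrices and raises TypeError.
try:
    len(X_train)
    len_error = None
except TypeError as exc:
    len_error = str(exc)

# The shape attribute is unambiguous: (number of samples, vocabulary size).
n_samples, n_features = X_train.shape

# Converting to a dense array is one possible workaround when downstream code
# needs len(); it can use a lot of memory for large vocabularies.
X_train_dense = X_train.toarray()

print(len_error)
print(n_samples, n_features, len(X_train_dense))
```

If this assumption holds, reading `X_train.shape[0]` or passing a dense copy to the code that calls `len()` would avoid the error. Whether that matches the truncated part of the post cannot be confirmed from what is shown.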

python scikit-learn keras sklearn-pandas

Score: 2 | Answers: 1 | Views: 3836
