Posts by Jam*_*ily

Using multiple features with scikit-learn

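I'm using scikit-learn for text classification. It works well with a single feature, but adding multiple features gives me an error. I think the problem is that I'm not formatting the data the way the classifier expects.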

For example, this works fine:

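import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in older releases
from sklearn.pipeline import Pipeline

# df (a pandas DataFrame) and a fitted LabelEncoder, label_encoder, are assumed to be defined earlier
# One column: np.array gives a 1-D array with one string per sample
data = np.array(df['feature1'])
classes = label_encoder.transform(np.asarray(df['target']))
X_train, X_test, Y_train, Y_test = train_test_split(data, classes)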

classifier = Pipeline(...)

classifier.fit(X_train, Y_train)
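The single-column version works because of how a text vectorizer consumes its input. The traceback below points into sklearn/feature_extraction/text.py, so the Pipeline presumably starts with something like CountVectorizer or TfidfVectorizer (an assumption, since the Pipeline(...) contents are elided). A minimal sketch with hypothetical toy data:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the real DataFrame
df = pd.DataFrame({
    "feature1": ["motion to dismiss", "motion to compel", "notice of appeal"],
    "target": ["dismiss", "compel", "appeal"],
})

# Selecting one column gives a 1-D array with one string per sample
data = np.array(df["feature1"])
print(data.shape)  # (3,)

# The vectorizer treats each element of the 1-D array as one document
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(data)
print(X.shape[0])  # 3, one row per document
```

Each element the vectorizer sees is a plain string, which is exactly the "iterable of raw documents" it is designed for.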

But this:

# Two columns: np.array now gives a 2-D array of shape (n_samples, 2)
data = np.array(df[['feature1', 'feature2']])
classes = label_encoder.transform(np.asarray(df['target']))
X_train, X_test, Y_train, Y_test = train_test_split(data, classes)

classifier = Pipeline(...)

classifier.fit(X_train, Y_train)
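With two columns the array changes shape. A text vectorizer iterates over its input expecting one string per document, but iterating a 2-D array yields whole rows. The sketch below (hypothetical toy data) shows what each "document" becomes; this is an inference about the cause, since the exact exception is cut off in the traceback:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two text columns
df = pd.DataFrame({
    "feature1": ["motion to dismiss", "motion to compel"],
    "feature2": ["granted", "denied"],
})

# Selecting two columns gives a 2-D array: one row per sample, one column per feature
data = np.array(df[["feature1", "feature2"]])
print(data.shape)  # (2, 2)

# Iterating, as a text vectorizer does, yields row arrays rather than strings
first = next(iter(data))
print(type(first))  # <class 'numpy.ndarray'>
print(isinstance(first, str))  # False
```

String operations that the vectorizer's default preprocessing applies to each document (such as lowercasing) are not available on these row arrays.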

fails with:

Traceback (most recent call last):
  File "/Users/jed/Dropbox/LegalMetric/LegalMetricML/motion_classifier.py", line 157, in <module>
    classifier.fit(X_train, Y_train)
  File "/Library/Python/2.7/site-packages/sklearn/pipeline.py", line 130, in fit
    Xt, fit_params = self._pre_transform(X, y, **fit_params)
  File "/Library/Python/2.7/site-packages/sklearn/pipeline.py", line 120, in _pre_transform
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line …
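One way to feed both columns, sketched under assumptions: feature1 and feature2 are both free-text columns, and the elided Pipeline(...) begins with a text vectorizer. This swaps in sklearn.compose.ColumnTransformer (available from scikit-learn 0.20) to give each column its own TfidfVectorizer and stack the results. LogisticRegression and the toy data are placeholders, not the original setup.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: two free-text columns and a label
df = pd.DataFrame({
    "feature1": ["motion to dismiss", "motion to compel", "motion to dismiss claim",
                 "motion to compel discovery", "dismiss the complaint", "compel answers"],
    "feature2": ["granted", "denied", "granted in part", "denied", "granted", "denied"],
    "target": ["dismiss", "compel", "dismiss", "compel", "dismiss", "compel"],
})

label_encoder = LabelEncoder()
classes = label_encoder.fit_transform(df["target"])

# A scalar column name (not a list) makes ColumnTransformer pass a 1-D column of strings
features = ColumnTransformer([
    ("f1", TfidfVectorizer(), "feature1"),
    ("f2", TfidfVectorizer(), "feature2"),
])

classifier = Pipeline([
    ("features", features),
    ("clf", LogisticRegression()),
])

# Keep the DataFrame (no np.array) so columns can be selected by name;
# stratify keeps both classes in the tiny training split
X_train, X_test, Y_train, Y_test = train_test_split(
    df[["feature1", "feature2"]], classes, test_size=2, stratify=classes, random_state=0
)
classifier.fit(X_train, Y_train)
predictions = classifier.predict(X_test)
print(predictions.shape)  # (2,)
```

A simpler alternative that also works on older releases is to join the text columns into one string column (for example df['feature1'] + ' ' + df['feature2']) and keep the single-vectorizer pipeline, at the cost of losing the distinction between the two fields.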

python machine-learning pandas scikit-learn
