vas*_*111 3 python pipeline machine-learning python-3.x scikit-learn
该示例是完全可重现的。这是完整的笔记本(也下载数据):https ://github.com/ageron/handson-ml2/blob/master/02_end_to_end_machine_learning_project.ipynb
在上面笔记本中的这部分之后:
full_pipeline_with_predictor = Pipeline([
("preparation", full_pipeline),
("linear", LinearRegression())
])
full_pipeline_with_predictor.fit(housing, housing_labels)
full_pipeline_with_predictor.predict(some_data)
Run Code Online (Sandbox Code Playgroud)
我正在尝试使用以下代码获得对测试集的预测:
X_test_prepared = full_pipeline.transform(X_test)
final_predictions = full_pipeline_with_predictor.predict(X_test_prepared)
Run Code Online (Sandbox Code Playgroud)
但我收到错误:
C:\Users\Alex\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py:430: FutureWarning: Given feature/column names or counts do not match the ones for the data given during fit. This will fail from v0.24.
FutureWarning)
---------------------------------------------------------------------------
Empty Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
796 try:
--> 797 tasks = self._ready_batches.get(block=False)
798 except queue.Empty:
~\AppData\Local\Continuum\anaconda3\lib\queue.py in get(self, block, timeout)
166 if not self._qsize():
--> 167 raise Empty
168 elif timeout is None:
Empty:
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-141-dc87b1c9e658> in <module>
5
6 X_test_prepared = full_pipeline.transform(X_test)
----> 7 final_predictions = full_pipeline_with_predictor.predict(X_test_prepared)
8
9 final_mse = mean_squared_error(y_test, final_predictions)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
114
115 # lambda, but not partial, allows help() to work with update_wrapper
--> 116 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
117 # update the docstring of the returned function
118 update_wrapper(out, self.fn)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\pipeline.py in predict(self, X, **predict_params)
417 Xt = X
418 for _, name, transform in self._iter(with_final=False):
--> 419 Xt = transform.transform(Xt)
420 return self.steps[-1][-1].predict(Xt, **predict_params)
421
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in transform(self, X)
586
587 self._validate_features(X.shape[1], X_feature_names)
--> 588 Xs = self._fit_transform(X, None, _transform_one, fitted=True)
589 self._validate_output(Xs)
590
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _fit_transform(self, X, y, func, fitted)
455 message=self._log_message(name, idx, len(transformers)))
456 for idx, (name, trans, column, weight) in enumerate(
--> 457 self._iter(fitted=fitted, replace_strings=True), 1))
458 except ValueError as e:
459 if "Expected 2D array, got 1D array instead" in str(e):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1002 # remaining jobs.
1003 self._iterating = False
-> 1004 if self.dispatch_one_batch(iterator):
1005 self._iterating = self._original_iterator is not None
1006
~\AppData\Local\Continuum\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
806 big_batch_size = batch_size * n_jobs
807
--> 808 islice = list(itertools.islice(iterator, big_batch_size))
809 if len(islice) == 0:
810 return False
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in <genexpr>(.0)
454 message_clsname='ColumnTransformer',
455 message=self._log_message(name, idx, len(transformers)))
--> 456 for idx, (name, trans, column, weight) in enumerate(
457 self._iter(fitted=fitted, replace_strings=True), 1))
458 except ValueError as e:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\__init__.py in _safe_indexing(X, indices, axis)
404 if axis == 1 and indices_dtype == 'str' and not hasattr(X, 'loc'):
405 raise ValueError(
--> 406 "Specifying the columns using strings is only supported for "
407 "pandas DataFrames"
408 )
ValueError: Specifying the columns using strings is only supported for pandas DataFrames
Run Code Online (Sandbox Code Playgroud)
问题:我怎样才能纠正这个错误?为什么会发生这个错误?
自您的最终管道以来:
full_pipeline_with_predictor = Pipeline([
("preparation", full_pipeline),
("linear", LinearRegression())
])
Run Code Online (Sandbox Code Playgroud)
显然已经包含了full_pipeline
,你不应该再次“准备”你的X_test
;这样做,你就“准备”了X_test
两次,这是错误的。所以,你的代码应该很简单
final_predictions = full_pipeline_with_predictor.predict(X_test)
Run Code Online (Sandbox Code Playgroud)
与获取 的预测完全相同some_data
,即
full_pipeline_with_predictor.predict(some_data)
Run Code Online (Sandbox Code Playgroud)
some_data
在将它们送入最终管道之前,您正确地没有“准备”它们。
使用管道的全部目的正是如此,即避免必须为可能的多个准备步骤单独运行拟合预测,而是将它们全部包装到单个管道中。当你预测 时,你在这里正确地应用了这个过程some_data
,但是当你尝试预测 时,你似乎在下一步中忘记了它X_test
。
归档时间: |
|
查看次数: |
4570 次 |
最近记录: |