Tag: tpot

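AutoML: equivalent Python code

Is there a way to extract the automatically generated machine learning pipeline from auto-sklearn as a standalone Python script?

Here is sample code using auto-sklearn: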

import autosklearn.classification
import sklearn.model_selection  # replaces sklearn.cross_validation, removed in scikit-learn 0.20
import sklearn.datasets
import sklearn.metrics

digits = sklearn.datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_hat = automl.predict(X_test)

print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))


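It would be nice to have equivalent Python code generated automatically in some way.

By contrast, when using TPOT, we can obtain a standalone pipeline as follows: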

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2) …


python scikit-learn automl tpot

val*_*tin

2018 01-09

5
votes

1
answer

806
views

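Categorical data with tpot

I am trying to use tpot with input that I have in a Pandas DataFrame. I keep getting the error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I believe this error occurs because isnan cannot handle my data structure, but I am not sure how to format it differently. I have a mix of categorical and continuous inputs and a continuous output. Here is a code sample with similar data: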

from tpot import TPOTRegressor

train_x = [[1, 2, 3], ['test1', 'test2', 'test3'], [56.2, 4.5, 3.4]]
train_y = [[3, 6, 7]]

tpot = TPOTRegressor()


Do I have to convert my categorical data somehow? The objects I get from dataframe.values and dataframe.as_matrix also give me an error.
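The error is consistent with an object-dtype array that still contains strings, since NumPy's isnan only works on numeric input. One common approach is to encode the categorical column numerically before fitting. The sketch below does this in plain Python on the sample data, assuming the strings are unordered categories and that the inner lists are features rather than samples. The helper name `one_hot_rows` is hypothetical; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would usually do this job.

```python
# Sample data from the question, stored one feature per inner list.
train_x = [[1, 2, 3], ['test1', 'test2', 'test3'], [56.2, 4.5, 3.4]]


def one_hot_rows(columns):
    """Turn per-feature columns into all-numeric per-sample rows.

    Hypothetical helper: numeric columns are kept as floats, and every
    other column is expanded into one 0/1 indicator column per category.
    """
    encoded_columns = []
    for column in columns:
        if all(isinstance(value, (int, float)) for value in column):
            encoded_columns.append([float(value) for value in column])
        else:
            # One indicator column per distinct category, in sorted order.
            for category in sorted(set(column)):
                encoded_columns.append(
                    [1.0 if value == category else 0.0 for value in column]
                )
    # Transpose so that each row describes one sample.
    return [list(row) for row in zip(*encoded_columns)]


rows = one_hot_rows(train_x)
for row in rows:
    print(row)
```

The resulting rows contain only floats, so they can be converted to a numeric array and passed as the feature matrix, with the target given as a flat list of one value per sample.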

python tpot

Deb*_*aul

lucky-day

4
votes

1
answer

1650
views

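TPOT: multi-class data classification fails

I cannot get TPOT (v. 0.9.2, Python 2.7) to work on multi-class data (although I cannot find anything in TPOT's documentation saying that it only does binary classification).

An example is provided below. It runs to 9% and then dies with the error: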

RuntimeError: There was an error in the TPOT optimization process. 
This could be because the data was not formatted properly, or because
data for a regression problem was provided to the TPOTClassifier 
object. Please make sure you passed the data to TPOT correctly.


But change n_classes to 2 and it runs fine.

RuntimeError: There was an error in the TPOT optimization process. 
This could be because the data was not formatted properly, or because
data for …


python machine-learning scikit-learn tpot

gal*_*pah

2019 10-12

3
votes

1
answer

1717
views

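Getting feature_importances_ after obtaining the best TPOT pipeline?

I have read through several pages, but I need someone to help explain how to make this work.

I am using TPOTRegressor() to get the best pipeline, but from there I would like to be able to plot the .feature_importances_ of the pipeline it returns: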

best_model = TPOTRegressor(cv=folds, generations=2, population_size=10, verbosity=2, random_state=seed) #memory='./PipelineCache',       memory='auto',
best_model.fit(X_train, Y_train)
feature_importance = best_model.fitted_pipeline_.steps[-1][1].feature_importances_


I saw this kind of setup in a now-closed issue on GitHub, but currently I get the error:

Best pipeline: LassoLarsCV(input_matrix, normalize=True)

Traceback (most recent call last):
  File "main2.py", line 313, in <module>
    feature_importance = best_model.fitted_pipeline_.steps[-1][1].feature_importances_
AttributeError: 'LassoLarsCV' object has no attribute 'feature_importances_'


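So, how can I get these feature importances from the best pipeline, regardless of which pipeline it lands on? Or is this even possible? Or does someone have a better way to plot feature importances from a TPOT run?

Thanks!

Update

To clarify, by feature importance I mean determining how important each feature (X) of the dataset is in determining the predicted (Y) label, using a bar chart to plot each feature's level of importance in arriving at the prediction. TPOT does not do this directly (I don't think), so I figured I would grab the pipeline it came up with, re-run it on the training data, and then somehow use .feature_importances_ to plot the feature importances, since these are all sklearn regressors I am using?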
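The AttributeError arises because linear models such as LassoLarsCV expose coef_ rather than feature_importances_. One possible workaround, sketched below, is a small helper that tries feature_importances_ first and falls back to the absolute value of coef_. The names `importance_scores`, `FakeTreeModel`, and `FakeLinearModel` are hypothetical; the two classes are stand-ins for fitted estimators so that the sketch runs without any third-party library. It assumes coef_ is one-dimensional, as it is for a single-output regressor.

```python
def importance_scores(estimator):
    """Return one non-negative score per feature, or None if unavailable.

    Hypothetical helper: it only inspects attributes, so it works with any
    fitted object that follows the scikit-learn naming convention.
    """
    # Tree-based models expose feature_importances_ directly.
    scores = getattr(estimator, "feature_importances_", None)
    if scores is not None:
        return [float(score) for score in scores]
    # Linear models expose coef_; its magnitude is only a rough proxy.
    coefficients = getattr(estimator, "coef_", None)
    if coefficients is not None:
        return [abs(float(coef)) for coef in coefficients]
    # Neither attribute exists, so there is nothing to report.
    return None


class FakeTreeModel:
    """Stand-in for a fitted tree-based regressor (assumption)."""
    feature_importances_ = [0.7, 0.2, 0.1]


class FakeLinearModel:
    """Stand-in for a fitted linear regressor such as LassoLarsCV (assumption)."""
    coef_ = [1.5, -0.25, 0.0]


print(importance_scores(FakeTreeModel()))
print(importance_scores(FakeLinearModel()))
print(importance_scores(object()))
```

With the code from the question, the argument would be best_model.fitted_pipeline_.steps[-1][1]. Coefficient magnitudes are comparable across features only when the features are on similar scales, and the scores describe the inputs of the final step, which may differ from the original columns if earlier pipeline steps transform them. A model-agnostic alternative is sklearn.inspection.permutation_importance, which can be applied to the whole fitted pipeline.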

python pipeline regression scikit-learn tpot

Mat*_*son

2019 08-06

3
votes

1
answer

1561
views