Posts by Vir*_*mar

fit_transform() takes 2 positional arguments but 3 were given, when using LabelBinarizer

I am new to machine learning and have been working through unsupervised learning techniques.

This screenshot shows my sample data (after it has been fully cleaned): sample data

I have two pipelines for cleaning the data:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer, StandardScaler, LabelBinarizer
# DataFrameSelector and CombinedAttributesAdder are custom transformers (see the sketch below)

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

print(type(num_attribs))

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])
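DataFrameSelector and CombinedAttributesAdder are not part of scikit-learn; they are custom transformers in the style of the Hands-On ML housing example. A minimal sketch of what DataFrameSelector is assumed to do, namely pull the named columns out of the DataFrame:

from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given columns from a pandas DataFrame and return them as a NumPy array."""
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self  # stateless: nothing to learn
    def transform(self, X):
        return X[self.attribute_names].values

CombinedAttributesAdder would follow the same pattern, appending derived numeric columns (e.g. rooms per household) to the array it receives.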

I then take the union of these two pipelines; the code for that is shown below:

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])

Now when I try to run fit_transform on the data, it shows me an error.

Transformation code:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error message: fit_transform() takes 2 positional arguments but 3 were given
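For context, LabelBinarizer.fit_transform is defined as fit_transform(self, y): it expects only a label array. Pipeline, however, calls fit_transform(X, y) on every step, so the binarizer receives three positional arguments (self, X, y), which is exactly the error above. A common workaround, shown here only as a sketch (the wrapper name is made up; newer scikit-learn versions would typically use OneHotEncoder instead), is to wrap LabelBinarizer in a transformer whose methods accept the (X, y) signature Pipeline expects:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelBinarizer

class PipelineFriendlyLabelBinarizer(BaseEstimator, TransformerMixin):
    """Wrap LabelBinarizer so fit/transform accept (X, y) like ordinary transformers."""
    def __init__(self, sparse_output=False):
        self.sparse_output = sparse_output
    def fit(self, X, y=None):
        self.encoder_ = LabelBinarizer(sparse_output=self.sparse_output)
        self.encoder_.fit(np.asarray(X).ravel())  # LabelBinarizer wants a 1-D array
        return self
    def transform(self, X):
        return self.encoder_.transform(np.asarray(X).ravel())

With such a wrapper, the cat_pipeline step would become ('label_binarizer', PipelineFriendlyLabelBinarizer()).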

Tags: scikit-learn, data-science

67 votes · 6 answers · 20k views

Meaning of n_estimators and max_features in RandomForestRegressor

I was reading about fine-tuning a model with GridSearchCV and came across a parameter grid like this:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor(random_state=42)
# train across 5 folds, that's a total of (12+6)*5=90 rounds of training
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(housing_prepared, housing_labels)
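For reference, the count in that comment works out as 3 × 4 = 12 parameter combinations from the first dict plus 2 × 3 = 6 from the second, and each of the 18 candidates is trained once per CV fold, so 18 × 5 = 90 fits. Assuming the param_grid above, the candidate count can be checked with ParameterGrid:

from sklearn.model_selection import ParameterGrid

print(len(ParameterGrid(param_grid)))  # 18 candidate combinations; 18 * 5 folds = 90 fits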

Here I don't follow the concepts of n_estimators and max_features. Is it that n_estimators means the number of records taken from the data, and max_features means the number of attributes selected from the data?
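For what it's worth, in scikit-learn neither parameter refers to rows of the data: n_estimators is the number of decision trees built in the forest, and max_features is how many features are considered when searching for the best split at each tree node. A small illustration, using the same values that turn up as the best parameters below:

from sklearn.ensemble import RandomForestRegressor

# 30 trees in the ensemble; at every split, only 8 randomly chosen features
# are evaluated as split candidates (the model still sees all columns overall).
forest_reg = RandomForestRegressor(n_estimators=30, max_features=8, random_state=42)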

After some further digging I got the following result:

>>> grid_search.best_params_
{'max_features': 8, 'n_estimators': 30}

So the thing is that I don't really understand what this result means.

Tags: scikit-learn

1 vote · 1 answer · 10k views

Tag statistics

scikit-learn ×2

data-science ×1