Posts by Pou*_*eri

Cross-validation and parameter tuning with XGBoost and hyperopt

One way to do nested cross-validation with an XGB model is:

from sklearn.model_selection import GridSearchCV, cross_val_score
from xgboost import XGBClassifier

# Let's assume that we have some data for a binary classification
# problem : X (n_samples, n_features) and y (n_samples,)...

gs = GridSearchCV(estimator=XGBClassifier(), 
                  param_grid={'max_depth': [3, 6, 9], 
                              'learning_rate': [0.001, 0.01, 0.05]}, 
                  cv=2)
scores = cross_val_score(gs, X, y, cv=2)

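The nested structure does not depend on which tuner runs in the inner loop. Below is a minimal, library-free sketch of the bookkeeping that `cross_val_score` around `GridSearchCV` performs. Every name in it (`k_fold_indices`, `accuracy`, `tune`, `nested_cv`) is hypothetical, and a toy threshold rule stands in for `XGBClassifier`, so it shows the idea only, not either library's implementation. A hyperopt-based search would take the place of `tune`.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def accuracy(threshold, X, y, idx):
    """Accuracy of the toy rule 'predict 1 when x > threshold' on rows idx."""
    return mean(1.0 if (X[i] > threshold) == bool(y[i]) else 0.0 for i in idx)


def tune(X, y, train_idx, candidates, n_splits=2):
    """Inner loop: choose the candidate with the best mean validation accuracy.

    Only rows from train_idx are used. The toy rule has nothing to fit, so
    the inner training folds are ignored here; a real model would be fitted
    on them before being scored on the validation fold.
    """
    def inner_score(threshold):
        scores = []
        for _, val_pos in k_fold_indices(len(train_idx), n_splits):
            val_idx = [train_idx[p] for p in val_pos]
            scores.append(accuracy(threshold, X, y, val_idx))
        return mean(scores)

    return max(candidates, key=inner_score)


def nested_cv(X, y, candidates, outer_splits=2, inner_splits=2):
    """Outer loop: score the tuned candidate on data the tuner never saw."""
    outer_scores = []
    for train_idx, test_idx in k_fold_indices(len(X), outer_splits):
        best = tune(X, y, train_idx, candidates, inner_splits)
        outer_scores.append(accuracy(best, X, y, test_idx))
    return outer_scores


# Toy 1-D data with alternating classes: class 0 lies in [-0.4, 0.4],
# class 1 lies in [0.6, 1.4], so a threshold of 0.5 separates them.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [label + rng.uniform(-0.4, 0.4) for label in y]

scores = nested_cv(X, y, candidates=[-1.0, 0.5, 2.0])
print(scores)
```

The property to preserve when swapping in another tuner is that the outer test fold is used only for the final score, never for choosing parameters.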

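However, for tuning XGB parameters, several tutorials (such as this one) take advantage of the Python hyperopt library. I would like to be able to do nested cross-validation (as above) while using hyperopt to tune the XGB parameters.

To do so, I wrote my own Scikit-Learn estimator: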

from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import train_test_split
from sklearn.exceptions import NotFittedError
from sklearn.metrics import roc_auc_score …


python machine-learning scikit-learn cross-validation xgboost

Pou*_*eri

2018-09-20

5
votes

1
answer

3347
views

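Keras model fails to decrease loss

I present an example where a tf.keras model fails to learn from very simple data. I am using tensorflow-gpu==2.0.0, keras==2.3.0 and Python 3.7. At the end of the post, I give the Python code to reproduce the problem I observed.

Data

The samples are Numpy arrays of shape (6, 16, 16, 16, 3). To keep things very simple, I only consider arrays full of 1s and arrays full of 0s. Arrays of 1s get the label 1 and arrays of 0s get the label 0. I can generate some samples (n_samples = 240 in what follows) with this code: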

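import numpy as np

def generate_fake_data():
    # Yield (sample, one-hot label) pairs:
    # all-ones arrays get class 1, all-zeros arrays get class 0.
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])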

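Because the loop starts at j = 1 and the comparison is strict, the two classes come out close to balanced, but not exactly. Here is a small sketch that mirrors only the branch logic of the generator, without building the arrays; `fake_labels` is a hypothetical helper, not part of the original code.

```python
from collections import Counter


def fake_labels(n_samples=240, split=120):
    """Yield the class index that generate_fake_data assigns to sample j.

    Mirrors the generator's branch: j < split gives the all-ones array
    (one-hot [0., 1.], class 1), otherwise the all-zeros array (class 0).
    """
    for j in range(1, n_samples + 1):
        yield 1 if j < split else 0


counts = Counter(fake_labels())
print(counts[1], counts[0])  # 119 121
```

Changing the test to `j <= 120` would give a 120/120 split. Whether this small imbalance has anything to do with the training problem is not established here.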

To feed this data into a tf.keras model, I create a tf.data.Dataset instance using the code below. This essentially produces shuffled batches of BATCH_SIZE = 12 samples.

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = …

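In the pipeline above, `repeat()` comes before `shuffle(buffer_size=1000)`, and the buffer is larger than the 240 samples of one pass, so samples from different passes over the data can end up in the same stretch of output. The sketch below is a simplified pure-Python model of buffered shuffling, written only to illustrate that behavior. It is not TensorFlow's implementation, and `buffered_shuffle` is a hypothetical name.

```python
import random
from itertools import count, islice


def buffered_shuffle(iterable, buffer_size, seed=None):
    """Yield items in an order randomized through a fixed-size buffer.

    The buffer is filled first. Each step then picks a random slot, emits
    its item, and refills the slot with the next input item if there is one.
    """
    rng = random.Random(seed)
    iterator = iter(iterable)
    buffer = list(islice(iterator, buffer_size))
    while buffer:
        pos = rng.randrange(len(buffer))
        item = buffer[pos]
        try:
            buffer[pos] = next(iterator)
        except StopIteration:
            # Input exhausted: shrink the buffer by moving the last item
            # into the freed slot and dropping the tail.
            buffer[pos] = buffer[-1]
            buffer.pop()
        yield item


# Model of repeat() followed by shuffle(1000): an endless stream of sample
# ids 1..240, shuffled through a buffer that holds more than one full pass.
repeated = (j for _ in count() for j in range(1, 241))
first_240 = list(islice(buffered_shuffle(repeated, 1000, seed=0), 240))

# Fewer than 240 distinct ids means some samples appeared more than once
# within the first 240 draws, while others did not appear at all.
print(len(set(first_240)))
```

With a finite input the output is a permutation of the input, and with `buffer_size=1` the order is unchanged, which is why a small buffer gives weak shuffling.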

python deep-learning keras tensorflow tensorflow-datasets

Pou*_*eri

2019-10-05

5
votes

1
answer

165
views