I am using Python's hyperopt library to optimize ML hyperparameters. In particular, I am trying to find the optimal lightgbm hyperparameters by minimizing this function:
def lgb_objective_map(params):
    """
    Objective function for lightgbm using MAP as the success metric.
    """
    # hyperopt casts these as float
    params['num_boost_round'] = int(params['num_boost_round'])
    params['num_leaves'] = int(params['num_leaves'])
    params['min_data_in_leaf'] = int(params['min_data_in_leaf'])
    # need to be passed as parameters
    params['verbose'] = -1
    params['seed'] = 1
    # Cross-validation
    cv_result = lgb.cv(
        params,
        lgtrain,
        nfold=3,
        metrics='binary_logloss',
        num_boost_round=params['num_boost_round'],
        early_stopping_rounds=20,
        stratified=False,
    )
    # Update the number of trees based on the early stopping results
    early_stop_dict[lgb_objective_map.i] = len(cv_result['binary_logloss-mean'])
    params['num_boost_round'] = …
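The function is cut off above; presumably the truncated tail stores the early-stopped round count back into params and returns the final cross-validation loss. For context, here is a hedged sketch of how such an objective is typically wired into fmin. The bounds below are illustrative assumptions, not the post's actual search space; note that hp.quniform samples arrive as floats, which is exactly why the function casts them back to int.

from hyperopt import fmin, tpe, hp

space = {
    'learning_rate': hp.uniform('learning_rate', 0.01, 0.3),
    'num_boost_round': hp.quniform('num_boost_round', 50, 500, 50),
    'num_leaves': hp.quniform('num_leaves', 31, 255, 4),
    'min_data_in_leaf': hp.quniform('min_data_in_leaf', 20, 200, 10),
}
early_stop_dict = {}
lgb_objective_map.i = 0  # counter indexing early_stop_dict; incremented in the truncated tail
best = fmin(lgb_objective_map, space, algo=tpe.suggest, max_evals=50)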
I am using hyperopt to tune my ML model, but I am running into trouble using qloguniform as the search space. I took the example from the official wiki and changed the search space.

import pickle
import time
#utf8
import pandas as pd
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments':
            {'time_module': pickle.dumps(time.time)}
    }
trials = Trials()
best = fmin(objective,
            space=hp.qloguniform('x', np.log(0.001), np.log(0.1), np.log(0.001)),
            algo=tpe.suggest,
            max_evals=100,
            trials=trials)
pd.DataFrame(trials.trials) …
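One detail worth checking in the space above: hyperopt documents hp.qloguniform as drawing round(exp(uniform(low, high)) / q) * q, so the fourth argument q is a quantization step on the value scale, not a log. Passing np.log(0.001), a negative number, breaks the rounding; a hedged sketch of what was presumably intended:

import numpy as np
from hyperopt import hp

# values between 0.001 and 0.1, quantized to multiples of 0.001
space = hp.qloguniform('x', np.log(0.001), np.log(0.1), 0.001)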
I am using hyperopt to search for the best parameters of an SVM classifier, but hyperopt says the best 'kernel' is '0'. {'kernel': '0'} is obviously not a valid value. Does anyone know whether this is caused by a mistake of mine or by a bug in hyperopt?
The code is below.
from hyperopt import fmin, tpe, hp, rand
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
parameter_space_svc = {
    'C': hp.loguniform("C", np.log(1), np.log(100)),
    'kernel': hp.choice('kernel', ['rbf', 'poly']),
    'gamma': hp.loguniform("gamma", np.log(0.001), np.log(0.1)),
}
from sklearn import datasets
iris = datasets.load_digits()
train_data = iris.data
train_target = iris.target
count = 0
def function(args):
    print(args)
    score_avg = 0
    skf = StratifiedKFold(train_target, n_folds=3, shuffle=True, random_state=1)
    for train_idx, test_idx in skf:
        train_X = iris.data[train_idx]
        train_y = iris.target[train_idx]
        test_X = iris.data[test_idx] …
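On the {'kernel': '0'} result: this is expected behavior rather than a bug; fmin reports hp.choice parameters by their index into the options list. A short sketch of recovering the actual value with hyperopt's space_eval, assuming the truncated objective above ends by returning a loss to minimize:

from hyperopt import fmin, tpe, space_eval

best = fmin(function, parameter_space_svc, algo=tpe.suggest, max_evals=100)
print(best)                                   # e.g. {'kernel': 0, ...} -- index into ['rbf', 'poly']
print(space_eval(parameter_space_svc, best))  # e.g. {'kernel': 'rbf', ...} -- the actual value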
I optimized my keras model using hyperopt. Now, how do we save the best optimized keras model and its weights to disk?
My code:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.metrics import roc_auc_score
import sys
X = []
y = []
X_val = []
y_val = []
space = {'choice': hp.choice('num_layers',
                             [{'layers': 'two'},
                              {'layers': 'three',
                               'units3': hp.uniform('units3', 64, 1024),
                               'dropout3': hp.uniform('dropout3', .25, .75)}
                              ]),
         'units1': hp.choice('units1', [64, 1024]),
         'units2': hp.choice('units2', [64, 1024]),
         'dropout1': hp.uniform('dropout1', .25, .75),
         'dropout2': hp.uniform('dropout2', .25, .75),
         'batch_size': hp.uniform('batch_size', 20, 100),
         'nb_epochs': 100,
         'optimizer': hp.choice('optimizer', ['adadelta', 'adam', 'rmsprop']),
         'activation': 'relu'
         }
def f_nn(params):
    from keras.models import Sequential
    from keras.layers.core import Dense, Dropout, Activation
    from …
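One common pattern for saving the best model is to do it inside the objective, sketched here under assumptions: build_and_train is a hypothetical helper standing in for the truncated body of f_nn above, and AUC is assumed to be the score being maximized.

from keras.models import load_model

best_auc = [0.0]  # mutable so the objective's closure can update it

def f_nn_saving(params):
    model, auc = build_and_train(params)  # hypothetical helper: trains and scores one model
    if auc > best_auc[0]:
        best_auc[0] = auc
        model.save('best_model.h5')       # stores architecture + weights in one file
    return {'loss': -auc, 'status': STATUS_OK}

# after fmin finishes, the winning model can be restored with:
# best_model = load_model('best_model.h5')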
I tried applying Bayesian optimization to a simple CNN on the MNIST handwritten-digit dataset, but there is little sign that it is working. I have tried k-fold validation to reduce the noise, but the optimization still does not seem to make any progress toward converging on the best parameters. In general, what are some of the main reasons Bayesian optimization can fail? And what might they be in my particular case?

The rest is just context and code snippets.
Model definition:
# imports assumed by the snippet (not shown in the original post)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

def define_model(learning_rate, momentum):
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    opt = SGD(lr=learning_rate, momentum=momentum)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
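For reference, a minimal sketch (my addition, not the poster's code) of a single evaluation of define_model matching the run described next; trainX, trainY, valX, valY are assumed to be preprocessed MNIST arrays of shape (n, 28, 28, 1) with one-hot labels:

model = define_model(learning_rate=1e-2, momentum=0.9)
history = model.fit(trainX, trainY,
                    validation_data=(valX, valY),
                    epochs=10, batch_size=32, verbose=0)
# the history key is 'val_acc' in older Keras releases, 'val_accuracy' in newer ones
print(history.history['val_acc'][-1])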
One training run with the hyperparameters: batch_size = 32, learning rate = 1e-2, momentum = 0.9, 10 epochs. (Blue = training, yellow = validation.)
Box-and-whisker plot of the accuracy from 5-fold cross-validation, with the same hyperparameters as above (to gauge the spread).
Grid search, keeping batch_size at 32 and 10 epochs. I did this on single evaluations rather than 5-fold, since the spread was not large enough to corrupt the results.
Bayesian optimization. As above, batch_size = 32 and 10 epochs, searching over the same ranges, but this time with 5-fold cross-validation to cancel out the noise. It is supposed to run for 100 iterations, but that will take another 20 hours.
space = …

I will fully admit that I may have set up the conditional space incorrectly here, but for some reason I simply cannot get it to work. I am trying to use hyperopt to tune a logistic regression model, and depending on the solver there are some other parameters that also need to be explored. If you choose the liblinear solver you can choose a penalty, and depending on the penalty you can also choose dual. However, when I try to run hyperopt on this search space, it keeps giving me an error because it passes the whole dictionary, as shown below. Any ideas? The error I get is 'ValueError: Logistic Regression supports only liblinear, newton-cg, lbfgs and sag solvers, got {'solver': 'sag'}'. This format worked when setting up a random forest search space, so I'm at a loss.
import numpy as np
import scipy as sp
import pandas as pd
pd.options.display.max_columns = None
pd.options.display.max_rows = None
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(style="white")
import pyodbc
import statsmodels as sm
from pandasql import sqldf
import math
from tqdm import tqdm
import pickle
from sklearn.preprocessing import RobustScaler, OneHotEncoder, MinMaxScaler
from sklearn.utils import shuffle
from sklearn.cross_validation import KFold, StratifiedKFold, cross_val_score, cross_val_predict, train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold as StratifiedKFoldIt
from sklearn.feature_selection import RFECV, …
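For what it's worth, one way to structure the conditional dependency so that each sampled point carries a plain string for solver rather than a nested dict is sketched below; X and y are assumed to be the training data, and the branches and bounds are illustrative rather than the poster's full space.

from hyperopt import hp, fmin, tpe
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

space = hp.choice('solver_group', [
    # liblinear allows a penalty choice (and, with l2, the dual formulation)
    {'solver': 'liblinear', 'penalty': hp.choice('penalty', ['l1', 'l2'])},
    # the remaining solvers support only l2
    {'solver': hp.choice('solver', ['newton-cg', 'lbfgs', 'sag']), 'penalty': 'l2'},
])

def objective(args):
    # args arrives as a flat dict such as {'solver': 'sag', 'penalty': 'l2'}
    model = LogisticRegression(solver=args['solver'], penalty=args['penalty'])
    return -cross_val_score(model, X, y, cv=3).mean()

best = fmin(objective, space, algo=tpe.suggest, max_evals=25)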
This question concerns using trials as an argument to fmin.
trials = Trials()
best = fmin(objective, space=hp.uniform('x', -10, 10), algo=tpe.suggest,
            max_evals=100, trials=trials)
The documentation (https://github.com/hyperopt/hyperopt/wiki/FMin) states that the trials object exposes lists such as trials.trials, trials.results, trials.losses() and trials.statuses().
However, I have seen usages such as trials.best_trial and trials.trial_attachments that are not mentioned in the documentation.
Now I would like to know how to get a list of everything the trials object holds. The object's type is hyperopt.base.Trials.
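Absent a complete list in the documentation, plain Python introspection will enumerate everything the object exposes; a quick sketch:

from hyperopt import Trials

trials = Trials()
# public attributes and methods of hyperopt.base.Trials
print([name for name in dir(trials) if not name.startswith('_')])
# per-trial contents live in the dicts of trials.trials, e.g. after running fmin:
# trials.trials[0].keys() -> dict_keys(['state', 'tid', 'spec', 'result', 'misc', ...])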
I am demonstrating hyperopt's TPE algorithm for my main project, but cannot seem to get the algorithm to converge. From what I understood from the original paper and a YouTube lecture, the TPE algorithm works in the following steps:
(In what follows, x = hyperparameters and y = loss.)
I have implemented this in Python for the objective function f(x) = x^2, but the algorithm fails to converge to the minimum.
import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde
def objective_func(x):
    return x**2

def measure(x):
    noise = np.random.randn(len(x)) * 0
    return x**2 + noise

def split_meassures(x_obs, y_obs, gamma=1/2):
    #split x and …
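The split function is cut off above; here is a hypothetical completion of that step (mine, not the poster's code) matching the TPE recipe of splitting the observations at the gamma-quantile of the observed losses:

def split_measures_sketch(x_obs, y_obs, gamma=1/2):
    # observations whose loss falls below the gamma-quantile feed the "good"
    # density l(x); the rest feed g(x)
    threshold = np.quantile(y_obs, gamma)
    good = y_obs <= threshold
    return x_obs[good], x_obs[~good]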
I am trying to use hyperopt on a regression model in such a way that one of its hyperparameters is defined per variable and needs to be passed as a list. For example, if I have a regression with 3 independent variables (excluding the constant), I would pass hyperparameter = [x, y, z] (where x, y, z are floats).

The values of this hyperparameter have the same bounds regardless of which variable they are applied to. If the hyperparameter were applied to all variables identically, I could simply use hp.uniform('hyperparameter', a, b). What I want instead is a search space that is the Cartesian product of hp.uniform('hyperparameter', a, b) of length n, where n is the number of variables in the regression (so, basically, itertools.product(hp.uniform('hyperparameter', a, b), repeat = n)).
I would like to know whether this is possible in hyperopt. If not, any suggestions for an optimizer that could do it are welcome.
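A hedged sketch of one common workaround: define n independent copies of the same prior under distinct labels (n = 3 and the bounds a, b are placeholder assumptions here). hyperopt accepts nested lists as search spaces, so the objective receives a list of n floats sampled independently, which behaves like a point drawn from the n-fold Cartesian product of [a, b].

from hyperopt import hp

n, a, b = 3, 0.0, 1.0
space = [hp.uniform('hyperparameter_%d' % i, a, b) for i in range(n)]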
I am running Databricks Runtime 8.1 (which includes Apache Spark 3.1.1 and Scala 2.12), trying to get hyperopt to work as documented, but I get
py4j.Py4JException: Method maxNumConcurrentTasks([]) does not exist
when I try
spark_trials = SparkTrials()
Is there anything special I need to do to make this work correctly?
This is the cluster I am using:
{
  "autoscale": {
    "min_workers": 1,
    "max_workers": 2
  },
  "cluster_name": "mlops_tiny_ml",
  "spark_version": "8.2.x-cpu-ml-scala2.12",
  "spark_conf": {},
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "us-west-2b",
    "instance_profile_arn": "arn:aws:iam::112437402463:instance-profile/databricks_instance_role_s3",
    "spot_bid_price_percent": 100,
    "ebs_volume_type": "GENERAL_PURPOSE_SSD",
    "ebs_volume_count": 3,
    "ebs_volume_size": 100
  },
  "node_type_id": "m4.large",
  "driver_node_type_id": "m4.large",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {},
  "autotermination_minutes": 120,
  "enable_elastic_disk": false,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0xxxxxt404"
}
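For reference, the intended wiring looks like the sketch below (the parallelism value and toy objective are illustrative). The maxNumConcurrentTasks([]) error is commonly reported when the installed hyperopt build predates the Spark release on the cluster (the underlying Scala method's signature changed), so aligning the hyperopt version with the runtime is the usual first step rather than changing this code.

from hyperopt import fmin, tpe, hp, SparkTrials

spark_trials = SparkTrials(parallelism=2)
best = fmin(fn=lambda x: x ** 2,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=20,
            trials=spark_trials)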