wer*_*hao 7 mongodb python-2.7
我正在尝试使用与MongoDB的Hyperopt并行搜索,并遇到Mongotrials的一些问题,这里已经讨论过了.我已经尝试了所有方法,但仍然无法找到解决我特定问题的方法.我试图最小化的具体模型是来自sklearn的RadomForestRegressor.
我已经按照本教程.而且我能够打印出计算出的"fmin"而没有任何问题.
这是我到目前为止的步骤:
1)激活一个名为"tensorflow"的虚拟环境(我已在那里安装了所有库)
2)启动MongoDB:
(tensorflow) bash-3.2$ mongod --dbpath . --port 1234 --directoryperdb --journal --nohttpinterface
Run Code Online (Sandbox Code Playgroud)
3)启动工人:
(tensorflow) bash-3.2$ hyperopt-mongo-worker --mongo=localhost:1234/foo_db --poll-interval=0.1
Run Code Online (Sandbox Code Playgroud)
4)运行我的python代码,我的python代码如下:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials
# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train
# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes
# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])
space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
def minMe(params):
# Hyperopt tuning for hyperparameters
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from hyperopt import STATUS_OK
try:
import dill as pickle
print('Went with dill')
except ImportError:
import pickle
def hyperopt_rf(params):
rf = RandomForestRegressor(**params)
return cross_val_score(rf, train_xg_x, train_xg_y).mean()
acc = hyperopt_rf(params)
print 'new acc:', acc, 'params: ', params
return {'loss': -acc, 'status': STATUS_OK}
best = fmin(fn=minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best
Run Code Online (Sandbox Code Playgroud)
5)运行上面的Python代码后,我收到以下错误:
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:no job found, sleeping for 0.7s
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
sys.exit(hyperopt.mongoexp.main_worker())
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
return main_worker_helper(options, args)
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
mworker.run_one(reserve_timeout=float(options.reserve_timeout))
File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:exiting with N=9223372036854775803 after 4 consecutive exceptions
Run Code Online (Sandbox Code Playgroud)
6)然后Mongo工人会关闭.
我试过的事情:
有没有人有类似的问题?我已经没有想法尝试了,并且已经在这2天没有工作了.我想我错过了一些非常简单的东西,似乎无法找到它.我错过了什么?欢迎任何建议!
小智 8
在Python 3.5中有同样的问题.安装Dill没有帮助,也没有在MongoTrials或hyperopt-mongo-worker cli中设置workdir.hyperopt-mongo-worker似乎无法访问__main__定义函数的位置:
AttributeError: Can't get attribute 'minMe' on <module '__main__' from ...hyperopt-mongo-worker
Run Code Online (Sandbox Code Playgroud)
正如@jaikumarm建议的那样,我通过编写具有所有必需功能的模块文件来规避问题.但是,我没有将其软链接到bin目录中,而是PYTHONPATH在运行之前扩展了hyperopt-mongo-worker:
export PYTHONPATH="${PYTHONPATH}:<dir_with_the_module.py>"
hyperopt-mongo-worker ...
Run Code Online (Sandbox Code Playgroud)
这样,hyperopt-monogo-worker就可以导入包含的模块minMe.
小智 5
在提出可行的解决方案之前,我为此进行了几天的努力。存在两个问题:1. mongo worker产生了一个单独的进程来运行优化器,因此原始python文件中的所有上下文都将丢失并且无法用于此新进程。2.在此新过程中的导入是在hyperopt-mongo-worker scipy的上下文中进行的,在您的情况下为/ Users / WernerChao / tensorflow / bin /。
所以我的解决方案是使这个新的优化器功能完全自给自足
优化器
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train
# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes
# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])
def minMe(params):
# Hyperopt tuning for hyperparameters
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from hyperopt import STATUS_OK
try:
import dill as pickle
print('Went with dill')
except ImportError:
import pickle
def hyperopt_rf(params):
rf = RandomForestRegressor(**params)
return cross_val_score(rf, train_xg_x, train_xg_y).mean()
acc = hyperopt_rf(params)
print 'new acc:', acc, 'params: ', params
return {'loss': -acc, 'status': STATUS_OK}
Run Code Online (Sandbox Code Playgroud)
wrapper.py
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials
import optimizer
space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }
best = fmin(fn=optimizer.minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
Run Code Online (Sandbox Code Playgroud)
获得此代码后,将optimizer.py链接到bin文件夹
ln -s /Users/WernerChao/Git/test/optimizer.py /Users/WernerChao/tensorflow/bin/
Run Code Online (Sandbox Code Playgroud)
现在运行wrapper.py,然后运行mongo worker,它应该能够从其本地上下文导入优化器并运行minMe函数。
尝试在你的tensorflow(或者可能是worker)的Python环境中安装Dill :
/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt
Run Code Online (Sandbox Code Playgroud)
您的目标是消除 hyperopt 错误消息:
hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
Run Code Online (Sandbox Code Playgroud)
这是因为 Python 默认情况下无法封送函数。它需要dill库来扩展 Python 的 pickling 模块以序列化/反序列化 Python 对象。就您而言,它无法序列化您的 function minMe()。