Hyperopt MongoTrials issue with pickle: AttributeError: 'module' object has no attribute

wer*_*hao 7 mongodb python-2.7

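I'm trying to run a parallel search with Hyperopt and MongoDB, and I've run into some problems with MongoTrials, which have been discussed here. I've tried every suggested approach but still can't find a solution to my specific problem. The model I'm trying to minimize is RandomForestRegressor from sklearn.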

I've followed this tutorial, and I'm able to print out the calculated "fmin" without any problem.

Here are my steps so far:

1) Activate a virtual environment called "tensorflow" (I have all the libraries installed there)

2) Start MongoDB:

(tensorflow) bash-3.2$ mongod --dbpath . --port 1234 --directoryperdb --journal --nohttpinterface

3) Start the worker:

(tensorflow) bash-3.2$ hyperopt-mongo-worker --mongo=localhost:1234/foo_db --poll-interval=0.1

4) Run my Python code, which looks like this:

import numpy as np
import pandas as pd

from sklearn.metrics import mean_absolute_error

from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials


# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train

# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
    train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes

# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])


space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }

trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')

def minMe(params):
    # Hyperopt tuning for hyperparameters
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestRegressor
    from hyperopt import STATUS_OK

    try:
        import dill as pickle
        print('Went with dill')
    except ImportError:
        import pickle

    def hyperopt_rf(params):
        rf = RandomForestRegressor(**params)
        return cross_val_score(rf, train_xg_x, train_xg_y).mean()

    acc = hyperopt_rf(params)
    print 'new acc:', acc, 'params: ', params
    return {'loss': -acc, 'status': STATUS_OK}

best = fmin(fn=minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best

5) After running the Python code above, the worker reports the following errors:

INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:no job found, sleeping for 0.7s
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:exiting with N=9223372036854775803 after 4 consecutive exceptions

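6) Then the Mongo worker shuts down.

Things I've tried:

  • Installing "dill" as the error message suggests -> didn't work
  • Putting the global imports inside the objective function so that it can be pickled -> didn't work
  • Wrapping the import in a try/except that falls back from "dill" to "pickle" -> didn't work

Has anyone had a similar problem? I'm running out of ideas, and I've been stuck on this for 2 days without success. I think I'm missing something really simple but can't seem to find it. What am I missing? Any suggestions are welcome!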

小智 8

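I had the same problem in Python 3.5. Installing dill did not help, and neither did setting workdir in MongoTrials or in the hyperopt-mongo-worker CLI. hyperopt-mongo-worker does not seem to have access to the __main__ where the function was defined: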

AttributeError: Can't get attribute 'minMe' on <module '__main__' from ...hyperopt-mongo-worker

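As @jaikumarm suggested, I worked around the problem by writing a module file that contains all the required functions. However, instead of soft-linking it into the bin directory, I extended PYTHONPATH before running hyperopt-mongo-worker: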

export PYTHONPATH="${PYTHONPATH}:<dir_with_the_module.py>"
hyperopt-mongo-worker ...

This way, hyperopt-mongo-worker can import the module that contains minMe.
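The mechanism behind this workaround can be reproduced without Hyperopt or MongoDB. The following is a minimal sketch, written for Python 3.7+ rather than the Python 2.7 of the question, and it only assumes that the worker unpickles the objective with the standard pickle machinery, as the traceback above indicates. The module name `objective_demo` and the helper `run_worker_like` are hypothetical. A fresh interpreter plays the role of the worker: it fails to unpickle the function unless the module's directory is on `PYTHONPATH`. Here the failure surfaces as an import error rather than an `AttributeError` on `__main__`, but the cause is the same: pickle stores only the module and function name, so the loading process must be able to import that module.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the module file that holds the objective function.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "objective_demo.py"), "w") as f:
    f.write("def minMe(params):\n"
            "    return {'loss': params['x'] ** 2, 'status': 'ok'}\n")

# The launching script can import the module because its directory is on sys.path.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import objective_demo

# pickle stores a reference ("objective_demo", "minMe"), not the function body.
blob_path = os.path.join(module_dir, "objective.pkl")
with open(blob_path, "wb") as f:
    pickle.dump(objective_demo.minMe, f)

# Code run by the simulated worker: unpickle the objective and call it once.
WORKER_CODE = (
    "import pickle, sys\n"
    "with open(sys.argv[1], 'rb') as f:\n"
    "    fn = pickle.load(f)\n"
    "print(fn({'x': 3})['loss'])\n"
)

def run_worker_like(extra_path=None):
    """Run WORKER_CODE in a fresh interpreter, optionally extending PYTHONPATH."""
    env = dict(os.environ)
    if extra_path is not None:
        existing = env.get("PYTHONPATH")
        env["PYTHONPATH"] = (
            extra_path if not existing else existing + os.pathsep + extra_path
        )
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_CODE, blob_path],
        env=env,
        cwd=tempfile.mkdtemp(),  # a directory that does not contain the module
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr

without_path = run_worker_like()
with_path = run_worker_like(module_dir)
print("without PYTHONPATH entry -> exit code", without_path[0])
print("with PYTHONPATH entry    -> exit code", with_path[0], "loss", with_path[1])
```

Only the second run can resolve the pickled reference, which is why exporting `PYTHONPATH` in the shell that starts the worker has the same effect as linking the module into the bin directory.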


小智 5

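I struggled with this for a few days before coming up with a workable solution. There are two problems: 1. The mongo worker spawns a separate process to run the optimizer, so all the context from the original Python file is lost and is not available to this new process. 2. Imports in this new process happen in the context of the hyperopt-mongo-worker script, which in your case lives in /Users/WernerChao/tensorflow/bin/.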

So my solution was to make this new optimizer function completely self-sufficient.

optimizer.py

import numpy as np
import pandas as pd

from sklearn.metrics import mean_absolute_error

# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train

# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
    train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes

# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])



def minMe(params):
    # Hyperopt tuning for hyperparameters
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestRegressor
    from hyperopt import STATUS_OK

    try:
        import dill as pickle
        print('Went with dill')
    except ImportError:
        import pickle

    def hyperopt_rf(params):
        rf = RandomForestRegressor(**params)
        return cross_val_score(rf, train_xg_x, train_xg_y).mean()

    acc = hyperopt_rf(params)
    print 'new acc:', acc, 'params: ', params
    return {'loss': -acc, 'status': STATUS_OK}

wrapper.py

from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials

import optimizer

space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }

# Create the trials object before fmin uses it
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')

best = fmin(fn=optimizer.minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best

Once you have this code, link optimizer.py into the bin folder:

ln -s /Users/WernerChao/Git/test/optimizer.py /Users/WernerChao/tensorflow/bin/

Now run wrapper.py and then the mongo worker; the worker should be able to import optimizer from its local context and run the minMe function.
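Before starting any workers, a quick check along the following lines may catch the problem early. This is only a sketch: the helper name `importable_by_workers` is hypothetical and not part of Hyperopt. It tests whether a function can be found again by module and name, which is what unpickling by reference requires. It cannot confirm that the worker's own path contains the module, so the symlink or `PYTHONPATH` step is still needed.

```python
import importlib
import json


def importable_by_workers(fn):
    """Return True if fn can be looked up again by module name and qualified name."""
    module_name = getattr(fn, "__module__", None)
    qualname = getattr(fn, "__qualname__", "")
    if module_name in (None, "__main__") or "<" in qualname:
        # Defined in the launch script itself, or a lambda / nested function:
        # another process has no importable name to resolve.
        return False
    try:
        obj = importlib.import_module(module_name)
        for part in qualname.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError):
        return False
    return obj is fn


def make_nested():
    def nested(params):
        return 0.0
    return nested


# A function living in an importable module passes the check.
print(importable_by_workers(json.dumps))
# A lambda or a nested function does not.
print(importable_by_workers(lambda params: 0.0))
print(importable_by_workers(make_nested()))
```

With the layout described above, `importable_by_workers(optimizer.minMe)` would be the call to make in wrapper.py, whereas a `minMe` defined directly in the script that calls `fmin` has `__main__` as its module and fails the check.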


Wan*_*iar 0

Try installing dill in the Python environment of your tensorflow virtualenv (or possibly the worker's):

/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt

Your goal is to get rid of this hyperopt error message:

hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.

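This is because Python cannot marshal functions by default. It needs the dill library, which extends Python's pickle module, to serialize and deserialize Python objects. In your case, it cannot serialize your function minMe().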