pandas.algos._return_false causes a PicklingError with dill.dump_session on CentOS

Kor*_*rem 6 python pickle python-2.7 pandas dill

I have a code framework that uses dill to dump sessions. This worked fine until I started using pandas. The following code raises a PicklingError on CentOS release 6.5:

import pandas
import dill
dill.dump_session('x.dat')

The problem seems to stem from pandas.algos. In fact, running this is enough to reproduce the error:

import pandas.algos
import dill
dill.dump_session('x.dat')  # or: dill.dumps(pandas.algos)

The error is pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x1df3050>: it's not found as pandas.algos.lambda1.

The thing is, this error does not occur on my PC. Both machines have the same versions of pandas (0.14.1), dill (0.2.1) and Python (2.7.6).

Looking at badobjects, I get:

>>> dill.detect.badobjects(pandas.algos, depth = 1)
{'__builtins__': <module '__builtin__' (built-in)>, 
'_return_true': <cyfunction lambda2 at 0x1484d70>, 
'np': <module 'numpy' from '/usr/local/lib/python2.7/site-packages/numpy-1.8.2-py2.7-linux-x86_64.egg/numpy/__init__.pyc'>, 
'_return_false': <cyfunction lambda1 at 0x1484cc8>, 
'lib': <module 'pandas.lib' from '/home/talkr/.local/lib/python2.7/site-packages/pandas/lib.so'>}

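This seems to be due to pandas.algos being handled differently on the two operating systems (perhaps a different compiler?). On my PC, where dump_session runs without error, pandas.algos._return_false is <cyfunction <lambda> at 0x06DD02A0>, while on CentOS it is <cyfunction lambda1 at 0x1df3050>. Why is it handled differently?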

Mik*_*rns 5

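I'm not seeing what you are seeing when I run this on a Mac. Here is what I get, using the same version of pandas. I do notice that you are using a different version of dill: I'm using the version from GitHub. I'll check whether there was a tweak to how dill saves modules or globals that might have had an impact on some distros.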

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> import dill
>>> dill.detect.trace(True)
>>> dill.dump_session('x.pkl')
M1: <module '__main__' (built-in)>
F2: <function _import_module at 0x1069ff140>
D2: <dict object at 0x106a0b280>
M2: <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/__init__.pyc'>
M2: <module 'pandas' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/__init__.pyc'>

Here is what I get for pandas.algos:

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.dumps(pandas.algos)
'\x80\x02cdill.dill\n_import_module\nq\x00U\x0cpandas.algosq\x01\x85q\x02Rq\x03.'
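The payload above contains only `_import_module` and the string `pandas.algos`, which suggests the module is stored by reference (its import name) rather than by walking its contents. The sketch below imitates that idea with the standard library only. It is not dill's implementation: `reduce_module` and `dumps_with_module_support` are hypothetical helpers, `json` stands in for `pandas.algos`, and the code is written for Python 3.

```python
import copyreg
import importlib
import io
import json  # any importable module works as a stand-in for pandas.algos
import pickle
import types


def reduce_module(module):
    # Record only the import name; the module's attributes are never visited.
    return importlib.import_module, (module.__name__,)


def dumps_with_module_support(obj):
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer, protocol=2)
    # Use a per-pickler dispatch table so the global copyreg table is untouched.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.ModuleType] = reduce_module
    pickler.dump(obj)
    return buffer.getvalue()


payload = dumps_with_module_support(json)
restored = pickle.loads(payload)
print(restored is json)  # True: loading simply re-imports the module
```

Because nothing inside the module is serialized in this scheme, an unpicklable attribute such as `_return_false` would not get in the way of pickling the module object itself.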

And here is what I get for pandas.algos._return_false:

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import pandas.algos
>>> dill.dumps(pandas.algos._return_false)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 180, in dumps
    dump(obj, file, protocol, byref, file_mode, safeio)
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 173, in dump
    pik.dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 317, in save
    self.save_global(obj, rv)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x10d403cc8>: it's not found as pandas.algos.lambda1

So, I can now reproduce your error.
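The failing frame in the traceback is `save_global`, which pickles a function by name: it looks the function up as `module.name` and gives up if that lookup does not return the same object. A minimal sketch of the same failure with the standard library alone, assuming Python 3 and a made-up module called `fake_algos` (a stand-in, not pandas code):

```python
import pickle
import sys
import types

# Hypothetical stand-in for a compiled module such as pandas.algos.
fake_algos = types.ModuleType("fake_algos")
sys.modules["fake_algos"] = fake_algos


def lambda1(self, other):
    return False


# Mimic the traceback: the function reports the name "lambda1",
# but the module only exposes it as "_return_false".
lambda1.__module__ = "fake_algos"
lambda1.__qualname__ = "lambda1"
fake_algos._return_false = lambda1

try:
    pickle.dumps(fake_algos._return_false)
    failure = None
except (pickle.PicklingError, AttributeError) as exc:
    failure = str(exc)
print(failure)  # exact wording depends on the Python version

# Exposing the object under the name it reports makes the lookup succeed.
fake_algos.lambda1 = lambda1
restored = pickle.loads(pickle.dumps(fake_algos._return_false))
print(restored is lambda1)  # True
```

The second half is only there to show which lookup the pickler performs; it is not a suggested patch for pandas.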

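This looks like an unpicklable object, based on how it is built. However, it should be picklable within the context of its module, as it is for me. You seem to have pinned down the difference in what you are seeing: the object that pandas builds on CentOS.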

Looking at the pandas codebase, pandas.algos is a .pyx file, so this is Cython. Here is the code:

_return_false = lambda self, other: False

If this were in a .py file, I know it would serialize. I have no idea how dill works with lambdas generated by Cython (e.g. a lambda cyfunction).

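It looks like there was a commit (https://github.com/pydata/pandas/commit/73c71dfca10012e25c829930508b5d6f7ccad5ff) in which _return_false was moved out of a class and into module scope. Do you see that on both CentOS and your PC? It could be that v0.14.1 for different distros was cut from slightly different git revisions, depending on how you installed pandas.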

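Apparently, I can get a lambda1 by trying to get the source of the object. For a lambda, if dill can't get the source, it falls back to the name, and apparently the object is named lambda1, even though that name does not appear in the .pyx file. Maybe this is due to how Cython builds lambdas.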

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.source.importable(pandas.algos._return_false)
'from pandas import lambda1\n'
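To spot this kind of mismatch without pickling anything, one could compare the name each callable reports with the name it is bound to in the module. A rough sketch, assuming the compiled functions set `__module__` to the module's own name; `names_not_found_in_module` and the `demo_algos` module are hypothetical and exist only for this illustration:

```python
import types


def names_not_found_in_module(module):
    """Map binding name -> reported __name__ for callables that a
    pickle-by-name lookup on this module would fail to find."""
    mismatches = {}
    for binding, obj in vars(module).items():
        reported = getattr(obj, "__name__", None)
        if not callable(obj) or reported is None:
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # defined elsewhere, so it would be looked up there
        if getattr(module, reported, None) is not obj:
            mismatches[binding] = reported
    return mismatches


# Demo module: one lambda bound to a different name, one ordinary function.
demo = types.ModuleType("demo_algos")
demo._return_false = lambda self, other: False
demo._return_false.__module__ = "demo_algos"


def ok():
    return True


ok.__module__ = "demo_algos"
demo.ok = ok

print(names_not_found_in_module(demo))  # {'_return_false': '<lambda>'}
```

Running the helper on pandas.algos on both machines would show whether the reported name is `<lambda>` or `lambda1` in each build. Python 3's pickle uses `__qualname__` rather than `__name__`; for module-level functions the two are the same.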

The difference may come from Cython, since the code in pandas is generated from the .pyx file. What version of Cython do you have? Mine is 0.20.2.
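To compare the two machines, a small report like the one below could be run on each. It is only a sketch: `collect_versions` is a hypothetical helper, and it relies on each package exposing a `__version__` attribute. Note that it reports the Cython currently installed, which is not necessarily the one that generated a prebuilt pandas binary.

```python
import importlib
import platform


def collect_versions(package_names=("pandas", "dill", "Cython")):
    """Return a dict of version strings; None marks a package that is not installed."""
    report = {"python": platform.python_version()}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


for name, version in sorted(collect_versions().items()):
    print(name, version)
```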