Pet*_*one 206 python serialization object save pickle
I've created an object like this:
company1.name = 'banana'
company1.value = 40
I would like to save this object. How can I do that?
mar*_*eau 406
You can use the pickle module in the standard library. Here's an elementary application of it to your example:
import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name)  # -> spam
    print(company2.value)  # -> 42
You could also write a simple utility like the following, which opens a file and writes a single object to it:
def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
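A natural counterpart to save_object is a loader. The following is a minimal sketch: load_object is a hypothetical helper name (it is not part of the pickle module), and the temporary directory is used only for illustration. Keep in mind that unpickling can execute arbitrary code, so only load files you trust.

```python
import os
import pickle
import tempfile


class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value


def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)


def load_object(filename):
    """Hypothetical counterpart to save_object: read back one pickled object."""
    with open(filename, 'rb') as inp:
        return pickle.load(inp)


# Round trip through a temporary directory (path chosen for illustration).
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'company1.pkl')
    save_object(Company('banana', 40), path)
    restored = load_object(path)

print(restored.name, restored.value)
```

The class definition has to be importable when load_object runs, because the file stores only a reference to the class plus the instance's attributes.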
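Since this is such a popular answer, I'd like to touch on a few slightly advanced usage topics.
cPickle (or _pickle) vs pickle
In Python 2 it is almost always preferable to use the cPickle module rather than pickle, because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they are equivalent and the C version provides greatly superior performance. Switching to it couldn't be easier; just change the import statement to this: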
import cPickle as pickle
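In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary because the pickle module now does it automatically; see "What difference between pickle and _pickle in python 3?".
The rundown is that you could use something like the following to ensure that your code always uses the C version when it is available, in both Python 2 and 3: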
try:
    import cPickle as pickle
except ImportError:  # Python 3: cPickle no longer exists (ImportError is also defined in Python 2)
    import pickle
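To check which implementation is actually in use on Python 3, you can compare the classes exposed by pickle with those in the C accelerator module. This is only a sketch and assumes CPython, where the accelerator is named _pickle; on other interpreters the import may fail, which the code treats as "no C version".

```python
import pickle

try:
    import _pickle  # C accelerator module in CPython 3
except ImportError:
    _pickle = None

# In CPython 3, pickle re-exports the C classes when they are available,
# so both names refer to the very same class object.
uses_c_version = _pickle is not None and pickle.Pickler is _pickle.Pickler
print('C implementation in use:', uses_c_version)
```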
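pickle can read and write files in several different, Python-specific formats, called protocols. "Protocol version 0" is ASCII and therefore "human-readable". Versions 1 and higher are binary, and the highest one available depends on which version of Python is being used. The default also depends on the Python version. In Python 2 the default was protocol version 0, but in Python 3.6 it is protocol version 3 (protocol version 4 became the default in Python 3.8). In Python 3.x the module gained a pickle.DEFAULT_PROTOCOL attribute, which does not exist in Python 2.
Fortunately there is a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that is what you want, and you usually do): just use the literal number -1. So, instead of writing: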
pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)
You can just write:
pickle.dump(obj, output, -1)
Either way, you only have to specify the protocol once if you create a Pickler object for use in multiple pickle operations:
pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# etc...
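To make the protocol discussion concrete, here is a small sketch that serializes the same data with protocol 0 and with the highest protocol, and then reuses a single Pickler for several objects. The sample data and the in-memory io.BytesIO buffer are assumptions made for illustration; a real program would typically write to a file opened in 'wb' mode.

```python
import io
import pickle

data = {'name': 'banana', 'value': 40}  # illustrative sample data

text_like = pickle.dumps(data, 0)  # protocol 0: ASCII output
binary = pickle.dumps(data, -1)    # -1 selects pickle.HIGHEST_PROTOCOL

print('default protocol:', pickle.DEFAULT_PROTOCOL)  # attribute exists in Python 3 only
print('highest protocol:', pickle.HIGHEST_PROTOCOL)
print(text_like)

# The protocol is detected automatically when loading, so the same
# loads() call handles both byte strings.
same_result = pickle.loads(text_like) == pickle.loads(binary) == data

# One Pickler with the protocol given once, then several dump() calls.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, -1)
pickler.dump(data)
pickler.dump([1, 2, 3])

# Read the objects back in the same order with a matching Unpickler.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
```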
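While a pickle file can contain any number of pickled objects, as shown in the samples above, when there is an unknown number of them it is often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict, and write them all to the file in a single call: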
tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')
and restore the list and everything in it later with:
with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)
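The major advantage is that you don't need to know how many object instances were saved in order to load them back later (doing so without that information is possible, but it requires some slightly specialized code). See the answers to the related question "Saving and loading multiple objects in pickle file?" for details on different ways to do this. Personally, I like @Lutz Prechelt's answer the best. Here it is, adapted to the examples here: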
import pickle

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # No more pickled objects in the file.
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))
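For a self-contained round trip of this approach, the sketch below writes an arbitrary number of objects to a file and reads them back with the same generator idea. The temporary file location and the sample company data are assumptions for illustration only.

```python
import os
import pickle
import tempfile


class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value


def pickled_items(filename):
    """Yield each object stored in a file of consecutive pickles."""
    with open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # raised once no more pickles remain
                break


samples = [('banana', 40), ('spam', 42), ('eggs', 7)]  # illustrative data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'company_data.pkl')
    with open(path, 'wb') as outp:
        for name, value in samples:
            # Each dump() call appends one independent pickle to the file.
            pickle.dump(Company(name, value), outp, pickle.HIGHEST_PROTOCOL)

    # The reader does not need to know how many objects were written.
    loaded = [(c.name, c.value) for c in pickled_items(path)]

print(loaded)
```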
Mik*_*rns 47
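I think it's a pretty strong assumption to assume that the object is a class. What if it's not a class? There's also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some python objects have attributes added to their __dict__ after creation, pickle doesn't respect the addition of those attributes (i.e. it 'forgets' they were added, because pickle serializes by reference to the object definition).
In all these cases, pickle and cPickle can fail you horribly.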
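One of these failure modes is easy to reproduce with the standard library alone. The sketch below tries to pickle a lambda; the exact exception type and message vary between Python versions, so the code catches both PicklingError and AttributeError rather than assuming one of them.

```python
import pickle

identity = lambda x: x  # a function object without an importable name

try:
    pickle.dumps(identity)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    # pickle stores functions by reference (module plus name), and a
    # lambda cannot be looked up again under the name '<lambda>'.
    lambda_pickled = False
    print('pickle failed:', exc)
```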
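If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in python.
We start with a class…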
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
...
>>>
Now shut down, and restart...
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>>
Oops… pickle can't handle it. Let's try dill. We'll throw in another object type (a lambda) for good measure.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>>
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>>
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
...
>>>
And now read the file.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
...
>>> company1
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>
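It works. The reason pickle fails, and dill doesn't, is that dill treats __main__ like a module (for the most part), and it can also pickle class definitions instead of pickling by reference (as pickle does). The reason dill can pickle a lambda is that it gives it a name… then the pickling magic can happen.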
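The "by reference" behaviour of pickle can be observed directly with the standard library. In the sketch below, json.dumps is just an arbitrary importable function chosen for illustration: its pickle contains the module and function names, not the function's code, which is why unpickling needs the same definition to be importable.

```python
import json
import pickle

payload = pickle.dumps(json.dumps, 0)  # protocol 0 keeps the output readable
print(payload)

# Only the names needed to re-import the function are stored.
stores_names_only = b'json' in payload and b'dumps' in payload

# Loading looks the name up again and returns the very same object.
same_function = pickle.loads(payload) is json.dumps
```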
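Actually, there's an easier way to save all these objects, especially if you have created a lot of them. Just dump the whole python session, and come back to it later.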
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>>
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>>
>>> dill.dump_session('dill.pkl')
>>>
Now turn off your computer, go enjoy an espresso or whatever, and come back later...
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>
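The only major drawback is that dill is not part of the python standard library. So if you can't install a python package on your server, then you can't use it.
However, if you are able to install python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/dill.git@master#egg=dill. And you can get the latest released version with pip install dill.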
A quick example using company1 from your question, with python3.
import pickle

# Save the object (the with block closes the file even if dump fails)
with open("company1.pickle", "wb") as f:
    pickle.dump(company1, f)

# Reload the object
with open("company1.pickle", "rb") as f:
    company1_reloaded = pickle.load(f)
However, as this answer noted, pickle often fails. So you should really use dill.
import dill

# Save the object (the with block closes the file even if dump fails)
with open("company1.pickle", "wb") as f:
    dill.dump(company1, f)

# Reload the object
with open("company1.pickle", "rb") as f:
    company1_reloaded = dill.load(f)
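You can use anycache to do the job for you. It takes care of all the details; in particular, it uses dill as its backend, which extends the python pickle module to handle lambda and all the nice python features.
Assuming you have a function myfunc which creates the instance: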
from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')
def myfunc(name, value):
    return Company(name, value)
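Anycache calls myfunc the first time and pickles the result to a file in cachedir, using a unique identifier (depending on the function name and its arguments) as the filename. On any consecutive run, the pickled object is loaded. If the cachedir is preserved between python runs, the pickled object is taken from the previous python run.
For any further details see the documentation.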
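The caching idea described here can be approximated with the standard library. The following is only a rough sketch of the concept and not anycache's actual implementation: the decorator name disk_cache, the SHA-256 based filename, and the temporary cache directory are all assumptions made for illustration, and the cached function returns a plain dict to keep the example self-contained.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cachedir):
    """Hypothetical decorator: pickle a function's result to a file in cachedir."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Identifier derived from the function name and its arguments.
            key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            filename = os.path.join(cachedir, hashlib.sha256(key).hexdigest() + '.pkl')
            if os.path.exists(filename):
                with open(filename, 'rb') as inp:
                    return pickle.load(inp)  # reuse the stored result
            result = func(*args, **kwargs)
            with open(filename, 'wb') as outp:
                pickle.dump(result, outp, pickle.HIGHEST_PROTOCOL)
            return result
        return wrapper
    return decorator


calls = []  # records how often the real function body runs

with tempfile.TemporaryDirectory() as cachedir:
    @disk_cache(cachedir)
    def make_company(name, value):
        calls.append(name)
        return {'name': name, 'value': value}

    first = make_company('banana', 40)   # computed and written to disk
    second = make_company('banana', 40)  # loaded from the pickle file

print(first == second, len(calls))
```

A sketch like this has none of the cache-size or invalidation handling that a dedicated library would offer, so it is meant to illustrate the mechanism only.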