Using a decorator to persist Python objects

Question

sha*_*nuo 9 python ipython-notebook jupyter-notebook

I got the following code, which saves data to disk, from the link below:

http://tohyongcheng.github.io/python/2016/06/07/persisting-a-cache-in-python-to-disk.html

I tried it, but the file is never created.

import atexit
import pickle
# or import cPickle as pickle   (Python 2 only)

def persist_cache_to_disk(filename):
    def decorator(original_func):
        try:
            # pickle files must be opened in binary mode ('rb'/'wb') on Python 3
            cache = pickle.load(open(filename, 'rb'))
        except (IOError, ValueError, EOFError):
            cache = {}

        # Registered to run when the interpreter exits
        atexit.register(lambda: pickle.dump(cache, open(filename, "wb")))

        def new_func(*args):
            if tuple(args) not in cache:
                cache[tuple(args)] = original_func(*args)
            return cache[args]

        return new_func

    return decorator

I tried to use it as in the example:

@persist_cache_to_disk('users.p')
def get_all_users():
    x = 'some user'
    return x

Update:

This works at the Python command prompt, but not in an IPython notebook.

Answer 1

rra*_*nza 11

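The problem is that the example uses atexit, which runs the dump routine only when Python exits. In a notebook the kernel process keeps running between cells, so the exit handler does not run while you are still working. This modified version dumps the cache every time it is updated: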

import atexit
import functools
import pickle
# or import cPickle as pickle   (Python 2 only)

def persist_cache_to_disk(filename):
    def decorator(original_func):
        try:
            # pickle needs binary file modes ('rb'/'wb') on Python 3
            with open(filename, 'rb') as f:
                cache = pickle.load(f)
        except (IOError, ValueError, EOFError):
            cache = {}

        # Your python script has to exit in order to run this line!
        # atexit.register(lambda: pickle.dump(cache, open(filename, "wb")))
        #
        # Let's make a function and call it whenever the cache changes:
        #
        def save_data():
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)

        # You should wrap your func
        @functools.wraps(original_func)
        def new_func(*args):
            if tuple(args) not in cache:
                cache[tuple(args)] = original_func(*args)
                # Instead, dump your pickled data after
                # every call where the cache is changed.
                # This can be expensive!
                save_data()
            return cache[args]

        return new_func

    return decorator


@persist_cache_to_disk('users.p')
def get_all_users():
    x = 'some user'
    return x

get_all_users()

If you want to throttle saving, you could change the code so that save_data() only writes when, say, len(cache.keys()) is a multiple of 100.
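Here is a minimal sketch of that throttling idea. It assumes a hypothetical `every` parameter and a hypothetical `save_data` attribute that lets you flush manually, for example at the end of a notebook session. Any entries added after the last multiple of `every` stay only in memory until the next save.

```python
import functools
import pickle


def persist_cache_to_disk(filename, every=100):
    """Cache results and write the whole cache to `filename` every `every` new entries.

    `every` is a hypothetical parameter introduced for this sketch.
    """
    def decorator(original_func):
        try:
            with open(filename, 'rb') as f:
                cache = pickle.load(f)
        except (IOError, ValueError, EOFError):
            cache = {}

        def save_data():
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)

        @functools.wraps(original_func)
        def new_func(*args):
            if args not in cache:
                cache[args] = original_func(*args)
                # Only hit the disk when the entry count reaches a multiple of `every`.
                if len(cache) % every == 0:
                    save_data()
            return cache[args]

        # Hypothetical hook so callers can force a save explicitly.
        new_func.save_data = save_data
        return new_func

    return decorator
```

In a notebook you could call `get_all_users.save_data()` in a final cell so that the last few entries also reach the file.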

I also added functools.wraps to your decorator. From the docs:

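Without the use of this decorator factory, the name of the example function would have been 'wrapper', and the docstring of the original example() would have been lost.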
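The following small comparison shows what that quote means in practice. The names `plain_decorator`, `wrapping_decorator`, `example_a` and `example_b` are made up for illustration.

```python
import functools


def plain_decorator(func):
    def wrapper(*args):
        return func(*args)
    return wrapper


def wrapping_decorator(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args):
        return func(*args)
    return wrapper


@plain_decorator
def example_a():
    """Docstring of example_a."""


@wrapping_decorator
def example_b():
    """Docstring of example_b."""


print(example_a.__name__, example_a.__doc__)  # wrapper None
print(example_b.__name__, example_b.__doc__)  # example_b Docstring of example_b.
```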

Answer 2

lum*_*ric 5

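The best solution depends on the use case. There is no general approach that solves every problem at once.

Caching data

If you want to speed up function calls, you may want to cache the results in memory (reading from and writing to disk is slow, too). If you call the function repeatedly with the same arguments, only the first call since the Python interpreter was started will be slow. All subsequent calls will hit the cache (provided the cache is large enough to hold all results).

Since Python 3.2 there is even a built-in decorator, @functools.lru_cache(maxsize=100, typed=False) (the typed parameter was added in Python 3.3):

Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.

Example: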

import urllib.error
import urllib.request
from functools import lru_cache

@lru_cache(maxsize=32)
def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            return s.read()
    except urllib.error.HTTPError:
        return 'Not Found'

>>> for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
...     pep = get_pep(n)
...     print(n, len(pep))

>>> get_pep.cache_info()
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)

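On PyPI there is a backport for Python 2.7. There is also the cachetools package, which is compatible with Python 2.7 as well and contains variants of the Python 3 @functools.lru_cache function decorator.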

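Persisting data on disk

If you want to keep the data after the Python process has finished, it makes sense to store it on disk. This may speed up the first function call, but it can slow down all other calls, because they have to read and write the file.

@rrauenza's solution looks good. Here it is with a few small improvements: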

import functools
import pickle
# or import cPickle as pickle   (Python 2 only)

def persist_cache_to_disk(filename):
    def decorator(original_func):
        try:
            with open(filename, 'rb') as f:
                cache = pickle.load(f)
        except (IOError, ValueError, EOFError):
            cache = {}

        def save_data():
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)

        @functools.wraps(original_func)
        def new_func(*args):
            try:
                hash(args)
            except TypeError:
                # do not use cache because we cannot hash args
                return original_func(*args)

            if args not in cache:
                cache[args] = original_func(*args)
                # dump complete cache, this can be expensive!
                save_data()
            return cache[args]

        return new_func

    return decorator

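The function calls are also cached in memory, similar to @functools.lru_cache(). However, there is no maximum cache size, which can become a problem for the program's memory usage, and there is no equivalent of the typed option (see above).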

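Unfortunately, shelve (suggested by @Aya) cannot be used directly, because it only supports strings as keys. It would otherwise give better performance, since it does not need to write the complete cache on every update.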
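One possible workaround is to build a string key from the arguments. The following sketch uses `repr(args)` for that, and the decorator name `persist_cache_to_shelf` is hypothetical. This rests on a loud assumption: `repr(args)` must be a unique, stable string for every argument tuple you use. That holds for ints, strings and tuples of those, but not for objects whose repr contains a memory address. `shelve.open` works as a context manager on Python 3.4+.

```python
import functools
import shelve


def persist_cache_to_shelf(filename):
    """Cache results in a shelve file, keyed by repr(args).

    Assumption: repr(args) is unique and stable across runs for every
    argument tuple passed to the decorated function.
    """
    def decorator(original_func):
        @functools.wraps(original_func)
        def new_func(*args):
            key = repr(args)
            with shelve.open(filename) as shelf:
                if key in shelf:
                    return shelf[key]
            # Compute outside the open shelf, so recursive calls don't reopen a locked file.
            result = original_func(*args)
            with shelve.open(filename) as shelf:
                shelf[key] = result  # writes only this entry, not the whole cache
            return result
        return new_func
    return decorator
```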

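If the use case is not caching but storing data between Python interpreter sessions, pickle is not the preferred method: if you ever have to change the class of a pickled object, the pickle files become useless. For a cache you can simply clear it in that case. Otherwise, consider YAML, JSON or XML, or, for large amounts of data, some binary format (e.g. HDF5).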
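As a sketch of the JSON route, the two helpers below store the results. The names `save_results` and `load_results` are hypothetical. JSON object keys must be strings, so the `{args_tuple: result}` dict is written as a list of `[args, result]` pairs. This assumes the arguments are JSON scalars and the results are JSON-serializable, and tuples inside results come back as lists.

```python
import json


def save_results(filename, cache):
    """Write a {args_tuple: result} dict as a JSON list of [args, result] pairs."""
    pairs = [[list(args), result] for args, result in cache.items()]
    with open(filename, 'w') as f:
        json.dump(pairs, f, indent=2)


def load_results(filename):
    """Inverse of save_results; argument lists are turned back into tuples."""
    try:
        with open(filename) as f:
            pairs = json.load(f)
    except (IOError, ValueError):  # missing file or invalid JSON
        return {}
    return {tuple(args): result for args, result in pairs}
```

Unlike a pickle file, the result is human-readable and does not depend on any class definitions.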

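Pitfalls

Not all arguments are hashable

All arguments must be hashable. Lists and dicts, for example, are not. There is no simple, general solution for this, so think carefully about which kinds of arguments you need to support. Lists can easily be converted to tuples, and something similar can be done for dicts. Unfortunately, this applies to all of the caching methods above (including the built-in @functools.lru_cache).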
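The following sketch shows the list-to-tuple conversion, with a dict-to-frozenset counterpart. `freeze` and `cache_with_frozen_args` are hypothetical names. The frozen form is used only as the cache key, and it assumes the function treats containers with equal contents as equivalent. For example, a list and a tuple with the same items map to the same key.

```python
import functools


def freeze(value):
    """Recursively convert lists/tuples, dicts and sets into hashable tuples/frozensets.

    Assumption: containers with equal contents may share one cache key.
    """
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value


def cache_with_frozen_args(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        key = freeze(args)
        if key not in cache:
            cache[key] = func(*args)  # the original, unfrozen args are passed on
        return cache[key]

    return wrapper
```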

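Not all return values can be pickled

The data has to be serialized to be stored on disk. This is usually done with the pickle module, and shelve also uses pickle internally. Unfortunately, not every object can be pickled. If the function returns objects that cannot be pickled, you can try to make them picklable, or choose a different way to serialize the data (and a different file format to store it). If you work with numpy objects, numpy.save() is a very fast way to store large amounts of data.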

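Losing type information

Objects can compare equal without being of the same type. If your function also depends on the type of its input arguments, you may run into trouble: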

import functools

@functools.lru_cache(typed=False)
def fun_with_numbers(a, b):
    return a/b, isinstance(a, float)

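The division goes wrong only in Python 2, where 1/3 is integer division. The isinstance() check goes wrong in every version: 1. == 1 and both hash the same, so the second call returns the cached result of the first. In Python 2 you get: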

>>> fun_with_numbers(1, 3)
(0, False)
>>> fun_with_numbers(1., 3.)
(0, False)

With @functools.lru_cache() you can avoid this by setting typed=True. If you use a different caching method, you may have to implement something similar yourself.
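For a hand-written cache, one way to get typed behaviour is to add each argument's type to the key. The `typed_cache` decorator below is a hypothetical sketch in the spirit of `typed=True`, not how `lru_cache` implements it internally.

```python
import functools


def typed_cache(func):
    """In-memory cache whose key also records the type of each argument."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        key = args + tuple(type(a) for a in args)  # (1, 3) and (1.0, 3.0) now differ
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]

    return wrapper


@typed_cache
def fun_with_numbers(a, b):
    return a / b, isinstance(a, float)
```

Types pickle by reference, so such keys can also be stored by the disk-backed decorators above.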

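Functions that depend on more than their arguments

For obvious reasons, the function must not depend on non-constant global variables or other external state. If the function returns time.time(), for example, it will always return the cached time of its first call.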
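Here is a tiny demonstration of the time.time() case with the built-in cache. `cached_now` is an illustrative name.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def cached_now():
    # Depends on something other than its arguments: the clock.
    return time.time()


first = cached_now()
time.sleep(0.01)
second = cached_now()
print(first == second)  # True: the second call returns the cached timestamp
```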

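Thread safety

If you call the cached function from several threads at once without proper locking, very bad things can happen. For example, one thread can modify the cache while another is pickling it, or two threads can write the file at the same time.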
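The simplest form of "proper locking" is one lock around the whole lookup-compute-store step. The following sketch shows this, with `thread_safe_cache` as a hypothetical name. It uses `threading.RLock` rather than `Lock`, so that a recursive decorated function can re-enter from the same thread without deadlocking. Holding the lock while `func` runs serializes all calls, which is simple but slow for long computations. In the disk-backed versions above, `save_data()` would need to run under the same lock.

```python
import functools
import threading


def thread_safe_cache(func):
    """Dict cache guarded by a single re-entrant lock."""
    cache = {}
    lock = threading.RLock()  # re-entrant: recursive calls from the same thread are fine

    @functools.wraps(func)
    def wrapper(*args):
        with lock:
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]

    return wrapper
```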

Do you really need it?

Profile your code before and after adding a cache. If the code is already fast, caching may even slow it down.
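A quick way to check is to time both variants with `timeit`. The trivial function `cheap` below is a made-up example. For a function this cheap, the cache lookup can cost about as much as the call itself, so the numbers only mean something when measured on your own functions.

```python
import functools
import timeit


def cheap(x):
    return x + 1


cached_cheap = functools.lru_cache(maxsize=None)(cheap)

# Time both variants on the same repeated argument.
plain = timeit.timeit(lambda: cheap(1), number=100000)
cached = timeit.timeit(lambda: cached_cheap(1), number=100000)
print('plain: %.4fs  cached: %.4fs' % (plain, cached))
```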
