Caching function results to disk in Python with version-based expiration

Question

Sha*_*ked 5 python git caching python-2.7

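I want to cache the results of certain functions/methods with the following requirements:

Persistent across runs: the cache should survive after the interpreter exits, which means the data has to be saved to disk.
Expiration based on function version: cached data should stay valid as long as the function has not changed. If the function changes, its cached data should be invalidated.
For now, all of this happens single-threaded on one machine. Supporting concurrency on the same machine would be a bonus.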

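I know there are cache decorators with disk-backed storage, but their expiration is usually time-based, which isn't relevant to my needs.

I thought about using the Git commit SHA to detect the function/class version, but the problem is that a single file contains multiple functions/classes. I need a way to check whether the specific function/class segment of the file has changed.

I assume the solution will involve a combination of version management and caching, but I'm not familiar enough with the problem to solve it elegantly.

Example: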

#file a.py
@cache_by_version
def f(a,b):
    #...

@cache_by_version
def g(a,b):
    #...

#file b.py
from a import *
def main():
    f(1,2)


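Running main in b.py once should cache the result of f with arguments 1 and 2 to disk. Running main again should fetch the result from the cache without evaluating f(1,2) again. However, if f changes, its cache should be invalidated. On the other hand, if g changes, that should not affect the cache for f.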

Answer 1

Joh*_*ery 3

OK, after a bit of messing about, here's something that mostly works:


import os
import hashlib
import pickle
from functools import wraps
import inspect

# just cache in a "cache" directory within current working directory
# also using pickle, but there are other caching libraries out there
# that might be more useful
__cache_dir__ = os.path.join(os.path.abspath(os.getcwd()), 'cache')


def _read_from_cache(cache_key):
    cache_file = os.path.join(__cache_dir__, cache_key)
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as f:
            return pickle.load(f)
    return None


def _write_to_cache(cache_key, value):
    cache_file = os.path.join(__cache_dir__, cache_key)
    if not os.path.exists(__cache_dir__):
        os.mkdir(__cache_dir__)
    with open(cache_file, 'wb') as f:
        pickle.dump(value, f)


def cache_result(fn):
    @wraps(fn)
    def _decorated(*arg, **kw):
        m = hashlib.md5()
        fn_src = inspect.getsourcelines(fn)
        m.update(str(fn_src))
        # generate a different key based on the arguments too
        m.update(str(arg))  # could possibly do a better job with the arguments
        m.update(str(kw))
        cache_key = m.hexdigest()
        cached = _read_from_cache(cache_key)
        if cached is not None:
            return cached

        value = fn(*arg, **kw)
        _write_to_cache(cache_key, value)
        return value

    return _decorated


@cache_result
def add(a, b):
    print "Add called"
    return a + b


if __name__ == '__main__':
    print add(1, 2)


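I've used inspect.getsourcelines to read in the function's code and used that to generate the key for the cache lookup (along with the arguments). This means that any change to the function (even whitespace) will generate a new cache key, and the function will need to be called again.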
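
If invalidating on whitespace or comment changes turns out to be too aggressive, one option is to hash the parsed syntax tree instead of the raw text. The following is only a sketch, written for Python 3 rather than the Python 2.7 of the code above. The helper names source_fingerprint and make_cache_key are hypothetical, and the source is passed in as a string so the idea can be tried without inspect.

```python
import ast
import hashlib
import textwrap


def source_fingerprint(source_text):
    """Hash a function's syntax tree, ignoring whitespace and comments."""
    tree = ast.parse(textwrap.dedent(source_text))
    # ast.dump leaves out line/column info by default, so layout changes vanish.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def make_cache_key(source_text, args, kwargs):
    """Combine the function fingerprint with a repr of the call arguments."""
    h = hashlib.sha256()
    h.update(source_fingerprint(source_text).encode("utf-8"))
    h.update(repr(args).encode("utf-8"))
    h.update(repr(sorted(kwargs.items())).encode("utf-8"))
    return h.hexdigest()


original = "def add(a, b):\n    return a + b\n"
reformatted = "def add(a,b):\n    # same logic, new layout\n    return a+b\n"
changed = "def add(a, b):\n    return a - b\n"

same_version = source_fingerprint(original) == source_fingerprint(reformatted)
new_version = source_fingerprint(original) != source_fingerprint(changed)
```

Inside a decorator, the text returned by inspect.getsource(fn) could be passed as source_text. Docstrings are part of the syntax tree, so editing a docstring would still change the fingerprint.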

Note, though, that if the function calls other functions and those have changed, you'll still get the original cached result, which may be unexpected.
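
One way to narrow that gap is to fold the functions a cached function calls into its fingerprint. The sketch below (Python 3, with the hypothetical name fingerprint_with_callees) follows the global names a function refers to and includes any plain Python functions it finds, recursively. It assumes the dependencies are reachable as module-level globals, so methods, lookups such as module.func, and callables chosen at run time are not detected. It hashes bytecode to stay self-contained; the source text from inspect.getsourcelines could be hashed the same way.

```python
import hashlib
import types


def fingerprint_with_callees(fn, _seen=None):
    """Hash fn's bytecode plus that of module-level functions it references."""
    seen = set() if _seen is None else _seen
    seen.add(fn)
    code = fn.__code__
    h = hashlib.sha256()
    h.update(code.co_code)
    # Skip nested code objects: their repr contains a memory address.
    consts = [c for c in code.co_consts if not isinstance(c, types.CodeType)]
    h.update(repr(consts).encode("utf-8"))
    # co_names holds the global and attribute names used in the body.
    for name in sorted(set(code.co_names)):
        callee = fn.__globals__.get(name)
        if isinstance(callee, types.FunctionType) and callee not in seen:
            h.update(name.encode("utf-8"))
            h.update(fingerprint_with_callees(callee, seen).encode("utf-8"))
    return h.hexdigest()


def scale(x):
    return x * 2


def compute(x):
    return scale(x) + 1


before = fingerprint_with_callees(compute)


def scale(x):  # simulate an edit to the dependency
    return x * 3


after = fingerprint_with_callees(compute)
```

Here compute itself is untouched, yet its fingerprint differs between before and after because scale changed.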

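So this is probably fine for something that is number-heavy or involves a lot of network activity, but you may find you need to clear out the cache directory every now and then.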
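
Since every source change produces new keys, entries for older versions of a function are never read again but stay on disk. One possible layout that makes them easy to drop is a directory per function with a subdirectory per source fingerprint. This is a sketch in Python 3 under that assumed layout, not what the code above does, and entry_path and prune_stale_versions are hypothetical names.

```python
import os
import shutil
import tempfile


def entry_path(cache_dir, fn_name, fn_version, args_key):
    """Return cache_dir/<fn_name>/<fn_version>/<args_key>."""
    return os.path.join(cache_dir, fn_name, fn_version, args_key)


def prune_stale_versions(cache_dir, fn_name, current_version):
    """Delete cached results that belong to older versions of fn_name."""
    fn_dir = os.path.join(cache_dir, fn_name)
    if not os.path.isdir(fn_dir):
        return []
    removed = []
    for version in os.listdir(fn_dir):
        if version != current_version:
            shutil.rmtree(os.path.join(fn_dir, version))
            removed.append(version)
    return sorted(removed)


# Demo in a throwaway directory: two versions on disk, only "v2" is current.
with tempfile.TemporaryDirectory() as demo_dir:
    for version in ("v1", "v2"):
        path = entry_path(demo_dir, "add", version, "args-1-2")
        os.makedirs(os.path.dirname(path))
        with open(path, "w") as f:
            f.write("3")
    removed = prune_stale_versions(demo_dir, "add", "v2")
    remaining = os.listdir(os.path.join(demo_dir, "add"))
```

A decorator could call prune_stale_versions once per function at import time, passing the fingerprint it computed for the current source.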

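One downside of using getsourcelines is that it won't work if you don't have access to the source code. I imagine that for most Python programs this shouldn't be too much of a problem.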

So I'd take this as a starting point rather than as a fully working solution.

It also uses pickle to store the cached values, so it is only safe to use if you can trust the files in the cache directory.
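
If the cached values are plain data (numbers, strings, lists, dicts), JSON is one alternative that doesn't execute anything on load. The sketch below is Python 3 with hypothetical helpers write_json_cache and read_json_cache. It returns a (hit, value) pair so that a cached None can be told apart from a miss, and it writes through a temporary file so a concurrent reader on the same machine doesn't see a half-written entry.

```python
import json
import os
import tempfile


def write_json_cache(cache_dir, cache_key, value):
    """Store value as JSON; raises TypeError for values JSON cannot encode."""
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file, then rename it into place in one step.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(value, f)
        os.replace(tmp_path, os.path.join(cache_dir, cache_key))
    except BaseException:
        os.remove(tmp_path)
        raise


def read_json_cache(cache_dir, cache_key):
    """Return (True, value) on a hit and (False, None) on a miss."""
    try:
        with open(os.path.join(cache_dir, cache_key)) as f:
            return True, json.load(f)
    except FileNotFoundError:
        return False, None


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    miss = read_json_cache(demo_dir, "abc123")
    write_json_cache(demo_dir, "abc123", {"total": 3})
    hit = read_json_cache(demo_dir, "abc123")
    write_json_cache(demo_dir, "none-key", None)
    none_hit = read_json_cache(demo_dir, "none-key")
```

The trade-off is that JSON only round-trips basic types: tuples come back as lists, and arbitrary objects need their own encoding.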
