Asked by Hen*_*ter. Tags: python, persistence, shelve, multiprocessing
The documentation for the shelve module makes the following statement about its restrictions:

The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.)

As far as I can tell, this means that as long as I don't try to have multiple processes write to a single shelf at once, I should be in the clear. Multiple processes using the same shelf as a read-only cache should be safe. Right?

Apparently not. After some struggling, I ended up with a test case that seems to exhibit some very bad behavior when reading asynchronously from a shelf. The following script:

Creates a Shelf and populates it with "i": 2*i for i from 1 to 10. Spawns processes to retrieve the value for each key from the shelf file, and reports whether a value was retrieved or not.
import multiprocessing
import shelve

SHELF_FILE = 'test.shlf'

def store(key, obj):
    db = shelve.open(SHELF_FILE, 'w')
    db[key] = obj
    db.close()

def load(key):
    try:
        db = shelve.open(SHELF_FILE, 'r')
        n = db.get(key)
        if n is not None:
            print('Got result {} for key {}'.format(n, key))
        else:
            print('NO RESULT for key {}'.format(key))
    except Exception as e:
        print('ERROR on key {}: {}'.format(key, e))
    finally:
        db.close()

if __name__ == '__main__':
    db = shelve.open(SHELF_FILE, 'n')  # Create brand-new shelf
    db.close()

    for i in range(1, 11):  # populate the new shelf with keys from 1 to 10
        store(str(i), i*2)

    db = shelve.open(SHELF_FILE, 'r')  # Make sure everything got in there.
    print(', '.join(key for key in db))  # Should print 1-10 in some order
    db.close()

    # read each key's value from the shelf, asynchronously
    pool = multiprocessing.Pool()
    for i in range(1, 11):
        pool.apply_async(load, [str(i)])

    pool.close()
    pool.join()
The expected output here would naturally be 2, 4, 6, 8 ... 20 (in some order). Instead, arbitrary values fail to come back from the shelf, and sometimes a request makes shelve blow up entirely: the actual output is a mix of 'Got result' lines, 'NO RESULT' lines (keys for which None came back), and 'ERROR' lines.
Based on the error messages, my gut feeling is that maybe some external resource (the .dir file, perhaps?) isn't being flushed to disk appropriately (or maybe it's being deleted by another process?). Even so, I'd expect a process to slow down while waiting on a disk resource, rather than produce these "oh well, I guess it's not there" or "what are you talking about, this isn't even a shelf file" results. And frankly, I wouldn't expect any writes to these files at all, since the worker processes only use read-only connections...

Is there something I'm missing, or is shelve just flat-out unusable in a multiprocessing environment?

This is Python 3.3 x64 on Windows 7, in case that turns out to be relevant.
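The "oh, I guess it's not there" failure mode is easy to provoke in isolation: opening a shelf read-only when its backing file is absent fails immediately rather than blocking and waiting for the file. A minimal sketch (the path is made up; the exact exception type depends on which dbm backend your platform selects):

```python
import os
import shelve
import tempfile

# A path where no shelf file exists yet.
missing = os.path.join(tempfile.mkdtemp(), 'no_such.shlf')

try:
    shelve.open(missing, 'r')  # read-only flag: will not create the file
    outcome = 'opened'
except Exception as e:
    outcome = type(e).__name__  # typically a dbm.error ("need 'c' or 'n' flag...")

print(outcome)
```

So if a worker ever observes the backing file as missing or half-written, even momentarily, it fails fast with exactly this kind of error instead of slowing down.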
There is a warning note in the documentation for shelve.open():

Open a persistent dictionary. The filename specified is the base filename for the underlying database. As a side effect, an extension may be added to the filename and more than one file may be created.
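That side effect is easy to observe directly. A small diagnostic sketch (the probe.shlf name is made up here; the exact set of files depends on which dbm backend your platform selects):

```python
import glob
import os
import shelve
import tempfile

# Create a throwaway shelf and list every file the dbm backend produced.
base = os.path.join(tempfile.mkdtemp(), 'probe.shlf')
db = shelve.open(base, 'n')
db['k'] = 1
db.close()

backing = sorted(glob.glob(base + '*'))
print(backing)
```

On some backends this is a single file with the exact base name; on others (dbm.dumb, for instance) you may get companion files such as .dat and .dir, which is why a reader that catches the file set mid-update can see something that doesn't look like a shelf at all.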
Try passing an already-opened shelf (rather than the filename) to the pool workers, and see whether the behavior changes. That said, I don't have a repro on 2.7, Win7-64 (though the output is, of course, completely jumbled).