oob*_*boo 4 python hashtable file
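Hey. I have a function I want to memoize, but it has too many possible values. Is there a convenient way to store the values in a text file and read them back from it? For example, storing a precomputed list of primes up to 10^9 in a text file? I know that reading from a text file is slow, but if the amount of data is really large there is no other option. Thanks!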
Ale*_*lli 11
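For a list of primes up to 10**9, why do you need a hash? What would the KEYS be?! This sounds like a perfect opportunity for a simple, straightforward binary file. By the prime number theorem there are roughly 10**9/ln(10**9) such primes, i.e. about 50 million. At 4 bytes per prime that is only about 200 MB, a perfect fit for an array.array("L") and its methods such as fromfile (see the docs). In many cases you can simply pull the whole 200 MB into memory, but in the worst case you can load just a slice of it (e.g. via mmap and the fromstring method of array.array, called frombytes in Python 3), do binary searches there (e.g. via bisect), and so on.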
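A minimal sketch of the binary-file approach described above, assuming everything fits in memory. The helper names (sieve, save_primes, load_primes, is_prime) are made up for illustration, and the demo uses a limit of 100 rather than 10**9 to stay fast. It uses the "I" typecode and checks the item size, because "L" is only guaranteed to be at least 4 bytes and is 8 bytes on many 64-bit platforms.

```python
import array
import bisect
import math
import os
import tempfile


def sieve(limit):
    """Return all primes <= limit with a simple sieve of Eratosthenes."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for n in range(2, math.isqrt(limit) + 1):
        if flags[n]:
            # Clear every multiple of n starting at n*n.
            flags[n * n::n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n, flag in enumerate(flags) if flag]


def save_primes(path, primes):
    """Write the primes as raw machine-format unsigned ints."""
    data = array.array("I", primes)
    assert data.itemsize == 4, "sketch assumes 4-byte unsigned ints"
    with open(path, "wb") as f:
        data.tofile(f)


def load_primes(path):
    """Read the whole file back into an array.array."""
    data = array.array("I")
    count = os.path.getsize(path) // data.itemsize
    with open(path, "rb") as f:
        data.fromfile(f, count)
    return data


def is_prime(primes, n):
    """Binary search in the sorted array of primes."""
    i = bisect.bisect_left(primes, n)
    return i < len(primes) and primes[i] == n


# Demo with a small limit; the file lives in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "primes.bin")
save_primes(path, sieve(100))
primes = load_primes(path)
print(len(primes), is_prime(primes, 97), is_prime(primes, 91))
```

The file is written in native byte order, so it is only portable between machines with the same endianness and item size.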
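When you DO need a huge key-value store (gigabytes, not a paltry 200 MB!), I used to recommend shelve, but after unpleasant real-life experience with huge shelves (performance, reliability, etc.) I currently recommend a database engine instead: sqlite is good and comes with Python, PostgreSQL is even better, non-relational ones such as CouchDB can be better still, and so forth.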
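As a rough illustration of the database route, here is a small sketch of a persistent memoization cache on top of the sqlite3 module from the standard library. The SqliteCache class, the memoized helper and the table layout are assumptions made up for this example, not something from the answer. The demo uses an in-memory database; pass a file path to keep results between runs.

```python
import sqlite3


class SqliteCache:
    """Hypothetical helper: a tiny int -> int key-value store in SQLite."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key INTEGER PRIMARY KEY, value INTEGER)"
        )

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def put(self, key, value):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                (key, value),
            )

    def close(self):
        self.conn.close()


def memoized(cache, func, key):
    """Return the cached value for key, computing and storing it if missing."""
    value = cache.get(key)
    if value is None:
        value = func(key)
        cache.put(key, value)
    return value


# Demo: the second lookup is served from the database, not recomputed.
cache = SqliteCache(":memory:")
calls = []


def slow_square(n):
    calls.append(n)
    return n * n


first = memoized(cache, slow_square, 12)
second = memoized(cache, slow_square, 12)
print(first, second, calls)
```

Because None doubles as the "not cached" marker, this sketch would recompute a function that legitimately returns None on every call.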
小智 6
You can use the shelve module to store dictionary-like structures in a file. From the Python documentation:
import shelve
d = shelve.open(filename) # open -- file may get suffix added by low-level library
d[key] = data # store data at key (overwrites old data if using an existing key)
data = d[key] # retrieve a COPY of data at key (raise KeyError if no such key)
del d[key] # delete data stored at key (raises KeyError if no such key)
flag = key in d # true if the key exists (has_key() was removed in Python 3)
klist = list(d.keys()) # a list of all existing keys (slow!)
# as d was opened WITHOUT writeback=True, beware:
d['xx'] = list(range(4)) # this works as expected, but...
d['xx'].append(5) # *this doesn't!* -- d['xx'] is STILL [0, 1, 2, 3]!
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.
d.close() # close it
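To tie this back to the memoization question, here is one possible sketch of a shelve-backed cache for a function of a single integer. The persistent_memoize decorator and the count_divisors demo function are hypothetical names invented for this example. Shelf keys must be strings, so the argument is converted with str().

```python
import os
import shelve
import tempfile


def persistent_memoize(path):
    """Hypothetical decorator factory: cache results in a shelf at `path`."""
    def decorator(func):
        def wrapper(n):
            key = str(n)  # shelve keys must be strings
            # Opening the shelf on every call is simple but slow.
            with shelve.open(path) as db:
                if key in db:
                    return db[key]
                result = func(n)
                db[key] = result
                return result
        return wrapper
    return decorator


calls = []
path = os.path.join(tempfile.mkdtemp(), "cache")


@persistent_memoize(path)
def count_divisors(n):
    calls.append(n)
    return sum(1 for d in range(1, n + 1) if n % d == 0)


first = count_divisors(360)   # computed and stored
second = count_divisors(360)  # read back from the shelf
print(first, second, calls)
```

This sketch is meant for non-recursive functions: a recursive call would try to open the same shelf again while it is still open, which some dbm backends refuse.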