Python memory usage? Loading a large dictionary into memory

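Hey all, I have a file on disk that is only 168 MB. It's just a comma-separated list of id,word pairs, where the "word" can be 1-5 words long. There are 6.5 million lines. I build a dictionary in Python to load it into memory so that I can search incoming text against that word list. When Python loads it into memory, it shows 1.3 GB of RAM in use. Any idea why this is?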
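A rough way to see where the memory goes: in CPython every key and every value is a separate `str` object with its own object header, and the dict additionally keeps spare hash slots so lookups stay fast. The sketch below (Python 3; the exact byte counts differ between versions and platforms, so none are quoted here) compares the on-disk size of one line with the size of the two string objects built from it, using the standard `sys.getsizeof`.

```python
import sys

# One line of the term file as it sits on disk, and the two strings
# that the csv loop turns it into.
line = "3,word3\n"
term, term_id = "word3", "3"

raw_bytes = len(line.encode("utf-8"))
# getsizeof includes each object's header, not just its characters.
object_bytes = sys.getsizeof(term) + sys.getsizeof(term_id)

print("on disk:", raw_bytes, "bytes")
print("as two str objects:", object_bytes, "bytes (dict slot not included)")

# The dict's own hash table is extra; getsizeof(d) counts only the table,
# not the key and value objects it points to.
d = {}
print("empty dict table:", sys.getsizeof(d), "bytes")
for i in range(1000):
    d["term%d" % i] = str(i)
print("table for 1000 entries:", sys.getsizeof(d), "bytes")
```

Multiplying the per-line gap by 6.5 million lines is what turns a file of a couple hundred megabytes into a process of well over a gigabyte.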

So let's say my word file looks like this...

1,word1
2,word2
3,word3


Then add 6.5 million more lines like that. I loop through the file and build a dictionary (Python 2.6.1):

  import csv
  import os

  cached_terms = {}  # module-level cache: term -> term_id (both stored as str)

  def load_term_cache():
      """Load the term cache from our cached file instead of hitting MySQL. Without
      preloading into memory it would be 20+ million queries per process."""
      dumpfile = os.path.join(os.getenv("MY_PATH"), 'datafiles', 'baseterms.txt')
      with open(dumpfile) as f:  # the file is closed even if parsing fails
          for term_id, term in csv.reader(f):
              cached_terms[term] = term_id  # the dict is mutated in place, no global needed


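Just doing this blows up memory. Watching Activity Monitor, memory climbs to everything available, up to 1.5 GB of RAM, and my laptop just starts swapping. Any ideas on how to store key/value pairs in memory most efficiently with Python?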
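One technique that avoids a separate Python object per term is a read-only sorted index: all terms are concatenated into a single `bytes` blob, their start offsets and ids go into compact `array` objects, and lookups use binary search. This is a minimal sketch in Python 3 under the assumption that the map is built once and never modified; `CompactTermIndex` is a hypothetical name, not part of the question or any library. Trade-offs: lookups are O(log n) rather than a dict's O(1), building still sorts every pair in memory once (so peak memory during the build stays high), and duplicate terms are not collapsed the way repeated dict assignments would be.

```python
from array import array


class CompactTermIndex:
    """Hypothetical read-only term -> id map: one sorted bytes blob plus two
    integer arrays instead of one str object per key and value."""

    def __init__(self, pairs):
        # pairs: iterable of (term, term_id); term_id may be a str like "3".
        items = sorted((term.encode("utf-8"), int(term_id)) for term, term_id in pairs)
        self._blob = b"".join(t for t, _ in items)
        # _offsets[i] is where term i starts; a final entry marks the end.
        self._offsets = array("L", [0])
        for t, _ in items:
            self._offsets.append(self._offsets[-1] + len(t))
        self._ids = array("L", (i for _, i in items))

    def __len__(self):
        return len(self._ids)

    def _key(self, i):
        # Slice term i back out of the blob (a short-lived bytes object).
        return self._blob[self._offsets[i]:self._offsets[i + 1]]

    def get(self, term, default=None):
        key = term.encode("utf-8")
        lo, hi = 0, len(self._ids)
        # Binary search for the leftmost stored key >= key.
        while lo < hi:
            mid = (lo + hi) // 2
            if self._key(mid) < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self._ids) and self._key(lo) == key:
            return self._ids[lo]
        return default
```

With the loop from the question, the index could be built as `CompactTermIndex((term, term_id) for term_id, term in csv.reader(f))` instead of filling `cached_terms`.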

Thanks

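Update: I tried using the anydbm module, and after 4.4 million records it just dies. The floating-point number is the number of seconds elapsed since I started loading: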


As you can see, it runs fine at first, inserting 200,000 rows every few seconds, until I hit a wall and the time doubles. …
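The anydbm code itself isn't shown, so the slowdown can't be diagnosed from here. One commonly used on-disk alternative, swapped in for illustration, is SQLite through the standard-library `sqlite3` module: the terms live in a file, the whole load runs in a single transaction, and a `PRIMARY KEY` on the term gives indexed lookups. This is a sketch in Python 3 (the question uses 2.6); the table name `terms` and the helpers `build_term_db` and `lookup` are hypothetical, and whether it avoids the wall described above for this data has not been measured.

```python
import csv
import sqlite3


def build_term_db(csv_path, db_path):
    """Load 'id,term' rows into a hypothetical SQLite table keyed by term."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terms (term TEXT PRIMARY KEY, term_id INTEGER)"
    )
    with open(csv_path, newline="") as f:
        # Rows are streamed from the reader, never held in a list.
        rows = ((term, int(term_id)) for term_id, term in csv.reader(f))
        with conn:  # one transaction for the whole load, committed on success
            # OR REPLACE mirrors dict semantics: a repeated term keeps the last id.
            conn.executemany("INSERT OR REPLACE INTO terms VALUES (?, ?)", rows)
    return conn


def lookup(conn, term):
    """Return the id for term, or None if it is not in the table."""
    row = conn.execute("SELECT term_id FROM terms WHERE term = ?", (term,)).fetchone()
    return row[0] if row else None
```

Passing `":memory:"` as `db_path` keeps the database in RAM, which is handy for testing but gives up the point of moving the data to disk.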

Tags: python, memory

Jam*_*mes

2010-02-06
