I am working on a project where I read as many as 250,000 items or more into a list and turn each item into a key of a hash table.
sample_key = open("sample_file.txt").readlines()  # read every line into a list
sample_counter = [0] * len(sample_key)            # one zero counter per line
sample_hash = {sample.replace('\n', ''): counter  # strip newlines, pair each key with its counter
               for sample, counter in zip(sample_key, sample_counter)}
This code works fine when len(sample_key) is in the 1000-2000 range. Beyond that, it just seems to ignore any further data.
Any suggestions on how I can handle this large list of data?
PS: Also, if there is an optimal way of doing this (like reading directly into hash key entries), please suggest. I'm new to Python.
Your text file probably has duplicates, which overwrite the existing keys in your dictionary (the Python name for a hash table); that is why the dictionary ends up with far fewer entries than the file has lines. You can build a set of the unique keys and then use a dictionary comprehension to populate the dictionary.
sample_file.txt
a
b
c
c
Python code
with open("sample_file.txt") as f:
    keys = set(line.strip() for line in f)  # iterate the file directly; duplicate lines collapse in the set
my_dict = {key: 1 for key in keys if key}
>>> my_dict
{'a': 1, 'b': 1, 'c': 1}
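As a side note (not part of the original answer): when every key should start at the same immutable value, dict.fromkeys builds the dictionary in one step. A minimal sketch, assuming the same sample_file.txt:

with open("sample_file.txt") as f:
    # dict keys are unique by construction, so duplicate lines collapse automatically
    my_dict = dict.fromkeys((line.strip() for line in f), 1)
my_dict.pop('', None)  # drop the empty string produced by any blank lines

Avoid dict.fromkeys with a mutable default such as a list, since every key would then share the same object.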
Here is an implementation with one million random alphabetic strings of length 10. The timing is relatively trivial at under half a second.
import string
import numpy as np

letter_map = {n: letter for n, letter in enumerate(string.ascii_lowercase, 1)}  # map 1-26 to 'a'-'z'
long_alpha_list = ["".join([letter_map[number] for number in row]) + "\n"
                   for row in np.random.randint(1, 27, (1000000, 10))]  # random_integers was removed from modern NumPy
>>> long_alpha_list[:5]
['mfeeidurfc\n',
 'njbfzpunzi\n',
 'yrazcjnegf\n',
 'wpuxpaqhhs\n',
 'fpncybprrn\n']
>>> len(long_alpha_list)
1000000
# Write list to file.
with open('sample_file.txt', 'w') as f:  # text mode: the lines are str, not bytes
    f.writelines(long_alpha_list)
# Read them back into a dictionary per the method above.
with open("sample_file.txt") as f:
    keys = set(line.strip() for line in f)
>>> %%timeit -n 10
>>> my_dict = {key: 1 for key in keys if key}
10 loops, best of 3: 379 ms per loop
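If the real goal is to count how many times each line occurs (the question sets up a counter per key), collections.Counter does the counting in a single pass over the file. This is a sketch under that assumption, not part of the original answer:

from collections import Counter

with open("sample_file.txt") as f:
    # Counter is a dict subclass mapping each stripped line to its number of occurrences
    counts = Counter(line.strip() for line in f)
counts.pop('', None)  # discard any blank lines

Lookups then work exactly like a plain dictionary, e.g. counts['mfeeidurfc'].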