Vic*_*zzi · python · performance · parsing
The file has one word followed by thousands of floats on each line, and I want to turn it into a dict with the word as the key and the list of all the floats as the vector. This is what I'm doing, but because of the size of the file (roughly 20k lines, with about 10k values per line) the process takes too long. I haven't found a more efficient way to parse it, only alternative approaches with no guarantee of reducing the runtime.
with open("googlenews.word2vec.300d.txt") as g_file:
    i = 0;
    # dict of words: [lots of floats]
    google_words = {}
    for line in g_file:
        google_words[line.split()[0]] = [float(line.split()[i]) for i in range(1, len(line.split()))]
In your solution you call line.split() repeatedly for every line. Consider the following modification:
with open("googlenews.word2vec.300d.txt") as g_file:
    # dict of words: [lots of floats]
    google_words = {}
    for line in g_file:
        word, *numbers = line.split()
        google_words[word] = [float(number) for number in numbers]
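As a side note beyond the original answer, the float conversion itself can sometimes be sped up slightly with map, which runs the conversion loop in C rather than in Python bytecode. A minimal sketch of that variant (same logic, only the comprehension swapped out):

with open("googlenews.word2vec.300d.txt") as g_file:
    google_words = {}
    for line in g_file:
        word, *numbers = line.split()
        # map(float, ...) performs the conversions in C; wrapping it in
        # list() materializes the vector just like the comprehension did
        google_words[word] = list(map(float, numbers))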
An advanced concept I used here is "unpacking":
word, *numbers = line.split()
Python allows unpacking an iterable into multiple variables:
a, b, c = [1, 2, 3]
# This is practically equivalent to
a = 1
b = 2
c = 3
The * is a shortcut for "take the remaining items, put them into a list, and assign that list to the name":
a, *rest = [1, 2, 3, 4]
# results in
a == 1
rest == [2, 3, 4]
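For completeness (a small illustration beyond the original answer), the starred name does not have to come first; Python also accepts it in the middle or at the end of the target list:

first, *middle, last = [1, 2, 3, 4, 5]
# results in
first == 1
middle == [2, 3, 4]
last == 5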