Posts by gun*_*gor

Why is 'new_file += line + string' so much faster than 'new_file = new_file + line + string'?

Our code takes 10 minutes to siphon through 68,000 records when we use:

new_file = new_file + line + string


However, when we do the following, it takes just 1 second:

new_file += line + string
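To make the difference reproducible, here is a minimal timing sketch. The function names, the sample data, and the `"".join` variant are illustrative assumptions, not part of the original code. As far as I understand CPython, `new_file + line` has to build a brand-new string holding everything accumulated so far on every iteration, while `new_file += line + string` only builds the small `line + string` piece and can often extend `new_file` in place.

```python
import timeit


def build_rebinding(lines, suffix):
    """Mirror 'out = out + line + suffix': 'out + line' copies all of 'out' each time."""
    out = ""
    for line in lines:
        out = out + line + suffix
    return out


def build_augmented(lines, suffix):
    """Mirror 'out += line + suffix': only the small right-hand side is built first."""
    out = ""
    for line in lines:
        out += line + suffix
    return out


def build_join(lines, suffix):
    """Common alternative: collect the pieces and join them once."""
    return "".join(line + suffix for line in lines)


if __name__ == "__main__":
    # Hypothetical sample data; the real file has about 68,000 much longer rows.
    sample = ["field1~field2~field3"] * 5000
    for fn in (build_rebinding, build_augmented, build_join):
        seconds = timeit.timeit(lambda: fn(sample, "~region\n"), number=3)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

All three functions return the same string; only the amount of copying differs. The gap should grow with the number and length of rows, so timings on a small sample will understate what happens on the real file.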


Here is the code:

import time
import cmdbre

fname = "STAGE050.csv"
regions = cmdbre.regions
start_time = time.time()
with open(fname) as f:
    content = f.readlines()
    new_file_content = ""
    new_file = open("CMDB_STAGE060.csv", "w")
    row_region = ""
    i = 0
    for line in content:
        if i == 0:
            new_file_content = line.strip() + "~region" + "\n"
        else:
            country = line.split("~")[13]
            try:
                row_region = regions[country]
            except KeyError:
                row_region = "Undetermined"
            new_file_content += …
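The excerpt above is cut off, so the following is only a sketch of one way to avoid accumulating a large string at all: produce each output row as it is read and write it straight away. The helper name `add_region_column`, its parameters, and the assumption that the region is appended as a final `~`-separated column are all hypothetical. Unlike the original, it uses `dict.get` instead of `try`/`except KeyError` and treats a row with too few fields as "Undetermined" rather than raising.

```python
def add_region_column(lines, regions, country_index=13, sep="~"):
    """Yield each input line with a region column appended (hypothetical helper)."""
    for i, line in enumerate(lines):
        row = line.rstrip("\n")
        if i == 0:
            # Header row: add the new column name.
            yield row + sep + "region\n"
            continue
        fields = row.split(sep)
        # Assumption: a row with too few fields has no usable country.
        country = fields[country_index] if len(fields) > country_index else ""
        yield row + sep + regions.get(country, "Undetermined") + "\n"


if __name__ == "__main__":
    # Tiny in-memory demo with made-up data; column 1 holds the country here.
    demo_lines = ["name~country\n", "srv1~DE\n", "srv2~XX\n"]
    demo_regions = {"DE": "EMEA"}
    for out_line in add_region_column(demo_lines, demo_regions, country_index=1):
        print(out_line, end="")
```

With real files, the generator can be passed to `writelines` on the output file while iterating over the input file object directly, so neither the input nor the output has to be held in memory as one string.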


python string cpython string-concatenation python-internals

gun*_*gor

2016-12-07

8
votes

2
answers

1108
views

Python OrderedDict sputtering compared to dict()

This one has me stumped.

asset_hist = []
for key_host, val_hist_list in am_output.asset_history.items():
    for index, hist_item in enumerate(val_hist_list):
        #row = collections.OrderedDict([("computer_name", key_host), ("id", index), ("hist_item", hist_item)])
        row = {"computer_name": key_host, "id": index, "hist_item": hist_item}
        asset_hist.append(row)


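This code works perfectly with the collections line commented out. However, when I comment out the row = dict line and uncomment the collections line, things get very strange. About 4 million of these rows are generated and appended to asset_hist.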

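So, when I use row = dict, the entire loop finishes in about 10 milliseconds, lightning fast. When I use the ordered dictionary, I have waited over 10 minutes and it still has not finished. Now, I know OrderedDict is supposed to be a little slower than dict, but I would expect it to be something like 10 times slower at worst, and by my math it is actually about 100,000 times slower in this function.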
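To put numbers on that comparison, here is a small timing sketch that builds the same row both ways. The function names and the row count are illustrative assumptions; results depend heavily on the Python version and on whether the machine starts swapping, so no timings are claimed here.

```python
import timeit
from collections import OrderedDict


def make_dict_row(host, index, item):
    """Build one row as a plain dict."""
    return {"computer_name": host, "id": index, "hist_item": item}


def make_ordered_row(host, index, item):
    """Build the same row as an OrderedDict."""
    return OrderedDict([("computer_name", host), ("id", index), ("hist_item", item)])


def time_rows(make_row, n=100_000):
    """Time building a list of n rows once with the given row constructor."""
    return timeit.timeit(
        lambda: [make_row("host1", i, "string1") for i in range(n)],
        number=1,
    )


if __name__ == "__main__":
    for fn in (make_dict_row, make_ordered_row):
        print(f"{fn.__name__}: {time_rows(fn):.3f}s")
```

If the per-row cost measured this way is only a small multiple, the much larger slowdown in the full loop probably comes from something other than row construction itself, such as memory pressure or garbage collection. Note also that since Python 3.7 a plain dict preserves insertion order, which may remove the need for OrderedDict here.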

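I decided to print the index in the innermost loop to see what was happening. Interestingly, I noticed sputtering in the console output. The index would print to the screen very quickly, then stop for about 3-5 seconds, and then continue.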

am_output.asset_history is a dictionary keyed by host, and each value is a list of strings. For example:

am_output.asset_history = {"host1":["string1","string2",...],"host2":["string1","string2",...],...}

EDIT: Sputter analysis with OrderedDict

Total memory on this VM server: only 8GB ... need to have more provisioned.

LOOP NUM

184796 (~5 second wait, ~60% memory usage)

634481 (~5 second wait, ~65% memory usage)

1197564 (~5 second wait, ~70% memory usage)
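One hypothesis worth checking for pauses like these is the cyclic garbage collector running while millions of container objects are being allocated. The sketch below is only a diagnostic under that assumption: it uses `gc.callbacks` (available in Python 3.3+) to record how long each collection takes while the rows are built. The function name and its `row_factory` parameter are hypothetical.

```python
import gc
import time
from collections import OrderedDict


def build_rows_with_gc_log(asset_history, row_factory=OrderedDict):
    """Build the rows and record (generation, seconds) for each GC run seen."""
    pauses = []
    started = {}

    def on_gc(phase, info):
        # The interpreter calls this with phase "start" or "stop" around each collection.
        if phase == "start":
            started["t"] = time.perf_counter()
        elif phase == "stop" and "t" in started:
            pauses.append((info["generation"], time.perf_counter() - started.pop("t")))

    gc.callbacks.append(on_gc)
    try:
        rows = []
        for host, items in asset_history.items():
            for index, item in enumerate(items):
                rows.append(row_factory(
                    [("computer_name", host), ("id", index), ("hist_item", item)]
                ))
    finally:
        # Always unregister the callback, even if building the rows fails.
        gc.callbacks.remove(on_gc)
    return rows, pauses


if __name__ == "__main__":
    # Made-up data in the same shape as am_output.asset_history.
    sample = {f"host{h}": [f"string{i}" for i in range(1000)] for h in range(100)}
    rows, pauses = build_rows_with_gc_log(sample)
    longest = max((seconds for _, seconds in pauses), default=0.0)
    print(f"{len(rows)} rows, {len(pauses)} GC runs, longest {longest:.4f}s")
```

If the recorded collections are short while the multi-second stalls persist, the collector is probably not the cause, and the rising memory figures above would point toward swapping on the 8GB machine instead.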