Is del or pop preferred when removing elements from dicts?

Rob*_*Rob 5 python python-2.7

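I'm relatively new to Python and would like to know whether there is any reason to prefer one of these methods over the other when removing an element from a dict.

A) Using del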

# d is a dict, k is a key
if k in d:
   del d[k]

B) Using pop

d.pop(k, None)

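My first thought was that approach (A) needs two lookups, one in the if statement and another in the del itself, which would make it slightly slower than pop, which needs only one lookup. A colleague then pointed out that del may still have an edge because it is a keyword and therefore potentially better optimized, whereas pop is a method that end users can override (I'm not sure whether that is really a factor, but he does have more experience writing Python code).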
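On the point about overriding: a minimal sketch, assuming a hypothetical dict subclass called LoggingDict (not part of the question), suggests that both removal paths can be intercepted. The del statement on a subscript dispatches to `__delitem__`, which a subclass can override just as it can override pop.

```python
class LoggingDict(dict):
    """Hypothetical subclass that records which removal hook was called."""

    def __init__(self, *args, **kwargs):
        super(LoggingDict, self).__init__(*args, **kwargs)
        self.calls = []

    def pop(self, key, *default):
        # Intercepts d.pop(k) and d.pop(k, default)
        self.calls.append("pop")
        return super(LoggingDict, self).pop(key, *default)

    def __delitem__(self, key):
        # Intercepts "del d[k]"
        self.calls.append("__delitem__")
        super(LoggingDict, self).__delitem__(key)


d = LoggingDict(a=1, b=2)
d.pop("a", None)
del d["b"]
print(d.calls)
```

For a plain built-in dict neither hook is replaced, so this only matters when the object may be a subclass.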

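I wrote some test snippets to compare performance. It looks like del has the edge (I've appended the snippets in case anyone cares to try them or comment on their correctness).

So, this brings me back to the question: aside from the marginal performance gain, is there any reason to prefer one over the other?

Here are the snippets I used to test performance:

Naive test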

import timeit
print 'in:   ', timeit.Timer(stmt='42 in d', setup='d = dict.fromkeys(range(100000))').timeit()
print 'pop:  ', timeit.Timer(stmt='d.pop(42,None)', setup='d = dict.fromkeys(range(100000))').timeit()
print 'del:  ', timeit.Timer(stmt='if 42 in d:\n    del d[42]', setup='d = dict.fromkeys(range(100000))').timeit()

This outputs

in:    0.0521960258484
pop:   0.172810077667
del:   0.0660231113434

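So this is a curious result. I would have expected pop to cost roughly the same as in, but it is more than three times as expensive. Another surprise was that del was only slightly slower than in, until I realised that the dictionary created by the setup statement of the timeit class stays the same instance for the whole run, so only the first iteration ever reaches the del statement; in every later iteration the if condition is false.

A slightly less naive test

So I wrote a longer profiling snippet that tries to avoid this. I do several timeit runs with randomly chosen keys, and try to make sure that we mostly pass the if statement and hit the del statement (so that we are not reusing the same dictionary instance all the time):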
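The effect described above can be checked directly. This is a small sketch, not the original benchmark: the `state` dictionary and the `setup` and `stmt` functions are names made up for illustration, and it relies on timeit.Timer accepting callables for stmt and setup.

```python
import timeit

# Shared state so we can count how often the del branch really runs
state = {"d": None, "deletions": 0}

def setup():
    # Runs once per timeit() call, not once per loop iteration
    state["d"] = dict.fromkeys(range(100))

def stmt():
    d = state["d"]
    if 42 in d:
        del d[42]
        state["deletions"] += 1

timer = timeit.Timer(stmt=stmt, setup=setup)

# One timeit() call: 1000 iterations, but only the first one deletes
timer.timeit(number=1000)
single_call = state["deletions"]

# repeat() reruns setup for each repeat, so each repeat deletes once
state["deletions"] = 0
timer.repeat(repeat=5, number=1000)
five_repeats = state["deletions"]

print("deletions: %d per timeit() call, %d over 5 repeats" % (single_call, five_repeats))
```

With 1000 iterations per call, the del branch runs once per fresh setup, so the naive measurement is dominated by the `in` check.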

#!/usr/bin/env python

import timeit

# Number of times to repeat fresh setup before doing timeit runs
repeat_num=100
# Number of timeit runs per setup
number=1000
# Size of dictionary for runs (smaller)
small_size=10000
# Size of dictionary for timeit runs (larger)
large_size=1000000
# Switches garbage collection on if True
collect_garbage = False

setup_stmt = """
import random
d = dict.fromkeys(range(%(dict_size)i))
# key, randomly chosen
k = random.randint(0,%(dict_size)i - 1)
%(garbage)s
"""

in_stmt = """
k in d
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}

pop_stmt = """
d.pop(k, None)
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}


del_stmt = """
if k in d:
    del d[k]
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}

# Results for smaller dictionary size
print \
"""SETUP:
   repeats        : %(repeats)s
   runs per repeat: %(number)s
   garbage collect: %(garbage)s""" \
       % {'repeats' : repeat_num,
          'number'  : number,
          'garbage' : 'yes' if collect_garbage else 'no'}
print "SMALL:"
small_setup_stmt = setup_stmt % \
    {'dict_size' : small_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    in:  ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    del: ", sum(times)/len(times)

# Results for larger dictionary size
print "LARGE:"
large_setup_stmt = setup_stmt % \
    {'dict_size' : large_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    in:  ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    del: ", sum(times)/len(times)

Running 100 setups with 1000 runs each prints the following:

SETUP:
   repeats        : 100
   runs per repeat: 1000
   garbage collect: no
SMALL:
    in:   0.00020430803299
    pop:  0.000313355922699
    del:  0.000262062549591
LARGE:
    in:   0.000201721191406
    pop:  0.000328607559204
    del:  0.00027587890625
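
I'm new to timeit, so this may well be a flawed test, but it seems to indicate that del has a small edge in performance.

One thing I did learn from this exercise is that Python dictionaries are hash maps, so the size of the dictionary does not affect lookup time the way it does for C++'s std::map, for example (constant time vs. O(log(n))-ish). Oh well. Live and learn.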

Bre*_*arn 9

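I wouldn't worry about the performance differences unless you have a specific reason to believe that they are causing a noticeable slowdown in your program, which is unlikely.

The real reason to choose del vs. pop is that they have different behaviors. pop returns the value for the popped key, so you can use pop if you want to do something with that value at the same time as you remove it. If you don't need to do anything with the value and just want to remove the item, use del.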
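To make the behavioral difference concrete, here is a short illustrative sketch (the dictionary contents and variable names are made up for the example):

```python
d = {"a": 1, "b": 2, "c": 3}

# pop removes the key and hands back its value
value = d.pop("a")

# pop with a default never raises for a missing key
missing = d.pop("no-such-key", None)

# del removes the key and is a statement, so there is nothing to capture
del d["b"]

# Without a default, both raise KeyError for a missing key
errors = []
try:
    del d["b"]
except KeyError:
    errors.append("del")
try:
    d.pop("b")
except KeyError:
    errors.append("pop")

print(value, missing, sorted(d), errors)
```

So `d.pop(k, None)` is the one-liner for "remove if present", while `del d[k]` fits when the key is expected to exist and a KeyError should surface if it does not.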