Counting entries in a list of dictionaries: for loop vs. list comprehension with map(itemgetter)

Question

Pau*_*ce. 4 python dictionary loops list-comprehension map

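In a Python program I'm writing, I compared a for loop with increment variables against a list comprehension with map(itemgetter) and len() for counting entries in the dictionaries of a list. Each method takes the same amount of time. Am I doing something wrong, or is there a better approach?

Here is a greatly simplified and shortened data structure: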

list = [
  {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
  {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
  {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
  {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
  {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'}
]

Here is the for loop version:

#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True

key1 = key2 = 0
for dictionary in list:
    if dictionary["key1"]:
        key1 += 1
        if dictionary["key2"]:
            key2 += 1
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

Output for the above data:

Counts: key1: 3, subset key2: 2

Here is another, perhaps more Pythonic, version:

#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True
from operator import itemgetter
KEY1 = 0
KEY2 = 1
getentries = itemgetter("key1", "key2")
entries = map(getentries, list)
key1 = len([x for x in entries if x[KEY1]])
key2 = len([x for x in entries if x[KEY1] and x[KEY2]])
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

Output for the above data (same as before):

Counts: key1: 3, subset key2: 2

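I'm a bit surprised that these take the same amount of time. I'm wondering whether there is something faster. I'm sure I'm overlooking something simple.

One alternative I've considered is loading the data into a database and running SQL queries, but the data doesn't need to persist, I would have to profile the overhead of the data transfer and so on, and a database may not always be available.

I have no control over the original form of the data.

(The code above is not going for style points.)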

Answer 1

Ale*_*lli 12

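I think you're measuring incorrectly by swamping the code to be measured in a lot of overhead (running at top module level instead of in a function, doing output). Putting the two snippets into functions named forloop and withmap, and adding * 100 to the list's definition (after the closing ]) to make the measurement a little more substantial, I see, on my slow laptop: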

$ py26 -mtimeit -s'import co' 'co.forloop()'
10000 loops, best of 3: 202 usec per loop
$ py26 -mtimeit -s'import co' 'co.withmap()'
1000 loops, best of 3: 601 usec per loop

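That is, the supposedly "more Pythonic" approach with map is three times slower than the plain for approach, which tells you it is not really "more Pythonic" ;-).

The mark of good Python is simplicity, which, to me, recommends the function I immodestly named...: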
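For anyone who wants to repeat the measurement, here is a minimal sketch of the setup described above. It is a Python 3 port rather than the original Python 2.6 module, and it makes some assumptions: the data is named `records` instead of `list`, the dictionaries are trimmed to the two keys that matter, and the `timeit` module is called directly instead of through the command line. Absolute timings and the ratio between the two functions will differ by machine and interpreter version.

```python
import timeit
from operator import itemgetter

# Sample data from the question, trimmed to the two relevant keys and
# repeated 100 times as the answer suggests.
# The name `records` is an assumption; the original code calls it `list`.
records = [
    {'key1': True, 'key2': True},
    {'key1': False, 'key2': True},
    {'key1': True, 'key2': False},
    {'key1': False, 'key2': False},
    {'key1': True, 'key2': True},
] * 100


def forloop():
    """Count with a plain loop and two counters."""
    key1 = key2 = 0
    for d in records:
        if d["key1"]:
            key1 += 1
            if d["key2"]:
                key2 += 1
    return key1, key2


def withmap():
    """Count with map(itemgetter) plus two list comprehensions."""
    getentries = itemgetter("key1", "key2")
    # list() is needed on Python 3, where map returns a one-shot iterator.
    entries = list(map(getentries, records))
    key1 = len([x for x in entries if x[0]])
    key2 = len([x for x in entries if x[0] and x[1]])
    return key1, key2


if __name__ == "__main__":
    loops = 1000
    for func in (forloop, withmap):
        # Take the best of 3 repeats, mirroring what `python -mtimeit` reports.
        best = min(timeit.repeat(func, number=loops, repeat=3))
        print("%s: %.1f usec per loop" % (func.__name__, best / loops * 1e6))
```

Both functions return the counts instead of printing them, so the output overhead mentioned above stays out of the timed region.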

def thebest():
  entries = [d['key2'] for d in list if d['key1']]
  return len(entries), sum(entries)

which, on measurement, saves between 10% and 20% of the time compared with forloop.
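Why the single list gives both counts may not be obvious at first glance, so here is a small sketch that runs the same idea on the five sample rows from the question. The parameter name `rows` and the variable `records` are assumptions for illustration; the original function reads the module-level `list` directly.

```python
# The five sample rows from the question, trimmed to the two relevant keys.
records = [
    {'key1': True, 'key2': True},
    {'key1': False, 'key2': True},
    {'key1': True, 'key2': False},
    {'key1': False, 'key2': False},
    {'key1': True, 'key2': True},
]


def thebest(rows):
    # Keep the key2 value only for rows where key1 is truthy.
    entries = [d['key2'] for d in rows if d['key1']]
    # len() counts the key1 rows; sum() counts the True key2 values among
    # them, because bool is a subclass of int and True == 1.
    return len(entries), sum(entries)


print(thebest(records))  # (3, 2)
```

This relies on the key2 values being real booleans (or 0/1). If they could be other truthy objects, something like `sum(1 for v in entries if v)` would be a safer way to count them.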

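@Alex: Wow! `for i in range(10000000): a = i` takes 50% longer at the top level than in a function. Thanks! Normally I would use functions, but I just posted what I thought was straightforward test code in my question (and hand-waved it away as "not going for style points"). As they say, "you learn something new every day". (2 upvotes)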
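The top-level versus function difference mentioned in this comment can be checked with a rough sketch like the one below. It is an assumption-laden approximation: the module-level case is simulated by running compiled code with `exec` in a fresh globals dictionary, the loop is shortened to 100000 iterations, and the names `top_level` and `in_function` are made up for illustration. In CPython, names assigned at module level are stored in a dictionary, while local variables inside a function use faster indexed slots; how large the gap is depends on the interpreter version.

```python
import timeit

# Compile once so that only execution, not compilation, is timed.
LOOP_CODE = compile("for i in range(100000): a = i", "<top-level>", "exec")


def top_level():
    # Runs the loop as module-level code: `i` and `a` live in a globals dict.
    exec(LOOP_CODE, {})


def in_function():
    # Same loop, but `i` and `a` are function locals.
    for i in range(100000):
        a = i


top_time = min(timeit.repeat(top_level, number=20, repeat=3))
func_time = min(timeit.repeat(in_function, number=20, repeat=3))

print("top level:   %.4f s" % top_time)
print("in function: %.4f s" % func_time)
```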

Archived: 15 years, 7 months ago
Views: 6579
Last activity: 15 years, 7 months ago