Python 2.6优化嵌套循环

use*_*386 2 python optimization

我有一个函数,它有一个字典作为输入和一个值n.字典中的每个项目都是具有一个或多个值的集合.该函数应对字典键进行排序,它应提取并返回"n"值.此功能将经常执行,因此我正在尝试优化它.有什么建议?

def select_items(temp_dict, n):
  """Select n items from the dictionary"""
  res = []
  sort_keys = sorted(temp_dict.keys())
  count = 0

  for key in sort_keys:
    for pair in temp_dict[key]:
      if count < n:
        res.append(pair)
        count += 1
      else:
        return res

  return res
Run Code Online (Sandbox Code Playgroud)

在这段代码中,我有一个count和"if语句"来控制所选值的数量.有没有办法通过在itertools或其他东西中使用某些函数来优化此代码?

Dav*_*ver 6

这是我的第一次尝试(参见参考资料select_items_faster),它的速度几乎翻了一番:

In [12]: print _11
import itertools

def select_items_original(temp_dict, n):
  """Select n items from the dictionary"""
  res = []
  sort_keys = sorted(temp_dict.keys())
  count = 0

  for key in sort_keys:
    for pair in temp_dict[key]:
      if count < n:
        res.append(pair)
        count += 1
      else:
        return res

  return res

def select_items_faster(temp_dict, n):
    """Select n items from the dictionary"""
    items = temp_dict.items()
    items.sort()

    return list(itertools.chain.from_iterable(val for (_, val) in itertools.islice(items, n)))

test_dict = dict((x, ["a"] * int(x / 500)) for x in range(1000))
test_n = 300


In [13]: %timeit select_items_original(test_dict, test_n)
1000 loops, best of 3: 293 us per loop


In [14]: %timeit select_items_faster(test_dict, test_n)
1000 loops, best of 3: 203 us per loop
Run Code Online (Sandbox Code Playgroud)

itertools.islice用a 取代[:n]并不能真正帮助你:

def select_items_faster_slice(temp_dict, n):
    """Select n items from the dictionary"""
    items = temp_dict.items()
    items.sort()

    return list(itertools.chain.from_iterable(val for (_, val) in items[:n]))

In [16]: %timeit select_items_faster_slice(test_dict, test_n)
1000 loops, best of 3: 210 us per loop
Run Code Online (Sandbox Code Playgroud)

而且sorted:

In [18]: %timeit select_items_faster_sorted(test_dict, test_n)
1000 loops, best of 3: 213 us per loop


In [19]: print _17
def select_items_faster_sorted(temp_dict, n):
    """Select n items from the dictionary"""
    return list(itertools.chain.from_iterable(val for (_, val) in itertools.islice(sorted(temp_dict.items()), n)))
Run Code Online (Sandbox Code Playgroud)

但是,组合map__getitem__更快:

In [22]: %timeit select_items_faster_map_getitem(test_dict, test_n)
10000 loops, best of 3: 90.7 us per loop

In [23]: print _20
def select_items_faster_map_getitem(temp_dict, n):
    """Select n items from the dictionary"""
    keys = temp_dict.keys()
    keys.sort()
    return list(itertools.chain.from_iterable(map(temp_dict.__getitem__, keys[:n])))
Run Code Online (Sandbox Code Playgroud)

更换list(itertools.chain.from_iterable)用一些神奇速度东西了不少:

In [28]: %timeit select_items_faster_map_getitem_list_extend(test_dict, test_n)
10000 loops, best of 3: 74.9 us per loop

In 29: print _27
def select_items_faster_map_getitem_list_extend(temp_dict, n):
    """Select n items from the dictionary"""
    keys = temp_dict.keys()
    keys.sort()
    result = []
    filter(result.extend, map(temp_dict.__getitem__, keys[:n]))
    return result
Run Code Online (Sandbox Code Playgroud)

用itertools函数替换map和slice会挤出更快的速度:

In [31]: %timeit select_items_faster_map_getitem_list_extend_iterables(test_dict, test_n)
10000 loops, best of 3: 72.8 us per loop

In [32]: print _30
def select_items_faster_map_getitem_list_extend_iterables(temp_dict, n):
    """Select n items from the dictionary"""
    keys = temp_dict.keys()
    keys.sort()
    result = []
    filter(result.extend, itertools.imap(temp_dict.__getitem__, itertools.islice(keys, n)))
    return result
Run Code Online (Sandbox Code Playgroud)

这和我认为的速度一样快,因为在CPython中Python函数调用相当慢,这最大限度地减少了内循环中Python函数调用的次数.

注意:

  • 由于OP没有提供输入数据的任何暗示,所以我不得不猜测.我可能会离开,这可能会彻底改变"快"的含义.
  • 我的每个实现都返回n - 1项,而不是n.

编辑:使用相同的方法来分析JF Sebastian的代码:

In [2]: %timeit select_items_heapq(test_dict, test_n)
1000 loops, best of 3: 572 us per loop

In [3]: print _1
from itertools import *
import heapq

def select_items_heapq(temp_dict, n):
    return list(islice(chain.from_iterable(imap(temp_dict.get, heapq.nsmallest(n, temp_dict))),n))
Run Code Online (Sandbox Code Playgroud)

TokenMacGuy的代码:

In [5]: %timeit select_items_tokenmacguy_first(test_dict, test_n)
1000 loops, best of 3: 201 us per loop

In [6]: %timeit select_items_tokenmacguy_second(test_dict, test_n)
1000 loops, best of 3: 730 us per loop

In [7]: print _4
def select_items_tokenmacguy_first(m, n):
    k, v, r = m.keys(), m.values(), range(len(m))
    r.sort(key=k.__getitem__)
    return [v[i] for i in r[:n]]

import heapq
def select_items_tokenmacguy_second(m, n):
    k, v, r = m.keys(), m.values(), range(len(m))
    smallest = heapq.nsmallest(n, r, k.__getitem__)
    for i, ind in enumerate(smallest):
        smallest[i] = v[ind]
    return smallest
Run Code Online (Sandbox Code Playgroud)

  • 因为`dict.items()`在列表被排序时返回`(key,value)`的元组,所以对`.items()`列表的排序与按键排序相同(除非键是相同的... ). (2认同)
  • OP的代码:`for temp_dict [key]中的对:res.append(pair)`因此每个值必须是可迭代的.(它相当于`res.extend(temp_dict [key])`.因此上面的大多数例子都是错误的. (2认同)