How do I use itertools.groupby() when the key is inside the elements of the iterable?

Kit*_*Kit 7 python group-by python-itertools

To illustrate, I start with a list of 2-tuples:

import itertools
import operator

raw = [(1, "one"),
       (2, "two"),
       (1, "one"),
       (3, "three"),
       (2, "two")]

for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp).pop()[1])

This yields:

1 one
2 two
1 one
3 three
2 two

Trying to find out why:

for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp))

# ---- OUTPUT ----
1 [(1, 'one')]
2 [(2, 'two')]
1 [(1, 'one')]
3 [(3, 'three')]
2 [(2, 'two')]

Even this gives me the same output:

for key, grp in itertools.groupby(raw, key=operator.itemgetter(0)):
    print(key, list(grp))

I was hoping to get something like:

1 one, one
2 two, two
3 three

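I think this happens because the key sits inside the tuples in the list, and each tuple is actually moved around as a single element. Is there a way to get the output I want? Or maybe groupby() isn't the right tool for this task?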

unu*_*tbu 11

groupby groups *consecutive* elements of the iterable that have the same key. To produce the output you want, you have to sort raw first.

for key, grp in itertools.groupby(sorted(raw), key=operator.itemgetter(0)):
    print(key, list(map(operator.itemgetter(1), grp)))

# 1 ['one', 'one']
# 2 ['two', 'two']
# 3 ['three']
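A small sketch of why sorting is needed: the tuple nesting is not the problem, since groupby() starts a new group every time the key changes, even on a flat list of ints. As a variation (an assumption, not part of the answer above), it sorts by the key alone with `sorted(raw, key=operator.itemgetter(0))`; Python's sort is stable, so values that share a key keep their original relative order. The `", ".join(...)` formatting is just one way to reproduce the exact output the question asked for.

```python
import itertools
import operator

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]

# groupby only merges *consecutive* equal keys, so the second 1 starts a new group.
flat_keys = [k for k, _ in itertools.groupby([1, 2, 1, 3, 2])]
print(flat_keys)  # [1, 2, 1, 3, 2]

# Sorting by the key first makes equal keys adjacent, so each key yields one group.
by_key = operator.itemgetter(0)
lines = []
for key, grp in itertools.groupby(sorted(raw, key=by_key), key=by_key):
    values = [value for _, value in grp]
    lines.append(f"{key} {', '.join(values)}")

print("\n".join(lines))
# 1 one, one
# 2 two, two
# 3 three
```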


Joh*_*ooy 6

I think a cleaner way to get the result you want is this:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for k, v in raw:
...     d[k].append(v)
... 
>>> for k, v in sorted(d.items()):
...     print(k, v)
... 
1 ['one', 'one']
2 ['two', 'two']
3 ['three']

Building d is O(n), and sorted() now only has to sort the unique keys (O(k log k) for k distinct keys) instead of the whole dataset.
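The same defaultdict idea can be wrapped in a reusable function; this is a minimal sketch, and the name `group_pairs` is hypothetical. One linear pass collects the values, then only the distinct keys are sorted before printing in the format the question asked for.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Collect values by key in a single O(n) pass (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]
groups = group_pairs(raw)

# Only the distinct keys are sorted here, not the whole input list.
output = [f"{key} {', '.join(groups[key])}" for key in sorted(groups)]
print("\n".join(output))
# 1 one, one
# 2 two, two
# 3 three
```

Unlike the groupby approach, this does not require the input to be sorted, because every value is appended to its key's list no matter where it appears.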