Counting all elements in a nested list

Question

Tho*_*son 6 python dictionary list python-3.x pandas

I have a list of lists and want to create a DataFrame containing the count of every unique element. Here is my test data:

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]


I can do something like this using Counter with a for loop:

from collections import Counter
for item in test:
    print(Counter(item))


But how can I aggregate the results of this loop into a new DataFrame?

The expected output is a DataFrame:

P1 P2 P3 P4
15 4  1  2


Answer 1

jpp*_*jpp 6

Here is one way.

from collections import Counter
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

c = Counter(chain.from_iterable(test))

for k, v in c.items():
    print(k, v)

# P1 15
# P2 4
# P3 1
# P4 2


To get the output as a DataFrame:

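import pandas as pd

# One row per key with orient='index', then transpose to get one column per key
df = pd.DataFrame.from_dict(c, orient='index').transpose()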

#    P1 P2 P3 P4
# 0  15  4  1  2

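As an alternative sketch (not the answer author's code), the same one-row frame can be built by passing a one-element list of dicts to the `pd.DataFrame` constructor. The name `counts` and the choice to sort the columns alphabetically are assumptions made here for illustration.

```python
from collections import Counter
from itertools import chain

import pandas as pd

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

# Flatten the sublists lazily and count every element once
counts = Counter(chain.from_iterable(test))

# A list holding one dict yields one row; `columns` fixes the column order
df = pd.DataFrame([dict(counts)], columns=sorted(counts))
print(df)
```

Fixing the column order explicitly avoids relying on the order in which elements happen to appear first in the data.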

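There is already a function that handles what you are importing like that. It is `from itertools import chain.from_iterable as concat` (3 upvotes)
@Ev.Kounis Actually not quite; `from itertools import chain as concat` would work, though. I agree the one-liner they currently have is nasty, but otherwise it is a good answer (I made an edit, hope that's okay) (2 upvotes)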

Answer 2

Moi*_*dri 5

For better performance, you should use:

collections.Counter with itertools.chain.from_iterable:

>>> from collections import Counter
>>> from itertools import chain

>>> Counter(chain.from_iterable(test))
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})


OR, you could use collections.Counter with a list comprehension (this needs one fewer import, itertools, with similar performance):
```
>>> from collections import Counter

>>> Counter([x for a in test for x in a])
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
```

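Read on for more alternative solutions and a performance comparison. (Otherwise, skip the rest.)

Approach 1: Concatenate your sublists into a single list and find the counts using collections.Counter.

Solution 1: Concatenate the lists using itertools.chain.from_iterable and find the counts using collections.Counter: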

test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"]
]

from itertools import chain 
from collections import Counter

my_counter = Counter(chain.from_iterable(test))


Solution 2: Combine the lists using a list comprehension:
```
from collections import Counter

my_counter = Counter([x for a in test for x in a])
```
Solution 3: Concatenate the lists using sum:
```
from collections import Counter

my_counter = Counter(sum(test, []))
```

Approach 2: Count the elements of each sublist with a collections.Counter object, then sum the resulting list of Counter objects.

Solution 4: Count each sublist using collections.Counter and map:
```
from collections import Counter

my_counter = sum(map(Counter, test), Counter())
```
Solution 5: Count each sublist using a list comprehension:
```
from collections import Counter

my_counter = sum([Counter(t) for t in test], Counter())
```
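A variation on Approach 2 that is not among the numbered solutions: instead of adding Counter objects with `+`, which builds a new Counter at every step, a single Counter can be updated in place with `Counter.update`. This is only a sketch; the loop variable name `sublist` is chosen here for illustration.

```python
from collections import Counter

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

my_counter = Counter()
for sublist in test:
    # update() adds the counts of this sublist to the running totals in place
    my_counter.update(sublist)

print(my_counter)
# Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
```

This keeps the shape of the loop from the question while accumulating one total rather than printing a Counter per sublist.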

In all of the solutions above, my_counter will hold the value:

>>> my_counter
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})

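The claim that all five solutions agree can be checked directly. The sketch below runs each of them on the test data and compares the results; the `candidates` dictionary and its labels are introduced here only for illustration.

```python
from collections import Counter
from itertools import chain

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

# One entry per numbered solution, in the same order as in the answer
candidates = {
    "1: chain.from_iterable": Counter(chain.from_iterable(test)),
    "2: list comprehension": Counter([x for a in test for x in a]),
    "3: sum of lists": Counter(sum(test, [])),
    "4: map + sum of Counters": sum(map(Counter, test), Counter()),
    "5: comprehension + sum of Counters": sum([Counter(t) for t in test], Counter()),
}

expected = Counter({"P1": 15, "P2": 4, "P4": 2, "P3": 1})
for label, counter in candidates.items():
    # Counter equality compares keys and counts, not insertion order
    assert counter == expected, label

print(expected.most_common())
# [('P1', 15), ('P2', 4), ('P4', 2), ('P3', 1)]
```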

Performance comparison

Below is a timeit comparison on Python 3 for a list of 1000 sublists with 100 elements each:

Fastest is chain.from_iterable (17.1 ms)

mquadri$ python3 -m timeit "from collections import Counter; from itertools import chain; my_list = [list(range(100)) for i in range(1000)]" "Counter(chain.from_iterable(my_list))"
100 loops, best of 3: 17.1 msec per loop


Second is using a list comprehension to combine the lists and then counting (a similar result to the above, but without the extra itertools import) (18.36 ms)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter([x for a in my_list for x in a])"
100 loops, best of 3: 18.36 msec per loop


Third in terms of performance is using Counter on each sublist in a list comprehension (162 ms)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum([Counter(t) for t in my_list], Counter())"
10 loops, best of 3: 162 msec per loop


Fourth is using Counter with map (the result is very similar to the list comprehension above) (176 ms)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum(map(Counter, my_list), Counter())"
10 loops, best of 3: 176 msec per loop


The solution that uses sum to concatenate the lists is far too slow (526 ms)

mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter(sum(my_list, []))"
10 loops, best of 3: 526 msec per loop
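
A rough way to see why `sum(my_list, [])` falls so far behind: every `+` builds a new list, so the partial result is copied again at each step. The sketch below counts element copies under that simplified model (an assumption for illustration; it ignores allocation details and the cost of Counter itself).

```python
# Simplified cost model (an assumption for illustration): count how many
# list elements get copied while flattening 1000 sublists of 100 elements.
n_sublists = 1000
sublist_len = 100

# sum(my_list, []) evaluates ((([] + l1) + l2) + ...); the k-th addition
# builds a new list holding k * sublist_len elements.
copies_with_sum = sum(k * sublist_len for k in range(1, n_sublists + 1))

# chain.from_iterable visits every element exactly once.
visits_with_chain = n_sublists * sublist_len

print(copies_with_sum)                      # 50050000
print(visits_with_chain)                    # 100000
print(copies_with_sum / visits_with_chain)  # 500.5
```

The measured gap (526 ms versus 17.1 ms) is much smaller than this ratio because copying a list element is far cheaper than hashing and counting it, but the quadratic growth is what makes this approach scale poorly.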

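The commands above pass the imports and the construction of `my_list` as an ordinary statement argument, so that work is included in each timed loop (the `timeit` command line only treats code given with `-s` as setup). A minimal sketch of the same comparison from Python, with the list built once in `setup`, might look like this; the `statements` dictionary, its labels, and the small `repeat`/`number` values are assumptions chosen to keep the run short.

```python
import timeit

# Setup runs once per repetition and is not included in the measured time
setup = (
    "from collections import Counter\n"
    "from itertools import chain\n"
    "my_list = [list(range(100)) for _ in range(1000)]"
)

statements = {
    "chain.from_iterable": "Counter(chain.from_iterable(my_list))",
    "list comprehension": "Counter([x for a in my_list for x in a])",
    "sum of Counters": "sum(map(Counter, my_list), Counter())",
}

number = 3  # executions per repetition
results = {}
for label, stmt in statements.items():
    # Take the best of 3 repetitions and convert to seconds per execution
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number)) / number
    results[label] = best
    print(f"{label}: {best * 1000:.2f} msec per loop")
```

Absolute numbers depend on the machine and Python version, so they will not match the figures quoted above.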

Archived: 8 years, 4 months ago
Views: 645
Last activity: 8 years, 1 month ago