Tho*_*son 6 python dictionary list python-3.x pandas
I have a list of lists and want to create a data frame containing the counts of all unique elements. Here is my test data:
test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]
I can do something like this using Counter with a for loop:
from collections import Counter
for item in test:
    print(Counter(item))
But how can I combine the results of this loop into a new data frame?
The expected output is a data frame:
P1 P2 P3 P4
15 4 1 2
Here is one way.
from collections import Counter
from itertools import chain
test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]
c = Counter(chain.from_iterable(test))
for k, v in c.items():
    print(k, v)
# P1 15
# P2 4
# P3 1
# P4 2
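To see what chain.from_iterable contributes, here is a minimal sketch on a small made-up list (`nested` is illustrative only and not part of the question's data). It shows that the elements of each sub-list are yielded in order as one flat stream, which Counter can consume directly without an intermediate list.

```python
from collections import Counter
from itertools import chain

# Hypothetical toy data, chosen only for illustration.
nested = [["a", "b"], ["a"], [], ["c", "a"]]

# chain.from_iterable returns a lazy iterator over the inner elements.
flat = chain.from_iterable(nested)
flat_list = list(flat)  # materialised here only so it can be inspected
print(flat_list)

# Counter accepts any iterable, so a fresh iterator can be passed directly.
counts = Counter(chain.from_iterable(nested))
print(counts)
```

Empty sub-lists contribute nothing, so they need no special handling.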
To get the output as a data frame:
import pandas as pd
df = pd.DataFrame.from_dict(c, orient='index').transpose()
# P1 P2 P3 P4
# 0 15 4 1 2
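If the columns should come out in sorted order (P1, P2, P3, P4) regardless of the order in which the keys were first seen, one option is to build the one-row frame from a list holding a single dict and then sort the column labels. This is only a sketch, assuming pandas is installed; the name `counts` is mine.

```python
from collections import Counter
from itertools import chain

import pandas as pd

test = [["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
        ["P1", "P1", "P1"],
        ["P1", "P1", "P1", "P2"],
        ["P4"],
        ["P1", "P4", "P2"],
        ["P1", "P1", "P1"]]

counts = Counter(chain.from_iterable(test))

# A list holding one dict becomes a one-row DataFrame,
# with the dict keys as column labels.
df = pd.DataFrame([dict(counts)])

# Sort the column labels alphabetically.
df = df.sort_index(axis=1)
print(df)
```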
For better performance, you should use collections.Counter with itertools.chain.from_iterable:
>>> from collections import Counter
>>> from itertools import chain
>>> Counter(chain.from_iterable(test))
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
Or, you can use collections.Counter with a list comprehension (one fewer import than the itertools version, with nearly the same performance):
>>> from collections import Counter
>>> Counter([x for a in test for x in a])
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
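Read on for more alternative solutions and a performance comparison (otherwise, skip the rest).
Approach 1: Concatenate your sub-lists to create a single list, then find the counts using collections.Counter.
Solution 1: Concatenate the lists using itertools.chain.from_iterable and find the counts using collections.Counter: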
test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"]
]
from itertools import chain
from collections import Counter
my_counter = Counter(chain.from_iterable(test))
Solution 2: Combine the lists using a list comprehension:
from collections import Counter
my_counter = Counter([x for a in test for x in a])
Solution 3: Concatenate the lists using sum:
from collections import Counter
my_counter = Counter(sum(test, []))
Approach 2: Count the elements of each sub-list with collections.Counter, then sum the resulting Counter objects.
Solution 4: Build a Counter for each sub-list using map:
from collections import Counter
my_counter = sum(map(Counter, test), Counter())
Solution 5: Build a Counter for each sub-list using a list comprehension:
from collections import Counter
my_counter = sum([Counter(t) for t in test], Counter())
In all of the above solutions, my_counter will hold the value:
>>> my_counter
Counter({'P1': 15, 'P2': 4, 'P4': 2, 'P3': 1})
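As a quick sanity check, the sketch below runs the five variants on the question's test data and asserts that they agree. It also includes a sixth variant that is not part of the answer: updating a single Counter in place with Counter.update, which avoids building a new Counter for every addition the way sum(..., Counter()) does. The names `results`, `running` and `expected` are mine.

```python
from collections import Counter
from itertools import chain

test = [
    ["P1", "P1", "P1", "P2", "P2", "P1", "P1", "P3"],
    ["P1", "P1", "P1"],
    ["P1", "P1", "P1", "P2"],
    ["P4"],
    ["P1", "P4", "P2"],
    ["P1", "P1", "P1"],
]

# The five solutions from the answer, keyed by a short label.
results = {
    "chain": Counter(chain.from_iterable(test)),
    "comprehension": Counter([x for a in test for x in a]),
    "sum_lists": Counter(sum(test, [])),
    "map": sum(map(Counter, test), Counter()),
    "counter_comprehension": sum([Counter(t) for t in test], Counter()),
}

# Extra variant (not from the answer): update one Counter in place.
running = Counter()
for sublist in test:
    running.update(sublist)
results["update"] = running

# Every variant should produce the same counts.
expected = Counter({"P1": 15, "P2": 4, "P4": 2, "P3": 1})
for name, counter in results.items():
    assert counter == expected, name

print(running.most_common())
```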
Below is a timeit comparison on Python 3 for a list of 1000 sub-lists with 100 elements in each sub-list:
Fastest: using chain.from_iterable (17.1 ms)
mquadri$ python3 -m timeit "from collections import Counter; from itertools import chain; my_list = [list(range(100)) for i in range(1000)]" "Counter(chain.from_iterable(my_list))"
100 loops, best of 3: 17.1 msec per loop
Second: using a list comprehension to combine the lists and then counting (similar result to the above, but without the extra itertools import) (18.36 ms)
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter([x for a in my_list for x in a])"
100 loops, best of 3: 18.36 msec per loop
Third: using Counter on each sub-list inside a list comprehension (162 ms)
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum([Counter(t) for t in my_list], Counter())"
10 loops, best of 3: 162 msec per loop
Fourth: using Counter with map (very similar to the list comprehension result above) (176 ms)
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "sum(map(Counter, my_list), Counter())"
10 loops, best of 3: 176 msec per loop
The solution that uses sum to concatenate the lists is far too slow (526 ms)
mquadri$ python3 -m timeit "from collections import Counter; my_list = [list(range(100)) for i in range(1000)]" "Counter(sum(my_list, []))"
10 loops, best of 3: 526 msec per loop
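The same comparison can be reproduced from a script with the timeit module, keeping the data construction in setup so that only the counting statement is timed. This is a rough sketch on a smaller input (200 sub-lists of 50 elements, my choice, to keep it quick); absolute numbers will differ by machine and Python version. The sum(my_list, []) variant is slow because each addition copies the accumulated list into a new one, so the total work grows quadratically with the number of sub-lists.

```python
import timeit

# Setup runs once per repeat and is not included in the measured time.
setup = (
    "from collections import Counter\n"
    "from itertools import chain\n"
    "my_list = [list(range(50)) for _ in range(200)]"
)

statements = {
    "chain.from_iterable": "Counter(chain.from_iterable(my_list))",
    "list comprehension": "Counter([x for a in my_list for x in a])",
    "sum of Counters (listcomp)": "sum([Counter(t) for t in my_list], Counter())",
    "sum of Counters (map)": "sum(map(Counter, my_list), Counter())",
    "sum of lists": "Counter(sum(my_list, []))",
}

timings = {}
for label, stmt in statements.items():
    # Best of 3 repeats, 5 executions each; store seconds per execution.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=5)) / 5
    timings[label] = best

# Print the variants from fastest to slowest.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{label:<28} {seconds * 1e3:8.3f} ms")
```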