C8H*_*4O2 9 python numpy dataframe pandas
我正在努力构建一个基于生成的元组的值计数的DataFrame的基本任务np.unique(arr, return_counts=True)
,例如:
import numpy as np
import pandas as pd
np.random.seed(123)
birds=np.random.choice(['African Swallow','Dead Parrot','Exploding Penguin'], size=int(5e4))
someTuple=np.unique(birds, return_counts = True)
someTuple
#(array(['African Swallow', 'Dead Parrot', 'Exploding Penguin'],
# dtype='<U17'), array([16510, 16570, 16920], dtype=int64))
Run Code Online (Sandbox Code Playgroud)
首先我试过了
pd.DataFrame(list(someTuple))
# Returns this:
# 0 1 2
# 0 African Swallow Dead Parrot Exploding Penguin
# 1 16510 16570 16920
Run Code Online (Sandbox Code Playgroud)
我也试过pd.DataFrame.from_records(someTuple)
,它返回同样的东西.
但我正在寻找的是:
# birdType birdCount
# 0 African Swallow 16510
# 1 Dead Parrot 16570
# 2 Exploding Penguin 16920
Run Code Online (Sandbox Code Playgroud)
什么是正确的语法?
使用元组,您可以执行以下操作:
In [4]: pd.DataFrame(list(zip(*someTuple)), columns = ['Bird', 'BirdCount'])
Out[4]:
Bird BirdCount
0 African Swallow 16510
1 Dead Parrot 16570
2 Exploding Penguin 16920
Run Code Online (Sandbox Code Playgroud)
这是一个基于NumPy的解决方案,具有np.column_stack
-
pd.DataFrame(np.column_stack(someTuple),columns=['birdType','birdCount'])
Run Code Online (Sandbox Code Playgroud)
pd.DataFrame(np.vstack(someTuple).T,columns=['birdType','birdCount'])
Run Code Online (Sandbox Code Playgroud)
基准测试np.transpose
,np.column_stack
以及np.vstack
将1D
数组放样 到列中以形成2D
数组-
In [54]: tup1 = (np.random.rand(1000),np.random.rand(1000))
In [55]: %timeit np.transpose(tup1)
100000 loops, best of 3: 15.9 µs per loop
In [56]: %timeit np.column_stack(tup1)
100000 loops, best of 3: 11 µs per loop
In [57]: %timeit np.vstack(tup1).T
100000 loops, best of 3: 14.1 µs per loop
Run Code Online (Sandbox Code Playgroud)
创建字典
pd.DataFrame(dict(birdType=someTuple[0], birdCount=someTuple[1]))
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
1534 次 |
最近记录: |