Jan*_*nda 6 python python-3.x pandas
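I have a dictionary of lists (with variable lengths), and I am looking for an efficient way to create a DataFrame from it.
Assume I know the minimum list length, so I can truncate the longer lists when creating the DataFrame.
Here is my dummy code: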
data_dict = {'a': [1,2,3,4], 'b': [1,2,3], 'c': [2,45,67,93,82,92]}
min_length = 3
The dictionary may have 10k or 20k keys, so I am looking for an efficient way to create a DataFrame like the one below:
>>> df
   a  b   c
0  1  1   2
1  2  2  45
2  3  3  67
One-line solution:
#Construct the df horizontally and then transpose. Finally drop rows with nan.
pd.DataFrame.from_dict(data_dict,orient='index').T.dropna()
Out[326]:
     a    b     c
0  1.0  1.0   2.0
1  2.0  2.0  45.0
2  3.0  3.0  67.0
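
Note that the output above holds floats, because the intermediate frame is padded with NaN before the rows are dropped. If integer columns are needed, one option (a minimal sketch, assuming every remaining value is a whole number) is to cast back after `dropna()`:

```python
import pandas as pd

data_dict = {'a': [1, 2, 3, 4], 'b': [1, 2, 3], 'c': [2, 45, 67, 93, 82, 92]}

# Same one-liner as above, followed by a cast back to integers.
# The cast is safe here only because dropna() has already removed every NaN.
df_int = pd.DataFrame.from_dict(data_dict, orient='index').T.dropna().astype(int)
print(df_int)
```

The extra `astype(int)` is an addition for illustration and is not part of the original answer.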
You can slice the values of the dict in a dict comprehension; the DataFrame constructor then works directly:
print ({k:v[:min_length] for k,v in data_dict.items()})
{'b': [1, 2, 3], 'c': [2, 45, 67], 'a': [1, 2, 3]}

df = pd.DataFrame({k:v[:min_length] for k,v in data_dict.items()})
print (df)
   a  b   c
0  1  1   2
1  2  2  45
2  3  3  67
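
The question assumes `min_length` is already known. If it is not, it could be derived from the data in one pass before slicing. This is a minimal sketch under that assumption, not part of the original answer:

```python
import pandas as pd

data_dict = {'a': [1, 2, 3, 4], 'b': [1, 2, 3], 'c': [2, 45, 67, 93, 82, 92]}

# Length of the shortest list in the dictionary.
min_length = min(len(v) for v in data_dict.values())

# Truncate every list to that length, then build the frame.
df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
print(df)
```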

If some lists may be shorter than min_length, wrap each slice in a Series:
data_dict = {'a': [1,2,3,4], 'b': [1,2], 'c': [2,45,67,93,82,92]}
min_length = 3

df = pd.DataFrame({k:pd.Series(v[:min_length]) for k,v in data_dict.items()})
print (df)
   a    b   c
0  1  1.0   2
1  2  2.0  45
2  3  NaN  67
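
In the output above, column `b` becomes float because NaN cannot be stored in a plain integer column. If that matters, one possible variation (an assumption on my part, requiring a pandas version that supports the nullable `"Int64"` dtype) is to request that dtype when building each Series:

```python
import pandas as pd

data_dict = {'a': [1, 2, 3, 4], 'b': [1, 2], 'c': [2, 45, 67, 93, 82, 92]}
min_length = 3

# Nullable integer dtype: missing entries are stored as <NA>,
# so the short column stays integer-typed instead of becoming float.
df_nullable = pd.DataFrame(
    {k: pd.Series(v[:min_length], dtype="Int64") for k, v in data_dict.items()}
)
print(df_nullable.dtypes)
```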

Timings:

In [355]: %timeit (pd.DataFrame({k:v[:min_length] for k,v in data_dict.items()}))
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 520 µs per loop

In [356]: %timeit (pd.DataFrame({k:pd.Series(v[:min_length]) for k,v in data_dict.items()}))
The slowest run took 4.50 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 937 µs per loop

#Allen's solution
In [357]: %timeit (pd.DataFrame.from_dict(data_dict,orient='index').T.dropna())
1 loop, best of 3: 16.7 s per loop

Setup code for the timings:

import numpy as np
import pandas as pd

np.random.seed(123)
L = list('ABCDEFGH')
N = 500000
min_length = 10000

# 8 keys, each mapped to a random-length array of random digits
data_dict = {k:np.random.randint(10, size=np.random.randint(N)) for k in L}
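
The timings in this answer use IPython's `%timeit`. Outside IPython, a similar comparison could be sketched with the standard `timeit` module. The helper names `by_slicing` and `by_transpose`, the much smaller sizes, and the repeat count are illustrative assumptions chosen to keep the run short; absolute numbers will differ from those reported in the answer.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.RandomState(123)
keys = list('ABCDEFGH')
min_length = 100  # assumed size, far smaller than in the answer's setup

# Each list has at least min_length items, so slicing never comes up short.
data_dict = {
    k: rng.randint(10, size=rng.randint(min_length, 2000)).tolist() for k in keys
}

def by_slicing(d, n):
    """Truncate each list to n items, then build the frame."""
    return pd.DataFrame({k: v[:n] for k, v in d.items()})

def by_transpose(d):
    """Build row-wise, transpose, and drop rows containing NaN."""
    return pd.DataFrame.from_dict(d, orient='index').T.dropna()

t_slice = timeit.timeit(lambda: by_slicing(data_dict, min_length), number=3)
t_transpose = timeit.timeit(lambda: by_transpose(data_dict), number=3)
print(f"slicing:   {t_slice:.4f} s for 3 runs")
print(f"transpose: {t_transpose:.4f} s for 3 runs")
```

Note that `by_transpose` keeps rows up to the shortest actual list rather than up to `min_length`, so the two results only match on their first `min_length` rows.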