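I have a large sparse DataFrame, sdf, that consists mostly of NaN values. When I call sdf.to_dict(), it returns the dense version of the matrix, with every NaN cell included. How can I omit the NaN entries so that the dictionary contains only the entries that hold actual values?
For example, sdf is: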
2018-02-02 2018-02-03
23:58:36 NaN NaN
23:58:37 1.0 NaN
23:58:40 NaN NaN
23:58:41 NaN NaN
23:58:42 NaN NaN
23:58:43 NaN NaN
23:58:48 NaN NaN
23:58:49 NaN NaN
23:58:50 NaN NaN
23:58:52 NaN 1.0
23:58:59 NaN NaN
23:59:00 NaN NaN
23:59:01 NaN NaN
23:59:05 NaN NaN
23:59:07 NaN NaN
sdf.to_dict() gives:
{'2018-02-02': {'23:58:36': nan, '23:58:37': 1.0, '23:58:40':
nan, '23:58:41': nan, '23:58:42': nan, '23:58:43': nan,
'23:58:48': nan, '23:58:49': nan, '23:58:50': nan, '23:58:52':
nan, '23:58:59': nan, '23:59:00': nan, '23:59:01': nan,
'23:59:05': nan, '23:59:07': nan}, '2018-02-03': {'23:58:36':
nan, '23:58:37': nan, '23:58:40': nan, '23:58:41': nan,
'23:58:42': nan, '23:58:43': nan, '23:58:48': nan, '23:58:49':
nan, '23:58:50': nan, '23:58:52': 1.0, '23:58:59': nan,
'23:59:00': nan, '23:59:01': nan, '23:59:05': nan, '23:59:07':
nan}}
This happens even though sdf is a sparse DataFrame.
Sorry for the ambiguity. I want to keep all non-NaN entries. The desired output is:
{'2018-02-02': {'23:58:37': 1.0}, '2018-02-03': {'23:58:52': 1.0}}
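Use stack to reshape the frame into a Series keyed by (row, column), then fill a nested dict from its non-NaN entries using a defaultdict: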
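from collections import defaultdict
d = defaultdict(dict)
# stack() gives a Series keyed by (row, column); dropna() makes sure NaN cells are skipped
for (k1, k2), v in sdf.stack().dropna().items():
    # k1 is the row label (time), k2 is the column label (date)
    d[k2][k1] = v
d1 = dict(d)  # convert back to a plain dict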
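For reference, here is a minimal self-contained sketch of the same idea. The three-row frame is only a small stand-in for sdf, and the function name to_dict_without_nan is made up for illustration. The explicit dropna() is there because whether stack() drops NaN on its own depends on the pandas version. A dict comprehension over the columns is shown as an alternative, assuming all-NaN columns should be left out entirely.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Small stand-in for sdf: rows are times, columns are dates, mostly NaN.
sdf = pd.DataFrame(
    {
        "2018-02-02": [np.nan, 1.0, np.nan],
        "2018-02-03": [np.nan, np.nan, 1.0],
    },
    index=["23:58:36", "23:58:37", "23:58:52"],
)


def to_dict_without_nan(frame):
    """Return {column: {row: value}} containing only the non-NaN cells."""
    result = defaultdict(dict)
    # stack() yields a Series keyed by (row, column); dropna() ensures NaN
    # cells are gone whether or not this pandas version's stack removes them.
    for (row, col), value in frame.stack().dropna().items():
        result[col][row] = value
    return dict(result)


via_stack = to_dict_without_nan(sdf)

# Alternative: a dict comprehension over the columns, skipping all-NaN columns.
via_comprehension = {
    col: sdf[col].dropna().to_dict()
    for col in sdf.columns
    if sdf[col].notna().any()
}

print(via_stack)
```

Both variants should produce the same nested dict. The comprehension is shorter, while the stack version makes a single pass over the non-NaN cells, which may matter for very wide frames.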
If the input is a Series with a DatetimeIndex:
print (s)
2018-02-02 23:58:37 1.0
2018-02-03 23:58:52 1.0
dtype: float64
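from collections import defaultdict
d = defaultdict(dict)
# iterate over the Series s directly; each key k is a Timestamp
for k, v in s.dropna().items():
    # outer key: date part, inner key: time part
    d[k.strftime('%Y-%m-%d')][k.strftime('%H:%M:%S')] = v
d1 = dict(d)  # convert back to a plain dict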
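The DatetimeIndex case can be tried end to end with a sketch like the one below. The sample Series is invented for illustration and includes one NaN so the effect of dropna() is visible. The name by_date is likewise an assumption, not something from the original post.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Hypothetical input: a Series indexed by timestamps, with one NaN to drop.
s = pd.Series(
    [1.0, np.nan, 1.0],
    index=pd.to_datetime(
        ["2018-02-02 23:58:37", "2018-02-02 23:58:40", "2018-02-03 23:58:52"]
    ),
)

by_date = defaultdict(dict)
for ts, value in s.dropna().items():
    # Split each Timestamp into a date key (outer) and a time key (inner).
    by_date[ts.strftime("%Y-%m-%d")][ts.strftime("%H:%M:%S")] = value
by_date = dict(by_date)

print(by_date)
```

Formatting the keys with strftime keeps the result made of plain strings, which is convenient if the dict is later serialized, for example to JSON.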