ora*_*nge 6 python multi-index pandas
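I need to reindex the second level of a pandas DataFrame so that, for each first-level index value, the second level becomes the complete list 0, ..., (N-1).
Example: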
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})
print(df)
first second value
0 one 0 1
1 one 1 2
2 one 2 3
3 two 0 4
4 two 1 5
5 three 1 6
# Tried using Allan/Hayden's approach, but no good for this, doesn't add the missing rows
df['second'] = df.reset_index().groupby(['first']).cumcount()
print(df)
first second value
0 one 0 1
1 one 1 2
2 one 2 3
3 two 0 4
4 two 1 5
5 three 0 6
The result I want is:
first second value
0 one 0 1
1 one 1 2
2 one 2 3
3 two 0 4
4 two 1 5
4 two 2 nan <-- INSERTED
5 three 0 6
5 three 1 nan <-- INSERTED
5 three 2 nan <-- INSERTED
I think you can first set the columns first and second as a MultiIndex, and then reindex.
# your data
# ==========================
df = pd.DataFrame({
'first': ['one', 'one', 'one', 'two', 'two', 'three'],
'second': [0, 1, 2, 0, 1, 1],
'value': [1, 2, 3, 4, 5, 6]
})
df
first second value
0 one 0 1
1 one 1 2
2 one 2 3
3 two 0 4
4 two 1 5
5 three 1 6
# processing
# ============================
import numpy as np
# every (first, second) pair; np.arange(3) hard-codes N = 3 for this example
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])
df.set_index(['first', 'second']).reindex(multi_index).reset_index()
first second value
0 one 0 1
1 one 1 2
2 one 2 3
3 two 0 4
4 two 1 5
5 two 2 NaN
6 three 0 NaN
7 three 1 6
8 three 2 NaN
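Note that in the output above the existing 'three' row keeps second = 1, whereas the desired result in the question has it at second = 0. A minimal sketch that combines the cumcount step from the question with the reindex approach might look like the following. It assumes N should be the size of the largest group (the question does not say how N is chosen), and the names `n`, `full_index`, and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

# Renumber 'second' within each group so every group starts at 0
# (the cumcount step from the question).
df['second'] = df.groupby('first').cumcount()

# Assumption: N is the size of the largest group.
n = df['second'].max() + 1

# Build every (first, second) pair, keeping the original order of 'first'.
full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), np.arange(n)],
    names=['first', 'second'],
)

# Pairs that are missing from df become rows with NaN in 'value'.
result = df.set_index(['first', 'second']).reindex(full_index).reset_index()
print(result)
```

Because the reindex introduces NaN, the 'value' column is converted to float. If N is known in advance, it can be passed directly instead of being derived from the data.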