Reindex the second level of an incomplete multi-level DataFrame to make it complete, inserting NaN on missing rows

Question


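I need to reindex the second level of a pandas DataFrame so that, for each first-level index value, the second level becomes the complete list 0, ..., (N-1).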

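I tried Allan/Hayden's approach, but unfortunately it only creates an index with as many rows as already existed.
What I want is for new rows (with NaN values) to be inserted for each new index value.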

Example:

import pandas as pd

df = pd.DataFrame({
  'first': ['one', 'one', 'one', 'two', 'two', 'three'],
  'second': [0, 1, 2, 0, 1, 1],
  'value': [1, 2, 3, 4, 5, 6]
})
print(df)

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

# Tried using Allan/Hayden's approach, but it is no good for this: it only renumbers
# the existing rows within each group and does not add the missing rows
df['second'] = df.reset_index().groupby(['first']).cumcount()
print(df)
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6


The result I want is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
4    two       2      nan <-- INSERTED
5  three       0      6
5  three       1      nan <-- INSERTED
5  three       2      nan <-- INSERTED


Answer 1

Jia*_* Li 5

I think you can first set the columns first and second as a multi-level index, and then reindex.

# your data
# ==========================
import numpy as np
import pandas as pd

df = pd.DataFrame({
  'first': ['one', 'one', 'one', 'two', 'two', 'three'],
  'second': [0, 1, 2, 0, 1, 1],
  'value': [1, 2, 3, 4, 5, 6]
})

df

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

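# processing
# ============================
# build every (first, second) pair: each first-level label crossed with 0, 1, 2
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])

# reindex against the full product; pairs absent from the data get NaN in 'value'
df.set_index(['first', 'second']).reindex(multi_index).reset_index()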

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
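
The answer above hard-codes N as 3 and keeps the original second values, so the existing three row stays at second = 1, whereas the output requested in the question has it at second = 0 (as after the cumcount step). A minimal sketch that covers both cases follows. The helper name complete_second_level and its parameters n and renumber are hypothetical, introduced only for illustration, and taking N as the largest second value plus one is an assumption, not something stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})


def complete_second_level(frame, n=None, renumber=False):
    """Hypothetical helper: make 'second' run 0..n-1 for every 'first' label."""
    if renumber:
        # Optionally renumber rows within each group first (the cumcount step
        # from the question), so existing rows are packed from 0 upward.
        frame = frame.assign(second=frame.groupby('first').cumcount())
    if n is None:
        # Assumption: N is the largest observed 'second' value plus one.
        n = int(frame['second'].max()) + 1
    full_index = pd.MultiIndex.from_product(
        [frame['first'].unique(), np.arange(n)],
        names=['first', 'second'],
    )
    # Pairs missing from the data become new rows with NaN in 'value'.
    return frame.set_index(['first', 'second']).reindex(full_index).reset_index()


result = complete_second_level(df)                    # keeps original 'second' values
renumbered = complete_second_level(df, renumber=True) # matches the layout asked for
print(result)
print(renumbered)
```

Note that reindex requires the (first, second) pairs to be unique; if the data could contain duplicates, they would need to be aggregated or dropped first.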


Archived:	10 years, 5 months ago
Viewed:	3522 times
Last activity:	3 years, 6 months ago