Sim*_*n B 5 python missing-data pandas
I have the following df:
import pandas as pd
df = pd.DataFrame(data={'day': [1, 1, 1, 2, 2, 3], 'pos': 2*[1, 14, 18], 'value': 2*[1, 2, 3]})
df
   day  pos  value
0    1    1      1
1    1   14      2
2    1   18      3
3    2    1      1
4    2   14      2
5    3   18      3
I want to fill in rows so that every day has all possible values of the column 'pos'.
Desired result:
   day  pos  value
0    1    1    1.0
1    1   14    2.0
2    1   18    3.0
3    2    1    1.0
4    2   14    2.0
5    2   18    NaN
6    3    1    NaN
7    3   14    NaN
8    3   18    3.0
My attempt (a reindex) yields:
ValueError: cannot reindex from a duplicate axis
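The attempted call is not shown here, so the following is only a sketch of the kind of call that produces this error, assuming the attempt reindexed on 'pos' alone. Reindexing needs unique labels on the axis being reindexed, and 'pos' repeats across days, whereas the (day, pos) pairs are unique. Newer pandas versions word the message differently ("cannot reindex on an axis with duplicate labels"), but it is still a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 1, 1, 2, 2, 3],
                   'pos': 2 * [1, 14, 18],
                   'value': 2 * [1, 2, 3]})

# Indexing by 'pos' alone gives duplicate labels (1, 14, 18 appear twice).
by_pos = df.set_index('pos')
print(by_pos.index.is_unique)      # False

# Assumed reproduction: reindexing an axis with duplicate labels fails.
try:
    by_pos.reindex([1, 14, 18])
    raised = False
except ValueError as err:
    raised = True
    print(type(err).__name__, err)

# The (day, pos) pairs are unique, so an index on both columns can be reindexed.
by_day_pos = df.set_index(['day', 'pos'])
print(by_day_pos.index.is_unique)  # True
```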
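Let's try pivot, then stack:
# keyword arguments are used because pandas 2.0+ no longer accepts positional arguments to pivot
df.pivot(index='day', columns='pos', values='value').stack(dropna=False).reset_index(name='value')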
Output:
   day  pos  value
0    1    1    1.0
1    1   14    2.0
2    1   18    3.0
3    2    1    1.0
4    2   14    2.0
5    2   18    NaN
6    3    1    NaN
7    3   14    NaN
8    3   18    3.0
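To see why the pivot step fills the gaps, it may help to look at the intermediate wide table: one row per day, one column per pos, with NaN wherever a (day, pos) pair is absent. Stacking with NaN kept then turns every cell, including the empty ones, back into a row. The names wide and missing below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 1, 1, 2, 2, 3],
                   'pos': 2 * [1, 14, 18],
                   'value': 2 * [1, 2, 3]})

# Wide layout: index = day, columns = pos, cells = value.
# pivot requires each (day, pos) pair to occur at most once.
wide = df.pivot(index='day', columns='pos', values='value')
print(wide)

# Each NaN cell is a (day, pos) combination missing from df.
missing = wide.isna().sum().sum()
print(missing)  # 3
```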
Option 2: merge with a MultiIndex:
df.merge(pd.DataFrame(index=pd.MultiIndex.from_product([df['day'].unique(), df['pos'].unique()])),
left_on=['day','pos'], right_index=True, how='outer')
Output:
   day  pos  value
0    1    1    1.0
1    1   14    2.0
2    1   18    3.0
3    2    1    1.0
4    2   14    2.0
5    3   18    3.0
5    2   18    NaN
5    3    1    NaN
5    3   14    NaN
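A related variant, not part of the answer above, is to reindex on the same MultiIndex.from_product grid after putting both 'day' and 'pos' in the index. This is a minimal sketch; the names full_index and result are illustrative. Because the (day, pos) pairs are unique, the reindex does not hit the duplicate-axis error, and the rows come back in grid order with a clean 0..8 index.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 1, 1, 2, 2, 3],
                   'pos': 2 * [1, 14, 18],
                   'value': 2 * [1, 2, 3]})

# Every (day, pos) combination, in order of first appearance.
full_index = pd.MultiIndex.from_product(
    [df['day'].unique(), df['pos'].unique()], names=['day', 'pos'])

# Index by the unique (day, pos) pairs, then reindex onto the full grid;
# combinations absent from df get NaN in 'value'.
result = (df.set_index(['day', 'pos'])
            .reindex(full_index)
            .reset_index())
print(result)
```

The 'value' column becomes float because NaN cannot be stored in an integer column by default.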