Bar*_*cks 2 python dataframe pandas
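I have a pandas DataFrame in which one column contains lists of different lengths. The solutions I found for exploding lists in pandas all assume that the lists to be exploded have the same length.
This is my df: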
     Dep    Exp    Fl-No                        Shared Codes
0  20:58  20:55   LX 736                   [No shared codes]
1  21:23  20:55   LX 818                    [Dummy, LH 5809]
2  21:27  21:00   JU 375                   [No shared codes]
4  21:28  21:00   LX 770                    [Dummy, SN 5102]
7  21:31  21:10  LX 1842  [Dummy, LH 5880, TP 8184, A3 1985]
This is what I am looking for:
     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
1  21:23  20:55   LX 818          LH 5809
2  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
4  21:28  21:00   LX 770          SN 5102
7  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
7  21:31  21:10  LX 1842          TP 8184
7  21:31  21:10  LX 1842          A3 1985
Does anyone have any suggestions?
Very similar to @coldspeed's answer, but I take a couple of different steps.
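import numpy as np
s = df['Shared Codes']
# row positions, each repeated once per element of its list
i = np.arange(len(df)).repeat(s.str.len())
# take the repeated rows (all columns but the last) and attach the flattened lists
df.iloc[i, :-1].assign(**{'Shared Codes': np.concatenate(s.values)})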
     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
1  21:23  20:55   LX 818          LH 5809
2  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
4  21:28  21:00   LX 770          SN 5102
7  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
7  21:31  21:10  LX 1842          TP 8184
7  21:31  21:10  LX 1842          A3 1985
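For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same repeat-and-concatenate idea. The sample frame is rebuilt from the question, and `explode_last_column` is a hypothetical helper name used only for illustration. It assumes the list column is the last column and that no list is empty.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (original index labels kept).
df = pd.DataFrame(
    {
        "Dep": ["20:58", "21:23", "21:27", "21:28", "21:31"],
        "Exp": ["20:55", "20:55", "21:00", "21:00", "21:10"],
        "Fl-No": ["LX 736", "LX 818", "JU 375", "LX 770", "LX 1842"],
        "Shared Codes": [
            ["No shared codes"],
            ["Dummy", "LH 5809"],
            ["No shared codes"],
            ["Dummy", "SN 5102"],
            ["Dummy", "LH 5880", "TP 8184", "A3 1985"],
        ],
    },
    index=[0, 1, 2, 4, 7],
)


def explode_last_column(frame):
    """Hypothetical helper: one output row per element of the last column's lists."""
    col = frame.columns[-1]
    lists = frame[col]
    # Row positions, each repeated as many times as its list is long.
    positions = np.arange(len(frame)).repeat(lists.str.len())
    # Flatten all lists into one 1-D array, in row order.
    flat = np.concatenate(lists.values)
    # Select the repeated rows without the list column, then add the flat values.
    return frame.iloc[positions, :-1].assign(**{col: flat})


result = explode_last_column(df)
print(result)
```

Because the rows are selected by position with `iloc`, the original index labels (0, 1, 1, 2, 4, 4, 7, ...) are carried over to the result.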
pandas >= 0.25
df:
   Name                Data
0   Bar  [Product, Item, X]
1   Foo     [Product, Misc]
Use explode:
df = df.explode('Data')
df:
   Name     Data
0   Bar  Product
0   Bar     Item
0   Bar        X
1   Foo  Product
1   Foo     Misc
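Applied to data shaped like the question's, the same call might look like the sketch below. The frame is a shortened subset of the question's rows, rebuilt by hand for illustration. `explode` keeps the original index labels, so the optional `reset_index(drop=True)` step is only needed if you want the rows renumbered.

```python
import pandas as pd

# Shortened subset of the question's data, rebuilt by hand for illustration.
df = pd.DataFrame(
    {
        "Fl-No": ["LX 736", "LX 818", "LX 1842"],
        "Shared Codes": [
            ["No shared codes"],
            ["Dummy", "LH 5809"],
            ["Dummy", "LH 5880", "TP 8184", "A3 1985"],
        ],
    },
    index=[0, 1, 7],
)

# One row per list element; the other columns and the index label are repeated.
exploded = df.explode("Shared Codes")

# Optional: renumber the rows 0..n-1 instead of keeping repeated labels.
renumbered = exploded.reset_index(drop=True)
print(renumbered)
```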
One possibility is to use np.repeat and np.hstack:
print(df)
     Dep    Exp    Fl-No                        Shared Codes
0  20:58  20:55   LX 736                   [No shared codes]
1  21:23  20:55   LX 818                    [Dummy, LH 5809]
2  21:27  21:00   JU 375                   [No shared codes]
4  21:28  21:00   LX 770                    [Dummy, SN 5102]
7  21:31  21:10  LX 1842  [Dummy, LH 5880, TP 8184, A3 1985]
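# repeat every column except the last, once per element of the row's list
x = df.iloc[:, :-1].values.repeat(df['Shared Codes'].apply(len), 0)
# flatten the lists into a single column vector
y = df['Shared Codes'].apply(pd.Series).stack().values.reshape(-1, 1)
# glue both parts together under the original column names (new 0..n-1 index)
out = pd.DataFrame(np.hstack((x, y)), columns=df.columns)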
print(out)
     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985
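One thing to keep in mind with this approach: everything passes through a single NumPy array, so the column dtypes are not preserved. That does not matter for the question's data, where every column holds strings, but it would for numeric columns. The sketch below illustrates this under stated assumptions: the `Seats` column is hypothetical and not part of the question, and the lists are flattened with a plain list comprehension instead of `apply(pd.Series).stack()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "Seats" is an illustrative numeric column, not from the question.
df = pd.DataFrame(
    {
        "Fl-No": ["LX 818", "LX 1842"],
        "Seats": [120, 180],
        "Shared Codes": [["Dummy", "LH 5809"], ["Dummy", "LH 5880", "TP 8184"]],
    }
)

lengths = [len(codes) for codes in df["Shared Codes"]]

# Repeat every column except the last, once per list element.
x = df.iloc[:, :-1].to_numpy(dtype=object).repeat(lengths, axis=0)
# Flatten the lists into one object-dtype column vector.
flat = [code for codes in df["Shared Codes"] for code in codes]
y = np.array(flat, dtype=object).reshape(-1, 1)

out = pd.DataFrame(np.hstack((x, y)), columns=df.columns)

# The object array leaves "Seats" with object dtype;
# infer_objects() converts such columns back to a numeric dtype where possible.
fixed = out.infer_objects()
print(out.dtypes)
print(fixed.dtypes)
```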