pan*_*kaj 2 python regex dataframe python-3.x pandas
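I have a dataframe as shown below. I want to split the values in the Zip column into separate rows, as shown below. The values can be separated by any of these delimiters: underscore (_), space, comma (,) or dot (.).
How can this be done in Python?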
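The delimiter set described above fits in a single regex character class. Here is a minimal sketch using only the standard `re` module. The names `SINGLE` and `RUN` are hypothetical, and the `+` variant is an assumption for data where two delimiters can appear back to back (for example a comma followed by a space):

```python
import re

# One-character class covering every listed delimiter: underscore, space, comma, dot.
# Inside [...] a dot is literal, so it needs no backslash.
SINGLE = r'[_ ,.]'
# Hypothetical variant: '+' treats a run of delimiters (e.g. ', ') as one separator.
RUN = r'[_ ,.]+'

print(re.split(SINGLE, 'AB_CD_EF_GF'))  # ['AB', 'CD', 'EF', 'GF']
print(re.split(SINGLE, '09, 07'))       # ['09', '', '07']  (empty token between ',' and ' ')
print(re.split(RUN, '09, 07'))          # ['09', '07']
```

With the single-character class, consecutive delimiters leave empty strings behind. The `+` quantifier avoids that, at the cost of no longer distinguishing "a,,b" from "a,b".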
Input
df.head(5)
Date Item_Code Type Zip
1/1/2020 A Long 07_08_09
12/4/2020 B Small AB_CD_EF_GF
13/4/2020 A Long 08_14
1/5/2020 A Long
21/5/2020 B Small 09,07,16
22/5/2020 B Small AB,07
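To make the sample reproducible, the input frame can be built directly. This is a sketch under two assumptions: Zip holds strings, and the empty Zip on 1/5/2020 is NaN (which matches the NaN that appears there in the answer's output):

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample input; the missing Zip is assumed to be NaN.
df = pd.DataFrame({
    'Date': ['1/1/2020', '12/4/2020', '13/4/2020', '1/5/2020', '21/5/2020', '22/5/2020'],
    'Item_Code': ['A', 'B', 'A', 'A', 'B', 'B'],
    'Type': ['Long', 'Small', 'Long', 'Long', 'Small', 'Small'],
    'Zip': ['07_08_09', 'AB_CD_EF_GF', '08_14', np.nan, '09,07,16', 'AB,07'],
})
print(df)
```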
Expected output
Date Item_Code Type Zip
1/1/2020 A Long 07
1/1/2020 A Long 08
1/1/2020 A Long 09
12/4/2020 B Small AB
12/4/2020 B Small CD
12/4/2020 B Small EF
12/4/2020 B Small GF
13/4/2020 A Long 08
13/4/2020 A Long 14
1/5/2020 A Long
21/5/2020 B Small 09
21/5/2020 B Small 07
21/5/2020 B Small 16
22/5/2020 B Small AB
22/5/2020 B Small 07
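First use Series.str.split with a regex and assign the result back with DataFrame.assign. Then use DataFrame.explode to get one row per value. The last step, DataFrame.reset_index(drop=True), replaces the repeated index labels with a default unique index: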
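# split on any single _, space, comma or dot; the raw string avoids an
# invalid-escape warning, and inside [...] the dot needs no backslash
df1 = (df.assign(Zip=df['Zip'].str.split(r'[_ ,.]'))
         .explode('Zip')             # one row per list element; NaN stays NaN
         .reset_index(drop=True))    # 0..n-1 instead of repeated index labels
print(df1)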
Date Item_Code Type Zip
0 1/1/2020 A Long 07
1 1/1/2020 A Long 08
2 1/1/2020 A Long 09
3 12/4/2020 B Small AB
4 12/4/2020 B Small CD
5 12/4/2020 B Small EF
6 12/4/2020 B Small GF
7 13/4/2020 A Long 08
8 13/4/2020 A Long 14
9 1/5/2020 A Long NaN
10 21/5/2020 B Small 09
11 21/5/2020 B Small 07
12 21/5/2020 B Small 16
13 22/5/2020 B Small AB
14 22/5/2020 B Small 07
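To see what each step of the chain produces, here is a minimal sketch that rebuilds the sample data (assuming the missing Zip is NaN) and prints the intermediate results. It relies on pandas treating a str.split pattern longer than one character as a regular expression by default, and on DataFrame.explode, which is available from pandas 0.25:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing Zip is assumed to be NaN.
df = pd.DataFrame({
    'Date': ['1/1/2020', '12/4/2020', '13/4/2020', '1/5/2020', '21/5/2020', '22/5/2020'],
    'Item_Code': ['A', 'B', 'A', 'A', 'B', 'B'],
    'Type': ['Long', 'Small', 'Long', 'Long', 'Small', 'Small'],
    'Zip': ['07_08_09', 'AB_CD_EF_GF', '08_14', np.nan, '09,07,16', 'AB,07'],
})

# Step 1: each Zip string becomes a list of tokens; NaN stays NaN.
parts = df['Zip'].str.split(r'[_ ,.]')
print(parts)

# Step 2: explode gives one row per list element and repeats the original
# index label for each element (a NaN scalar is kept as a single row).
exploded = df.assign(Zip=parts).explode('Zip')
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5]

# Step 3: reset_index(drop=True) discards those labels and numbers rows 0..n-1.
df1 = exploded.reset_index(drop=True)
print(df1)
```

Without the final reset_index, the result would still have duplicate index labels. That is harmless for printing, but it makes later label-based lookups such as df1.loc[0] return several rows.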