Splitting a column in a pandas DataFrame

GTr*_*reg 1 python pandas

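I want to split the ji column in my df into two columns using the comma as the delimiter. It would also be nice to strip the parentheses around the ji values. I have tried various approaches and keep getting errors. I would like to avoid a lambda expression for now. Any other ideas?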

Example

      ji           length
0     (75.0, 5.0)  3283.458479
1     (96.0, 5.0)  1431.312901
2     (97.0, 5.0)  1364.592959
3    (247.0, 5.0)  3736.322308
4     (81.0, 7.0)  2655.910005
5     (93.0, 7.0)  1752.293687
6    (242.0, 7.0)   427.844417
7    (248.0, 7.0)  3725.823013
8    (254.0, 7.0)  2318.937332
9    (255.0, 7.0)  2292.673905
10   (242.0, 8.0)   145.811907
11   (254.0, 8.0)  2222.447786
12   (255.0, 8.0)  2196.184360
13   (248.0, 9.0)   441.222866
14   (253.0, 9.0)   853.095032
15   (256.0, 9.0)  2076.942682
16   (91.0, 10.0)  1743.310744
17   (93.0, 10.0)  1256.337420
18  (105.0, 10.0)   523.447658
19  (174.0, 10.0)  1530.617012
20  (176.0, 10.0)  1697.614009
21  (248.0, 10.0)   440.000463
22  (253.0, 10.0)   904.706003
23  (256.0, 10.0)  1991.662604
24  (258.0, 10.0)  1850.995862
25  (172.0, 11.0)  1301.179960
26  (174.0, 11.0)  1436.984094
27  (176.0, 11.0)  1695.954099
28  (179.0, 11.0)  1548.015013
29  (228.0, 11.0)  4640.928585
30  (242.0, 11.0)   169.617203
31  (251.0, 11.0)   784.921333
32  (253.0, 11.0)   983.118859
33  (255.0, 11.0)  1181.474433
34  (256.0, 11.0)  1303.398235
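
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the first few rows of the example by hand. It assumes the ji values are stored as text, which the question does not state; swap in real tuples if that matches your data.

```python
import pandas as pd

# First four rows of the example above.
# Assumption: 'ji' holds strings such as "(75.0, 5.0)", not tuples.
df = pd.DataFrame({
    "ji": ["(75.0, 5.0)", "(96.0, 5.0)", "(97.0, 5.0)", "(247.0, 5.0)"],
    "length": [3283.458479, 1431.312901, 1364.592959, 3736.322308],
})

print(df)
# Check what is actually stored in the column before choosing an approach
print(type(df.loc[0, "ji"]))
```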


jez*_*ael 5

Solution if the ji column contains strings: use pop to extract the column, then strip and split with expand=True to get a DataFrame.

print (type(df.loc[0, 'ji']))
<class 'str'>

df[['a','b']] = df.pop('ji').str.strip('()').str.split(', ', expand=True).astype(float)
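
The one-liner above can be unpacked step by step. This is only a sketch on a three-row sample I made up from the question's data; the intermediate names ji and parts are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "ji": ["(75.0, 5.0)", "(96.0, 5.0)", "(242.0, 11.0)"],
    "length": [3283.458479, 1431.312901, 169.617203],
})

# pop() removes 'ji' from df and returns it as a Series
ji = df.pop("ji")

# Strip the surrounding parentheses, then split on ", " into two columns
parts = ji.str.strip("()").str.split(", ", expand=True)

# The pieces are still text at this point, so cast them to float
df[["a", "b"]] = parts.astype(float)
print(df)
```

The astype(float) step matters: without it the new columns hold strings, and arithmetic on them will not behave as expected.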

Or use a list comprehension if there are no missing values and performance matters:

L = [x.strip('()').split(', ') for x in df.pop('ji')]
df[['a','b']] = pd.DataFrame(L, index=df.index).astype(float)

print (df)
         length      a     b
0   3283.458479   75.0   5.0
1   1431.312901   96.0   5.0
2   1364.592959   97.0   5.0
3   3736.322308  247.0   5.0
4   2655.910005   81.0   7.0
5   1752.293687   93.0   7.0
6    427.844417  242.0   7.0
7   3725.823013  248.0   7.0
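
To illustrate why the list comprehension needs data without missing values, here is a small sketch on a made-up Series with one NaN. Dropping the missing rows first is just one possible workaround, not something from the question.

```python
import numpy as np
import pandas as pd

ji = pd.Series(["(75.0, 5.0)", np.nan, "(97.0, 5.0)"])

# The .str accessor skips the missing entry and leaves NaN in both columns
via_str = ji.str.strip("()").str.split(", ", expand=True).astype(float)

# The list comprehension calls .strip() on a float NaN and fails
try:
    rows = [x.strip("()").split(", ") for x in ji]
except AttributeError as exc:
    rows = None
    print("list comprehension failed:", exc)

# Dropping missing rows first keeps the list comprehension usable
clean = ji.dropna()
rows = [x.strip("()").split(", ") for x in clean]
via_list = pd.DataFrame(rows, index=clean.index, columns=["a", "b"]).astype(float)
print(via_list)
```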

If the values are tuples, create a nested list of tuples and pass it to the DataFrame constructor:

print (type(df.loc[0, 'ji']))
<class 'tuple'>

df[['a','b']] = pd.DataFrame(df.pop('ji').values.tolist(), index=df.index)
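
Here is a self-contained sketch of the tuple case. The sample rows and the non-default index are my own choices, picked to show why index=df.index is passed along.

```python
import pandas as pd

# Non-default index on purpose, to show the alignment step below
df = pd.DataFrame(
    {
        "ji": [(75.0, 5.0), (96.0, 5.0), (97.0, 5.0)],
        "length": [3283.458479, 1431.312901, 1364.592959],
    },
    index=[10, 11, 12],
)

# tolist() turns the object column into a plain list of 2-tuples
pairs = df.pop("ji").tolist()

# Reusing df.index keeps the new columns aligned with the existing rows;
# a fresh 0..n-1 index would not match labels 10, 11, 12
df[["a", "b"]] = pd.DataFrame(pairs, index=df.index)
print(df)
```

Since the tuples already hold floats, no astype(float) is needed here.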