Using regex to create new columns in a dataset based on a column's values

Mar*_*ary 5 python regex dataframe pandas

Here is my dataframe:

index     duration 
1           7 year   
2           2day
3           4 week
4           8 month

I need to separate the number from the time unit and put them in two new columns. The output should look like this:

index     duration         number     time
1           7 year          7         year
2           2day            2         day
3           4 week          4        week
4           8 month         8         month

Here is my code:

df ['numer'] = df.duration.replace(r'\d.*' , r'\d', regex=True, inplace = True)
df [ 'time']= df.duration.replace (r'\.w.+',r'\w.+', regex=True, inplace = True )

But it doesn't work. Any suggestions?

I also need to create another column based on the values in the time column, so that the new dataset looks like this:

 index     duration         number     time      time_days
    1           7 year          7         year       365
    2           2day            2         day         1
    3           4 week          4        week         7
    4           8 month         8         month       30

df['time_day']= df.time.replace(r'(year|month|week|day)', r'(365|30|7|1)', regex=True, inplace=True)

Any suggestions?

Max*_*axU 5

We can use Series.str.extract here:

In [67]: df[['number','time']] = df.duration.str.extract(r'(\d+)\s*(.*)', expand=True)

In [68]: df
Out[68]:
   index duration number    time
0      1   7 year      7    year
1      2     2day      2     day
2      3   4 week      4    week
3      4  8 month      8   month

RegEx explanation: regex101.com is, IMO, one of the best online RegEx parsers, testers and explainers.
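To see what the two capture groups pick up without a dataframe, here is a minimal sketch using Python's built-in re module on the four sample values from the question. The names pattern, samples and parsed are only illustrative.

```python
import re

# Same pattern as above: one or more digits, optional whitespace, then the rest.
pattern = re.compile(r'(\d+)\s*(.*)')

# Sample values copied from the question's duration column.
samples = ['7 year', '2day', '4 week', '8 month']

# Each match yields a (number, unit) tuple of strings.
parsed = [pattern.match(s).groups() for s in samples]
print(parsed)
# [('7', 'year'), ('2', 'day'), ('4', 'week'), ('8', 'month')]
```

Note that (.*) also captures any trailing whitespace, so if the real data has padded values it may be worth stripping them first.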

You may also want to convert the number column to an integer dtype:

In [69]: df['number'] = df['number'].astype(int)

In [70]: df.dtypes
Out[70]:
index        int64
duration    object
number       int32
time        object
dtype: object
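One caveat: astype(int) raises if any row failed to match the pattern, because str.extract returns NaN for those rows. A possible workaround, sketched here on made-up data (the 'unknown' row is an assumption, not part of the question), is pd.to_numeric followed by the nullable Int64 dtype:

```python
import pandas as pd

# Hypothetical data: the last row has no digits, so the pattern does not match it.
df = pd.DataFrame({'duration': ['7 year', '2day', 'unknown']})

extracted = df['duration'].str.extract(r'(\d+)\s*(.*)', expand=True)
extracted.columns = ['number', 'time']

# errors='coerce' turns unparseable values into NaN;
# the nullable 'Int64' dtype keeps integers alongside missing values.
extracted['number'] = pd.to_numeric(extracted['number'], errors='coerce').astype('Int64')
print(extracted)
```

Whether unmatched rows should be kept as missing or dropped depends on the data, so treat this as one option rather than the recommended way.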

UPDATE:

In [167]: df['time_day'] = df['time'].replace(['year','month','week','day'], [365, 30, 7, 1], regex=True)

In [168]: df
Out[168]:
   index duration number    time  time_day
0      1   7 year      7    year       365
1      2     2day      2     day         1
2      3   4 week      4    week         7
3      4  8 month      8   month        30
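If the time column contains only the bare unit names, a plain dictionary lookup with Series.map is an alternative to the regex-based replace. This is a sketch under that assumption; DAYS_PER_UNIT is a hypothetical name, and 365 and 30 are the same approximations used above.

```python
import pandas as pd

# Rebuild the relevant columns from the example above.
df = pd.DataFrame({
    'number': [7, 2, 4, 8],
    'time': ['year', 'day', 'week', 'month'],
})

# Hypothetical lookup table mapping each unit to an approximate number of days.
DAYS_PER_UNIT = {'year': 365, 'month': 30, 'week': 7, 'day': 1}

# Strip stray whitespace, then look each unit up; unknown units become NaN.
df['time_day'] = df['time'].str.strip().map(DAYS_PER_UNIT)
print(df)
```

Unlike replace with regex=True, map requires an exact match on the whole value, so plural forms such as 'years' would need their own entries.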