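I'm trying to extract strings from a column of a pandas DataFrame, where the substrings to extract come from a list that each value has to be matched against. I tried df.str.extract(list1), but I got an "unhashable type" error. I think the way I'm comparing the list with the DataFrame is wrong.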
From:
Col 1  Col 2
1      The date
2      Three has come
3      Mail Sent
4      Done Deal
To:
Col 1  Col 2           Col 3
1      The date        NaN
2      Three has come  Three has
3      Mail Sent       Mail
4      Done Deal       Done
My list is as follows:
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']
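For reference, here is a minimal reproduction of that error, assuming the sample frame above. Series.str.extract expects a single regular-expression string as its pattern. As far as I can tell, pandas hands the argument straight to Python's re.compile, and the re module's pattern cache uses the pattern as part of a dictionary key, so a list fails the lookup with "unhashable type".

```python
import pandas as pd

df = pd.DataFrame({'Col 1': [1, 2, 3, 4],
                   'Col 2': ['The date', 'Three has come', 'Mail Sent', 'Done Deal']})
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']

error = None
try:
    # Passing the list itself instead of one regex pattern string
    df['Col 2'].str.extract(List1)
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

The fix is to turn the list into one pattern string before calling extract.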
You can use str.extract with a single pattern built by joining all the values of List1 with |, which means "or" in a regex:
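import pandas as pd
df = pd.DataFrame({'Col 1': [1, 2, 3, 4], 'Col 2': ['The date', 'Three has come', 'Mail Sent', 'Done Deal']})
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']
# Build one capture group "(Three has|Mail|...)"; expand=False returns a Series
df['Col 3'] = df['Col 2'].str.extract("(" + "|".join(List1) + ")", expand=False)
print(df)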
   Col 1           Col 2      Col 3
0      1        The date        NaN
1      2  Three has come  Three has
2      3       Mail Sent       Mail
3      4       Done Deal       Done
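One caveat with joining the raw strings: the pattern only behaves as intended if no list entry contains regex metacharacters such as ., (, + or ?. A slightly more defensive sketch escapes each entry with re.escape. As a design choice of this sketch (not part of the answer above), it also sorts the entries longest-first, so that when two alternatives could match at the same position the longer one is tried first. The 'Mr. Smith (CEO)' row and list entry are hypothetical, added only to show why escaping matters: unescaped, "(CEO)" would become a second capture group.

```python
import re
import pandas as pd

df = pd.DataFrame({'Col 1': [1, 2, 3, 4, 5],
                   'Col 2': ['The date', 'Three has come', 'Mail Sent', 'Done Deal',
                             'Mr. Smith (CEO)']})  # last row is hypothetical
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come', 'Mr. Smith (CEO)']

# Longest entries first, each escaped so it is matched literally
alternatives = [re.escape(s) for s in sorted(List1, key=len, reverse=True)]
pattern = "(" + "|".join(alternatives) + ")"

df['Col 3'] = df['Col 2'].str.extract(pattern, expand=False)
print(df)
```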
Another solution, testing each list entry as a plain substring:
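List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']
# Concatenate every List1 entry that occurs as a substring of the cell
df['Col 3'] = df['Col 2'].apply(lambda x: ''.join([L for L in List1 if L in x]))
# Rows without any match end up as '', so replace those with NaN
df['Col 3'] = df['Col 3'].mask(df['Col 3'] == '')
print(df)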
   Col 1           Col 2      Col 3
0      1        The date        NaN
1      2  Three has come  Three has
2      3       Mail Sent       Mail
3      4       Done Deal       Done
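Note that ''.join(...) in this version glues together every list entry found in a cell with no separator, so a cell containing both Mail and Done would give MailDone. If only one hit per row is wanted, a sketch using next() with a default returns the first matching entry in List1 order, or NaN when nothing matches, which also makes the separate mask step unnecessary. The 'Mail Done' row is hypothetical, added only to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col 1': [1, 2, 3, 4, 5],
                   'Col 2': ['The date', 'Three has come', 'Mail Sent', 'Done Deal',
                             'Mail Done']})  # last row is hypothetical
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']

# Join-based approach: concatenates every substring hit, '' when there is none
joined = df['Col 2'].apply(lambda x: ''.join([L for L in List1 if L in x]))

# First hit in List1 order, or NaN when no entry occurs in the cell
df['Col 3'] = df['Col 2'].apply(lambda x: next((L for L in List1 if L in x), np.nan))

print(joined.tolist())
print(df)
```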