Extract strings from a DataFrame and compare them against a list

yas*_*med 1 python pandas

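I am trying to extract strings from a column of a pandas DataFrame, where the strings to match against are held in a list. I tried df.str.extract(list1), but I got an "unhashable type" error. I think the way I am comparing the list with the DataFrame is wrong.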

Col 1   Col 2
1       The date
2       Three has come
3       Mail Sent
4       Done Deal

Expected output:

Col 1   Col 2           Col 3
1       The date        NaN
2       Three has come  Three has
3       Mail Sent       Mail
4       Done Deal       Done

My list is as follows:

List1 = ['Three has' , 'Mail' , 'Done' , 'Game' , 'Time has come']

jez*_*ael 5

You can use extract with all the values of the list joined by |, which means "or" in regex:

List1 = ['Three has' , 'Mail' , 'Done' , 'Game' , 'Time has come']
df['Col 3'] = df['Col 2'].str.extract("(" + "|".join(List1) +")", expand=False)
print (df)
   Col 1           Col 2      Col 3
0      1        The date        NaN
1      2  Three has come  Three has
2      3       Mail Sent       Mail
3      4       Done Deal       Done
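
The pattern above is built from the raw list entries, so an entry containing regex metacharacters (for example "." or "(") would be interpreted as regex syntax. A minimal sketch of a more defensive variant, assuming the entries should be matched literally: each entry is passed through re.escape, and the entries are sorted longest first so that a shorter entry does not win over a longer one starting at the same position. The sample frame is rebuilt from the question's data; the names terms and pattern are only illustrative.

```python
import re

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Col 1": [1, 2, 3, 4],
    "Col 2": ["The date", "Three has come", "Mail Sent", "Done Deal"],
})
List1 = ["Three has", "Mail", "Done", "Game", "Time has come"]

# Assumption: list entries are literal text, not regex fragments.
# Escape each one, and put longer entries first in the alternation.
terms = sorted(List1, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(t) for t in terms) + ")"

# One capture group + expand=False returns a Series; rows without a match get NaN.
df["Col 3"] = df["Col 2"].str.extract(pattern, expand=False)
print(df)
```

For the list in the question none of the entries contain special characters, so this should give the same Col 3 as the shorter version.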

Another solution:

List1 = ['Three has' , 'Mail' , 'Done' , 'Game' , 'Time has come']

df['Col 3'] = df['Col 2'].apply(lambda x: ''.join([L for L in List1 if L in x]))
df['Col 3'] = df['Col 3'].mask(df['Col 3'] == '')
print (df)
   Col 1           Col 2      Col 3
0      1        The date        NaN
1      2  Three has come  Three has
2      3       Mail Sent       Mail
3      4       Done Deal       Done
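
One thing to be aware of with the apply version: ''.join concatenates every list entry found in the row with no separator, so a row containing both "Mail" and "Done" would produce "MailDone". If only the first matching entry (in list order) is wanted, something like the following sketch may be closer to the intent. The helper first_match is a hypothetical name introduced here, not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Col 1": [1, 2, 3, 4],
    "Col 2": ["The date", "Three has come", "Mail Sent", "Done Deal"],
})
List1 = ["Three has", "Mail", "Done", "Game", "Time has come"]


def first_match(text, candidates):
    """Return the first candidate that occurs in text, or None if none does."""
    return next((c for c in candidates if c in text), None)


# Plain substring test per row; no regex is involved, so no escaping is needed.
df["Col 3"] = df["Col 2"].apply(lambda s: first_match(s, List1))
print(df)
```

Here the missing value is produced directly by the helper, so the separate mask step on empty strings is not needed.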