Inner merge two DataFrames based on a partial string match

Dar*_*lus 4 python dataframe pandas

We have the following two DataFrames:

import numpy as np
import pandas as pd

# Note: np.array coerces this mixed list to strings, so 'sentiment' and 'cutoff' hold '1'/'0' strings.
temp = pd.DataFrame(np.array([['I am feeling very well', 1], ['It is hard to believe this happened', 0],
                              ['What is love?', 1], ['No new friends', 0],
                              ['I love this show', 1], ['Amazing day today', 1]]),
                    columns=['message', 'sentiment'])

temp_truncated = pd.DataFrame(np.array([['I am feeling very', 1], ['It is hard to believe', 1],
                                        ['What is', 1], ['Amazing day', 1]]),
                              columns=['message', 'cutoff'])

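My idea is to create a third DataFrame that represents an inner join between temp and temp_truncated, keeping each row of temp whose message starts with (or contains) one of the strings in temp_truncated.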

Desired output:

     message                              sentiment   cutoff
0    I am feeling very well               1           1
1    It is hard to believe this happened  0           1
2    What is love?                        1           1
3    Amazing day today                    1           1
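The join described above can also be written literally as a cross join followed by a "starts with" filter. This is a minimal sketch, assuming pandas 1.2 or later (for how='cross'); it rebuilds the same sample data from plain lists, so the numeric columns are integers here. It materialises all len(temp) × len(temp_truncated) pairs, so it is only practical for small frames, and a message that starts with several fragments would appear once per fragment.

```python
import pandas as pd

temp = pd.DataFrame({
    'message': ['I am feeling very well', 'It is hard to believe this happened',
                'What is love?', 'No new friends', 'I love this show', 'Amazing day today'],
    'sentiment': [1, 0, 1, 0, 1, 1],
})
temp_truncated = pd.DataFrame({
    'message': ['I am feeling very', 'It is hard to believe', 'What is', 'Amazing day'],
    'cutoff': [1, 1, 1, 1],
})

# Pair every full message with every truncated fragment (cartesian product).
pairs = temp.merge(temp_truncated.rename(columns={'message': 'prefix'}), how='cross')

# Keep only the pairs where the full message starts with the fragment.
mask = [m.startswith(p) for m, p in zip(pairs['message'], pairs['prefix'])]
out = pairs.loc[mask].drop(columns='prefix').reset_index(drop=True)
print(out)
```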

moz*_*way 5

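You can build a regex from the truncated strings, extract the matching fragment from each message, and merge on that fragment: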

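import re

# Build an alternation of all fragments; re.escape keeps characters such as '?' literal.
pattern = '|'.join(map(re.escape, temp_truncated['message']))

# For each message, extract the first matching fragment (NaN if none matches).
key = temp['message'].str.extract(f'({pattern})', expand=False)

# Inner-merge on the extracted fragment, then drop the helper column.
out = (temp
       .merge(temp_truncated.rename(columns={'message': 'sub'}),
              left_on=key, right_on='sub')
       .drop(columns='sub')
      )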

Output:

                               message sentiment cutoff
0               I am feeling very well         1      1
1  It is hard to believe this happened         0      1
2                        What is love?         1      1
3                    Amazing day today         1      1
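To see what the merge in this answer actually joins on, it can help to inspect key on its own. The sketch below rebuilds only the message columns (an assumption for brevity) and prints the intermediate Series: rows with no matching fragment become NaN, and the inner merge drops them because temp_truncated has no NaN in its join column.

```python
import re

import pandas as pd

temp = pd.DataFrame({'message': ['I am feeling very well', 'It is hard to believe this happened',
                                 'What is love?', 'No new friends',
                                 'I love this show', 'Amazing day today']})
temp_truncated = pd.DataFrame({'message': ['I am feeling very', 'It is hard to believe',
                                           'What is', 'Amazing day']})

# Same pattern as in the answer: escaped fragments joined with '|'.
pattern = '|'.join(map(re.escape, temp_truncated['message']))

# A single capture group with expand=False yields a Series, not a DataFrame.
key = temp['message'].str.extract(f'({pattern})', expand=False)
print(key)
```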
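The pattern in this answer matches a fragment anywhere in the message, and when fragments overlap, the regex alternation takes the first listed alternative that matches, not the longest. If the question's "starts with" semantics and the most specific fragment are wanted, one possible variant (a sketch, not part of the original answer) anchors the pattern with ^ and sorts fragments longest first. The extra fragment 'What is love' with cutoff 0 is hypothetical, added only to show the overlap case.

```python
import re

import pandas as pd

temp = pd.DataFrame({
    'message': ['I am feeling very well', 'It is hard to believe this happened',
                'What is love?', 'No new friends', 'I love this show', 'Amazing day today'],
    'sentiment': [1, 0, 1, 0, 1, 1],
})
# 'What is love' (cutoff 0) is a hypothetical extra fragment that overlaps 'What is'.
temp_truncated = pd.DataFrame({
    'message': ['I am feeling very', 'It is hard to believe', 'What is', 'Amazing day', 'What is love'],
    'cutoff': [1, 1, 1, 1, 0],
})

# Longest fragments first, so the alternation prefers the most specific prefix.
fragments = sorted(temp_truncated['message'], key=len, reverse=True)
pattern = '|'.join(map(re.escape, fragments))

# '^' anchors the match at the start of the message ("starts with" semantics).
key = temp['message'].str.extract(f'^({pattern})', expand=False)

out = (temp
       .merge(temp_truncated.rename(columns={'message': 'sub'}),
              left_on=key, right_on='sub')
       .drop(columns='sub'))
print(out)
```

Here "What is love?" joins to the hypothetical 'What is love' row (cutoff 0) rather than to 'What is'. As with the original answer, each message joins to at most one fragment, because str.extract captures a single match per row.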