Dar*_*lus 4 python dataframe pandas
We have the following two DataFrames:
import numpy as np
import pandas as pd

temp = pd.DataFrame(np.array([['I am feeling very well',1],['It is hard to believe this happened',0],
                              ['What is love?',1], ['No new friends',0],
                              ['I love this show',1],['Amazing day today',1]]),
                    columns = ['message','sentiment'])
temp_truncated = pd.DataFrame(np.array([['I am feeling very',1],['It is hard to believe',1],
                                        ['What is',1], ['Amazing day',1]]),
                              columns = ['message','cutoff'])
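My idea is to create a third DataFrame that represents an inner join between temp and temp_truncated, matching each row of temp whose message starts with (or contains) one of the strings in temp_truncated.
Desired output: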
                               message  sentiment  cutoff
0               I am feeling very well          1       1
1  It is hard to believe this happened          0       1
2                        What is love?          1       1
3                    Amazing day today          1       1
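You can build a regex from the truncated messages, extract the matching substring from each full message, and merge on that key: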
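import re

# Build an alternation of all truncated messages, escaping regex metacharacters
pattern = '|'.join(map(re.escape, temp_truncated['message']))
# For each full message, extract the first match of the pattern (NaN if there is none)
key = temp['message'].str.extract(f'({pattern})', expand=False)
# Inner join on the extracted key, then drop the helper column
out = (temp
       .merge(temp_truncated.rename(columns={'message': 'sub'}),
              left_on=key, right_on='sub')
       .drop(columns='sub')
      )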
Output:
                               message  sentiment  cutoff
0               I am feeling very well          1       1
1  It is hard to believe this happened          0       1
2                        What is love?          1       1
3                    Amazing day today          1       1
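Note that the regex above matches a truncated message anywhere inside the full message, not only at its start. If a strict "starts with" match is required, one possible alternative is a cross join followed by a str.startswith filter. The following is only a sketch under a few assumptions: it needs pandas 1.2 or later for how='cross', the frames are rebuilt from dicts with integer columns (the original builds them through np.array, which stores everything as strings), and the names pairs, prefix and out_prefix are made up for this illustration.

```python
import pandas as pd

# Same data as in the question, rebuilt from dicts (assumption: integer
# sentiment/cutoff columns instead of the strings produced by np.array).
temp = pd.DataFrame({
    'message': ['I am feeling very well', 'It is hard to believe this happened',
                'What is love?', 'No new friends',
                'I love this show', 'Amazing day today'],
    'sentiment': [1, 0, 1, 0, 1, 1],
})
temp_truncated = pd.DataFrame({
    'message': ['I am feeling very', 'It is hard to believe',
                'What is', 'Amazing day'],
    'cutoff': [1, 1, 1, 1],
})

# Pair every row of temp with every row of temp_truncated (Cartesian product).
# 'prefix' is a hypothetical helper column name used only here.
pairs = temp.merge(temp_truncated.rename(columns={'message': 'prefix'}),
                   how='cross')

# Keep only the pairs where the full message starts with the truncated one.
mask = [msg.startswith(pre)
        for msg, pre in zip(pairs['message'], pairs['prefix'])]

out_prefix = pairs.loc[mask].drop(columns='prefix').reset_index(drop=True)
print(out_prefix)
```

The cross join creates len(temp) * len(temp_truncated) intermediate rows, so this is probably only reasonable for small frames; for larger ones the regex approach above (optionally with each alternative anchored at the start of the string) should scale better.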