I have a df that looks like this:
words                                             col_a  col_b
I guess, because I have thought over that. Um,    1      0
That? yeah.                                       1      1
I don't always think you're up to something.      0      1
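I want to split df.words into separate rows wherever there is punctuation (.,?!:;). However, each new row should keep the col_a and col_b values from the row it came from. For example, the df above should become: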
words                                         col_a  col_b
I guess,                                      1      0
because I have thought over that.             1      0
Um,                                           1      0
That?                                         1      1
yeah.                                         1      1
I don't always think you're up to something.  0      1
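To make the question reproducible, here is a minimal sketch that builds the sample frame and the desired result by hand, with values copied from the two tables above. The name expected is just an illustrative choice, so any candidate solution can be compared against it.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
    ],
    "col_a": [1, 1, 0],
    "col_b": [0, 1, 1],
})

# Desired result: one row per punctuation-terminated fragment,
# with col_a / col_b copied from the row the fragment came from
expected = pd.DataFrame({
    "words": [
        "I guess,",
        "because I have thought over that.",
        "Um,",
        "That?",
        "yeah.",
        "I don't always think you're up to something.",
    ],
    "col_a": [1, 1, 1, 1, 1, 0],
    "col_b": [0, 0, 0, 1, 1, 1],
})
```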
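One way is to use str.findall with the pattern (.*?[.,?!:;]), which matches one of these punctuation marks together with the characters before it (non-greedily), and then explode the resulting lists: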
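import pandas as pd

(df.assign(words=df.words.str.findall(r'(.*?[.,?!:;])'))
   .explode('words')
   .assign(words=lambda d: d.words.str.strip())  # findall keeps the space after each mark
   .reset_index(drop=True))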
   words                                         col_a  col_b
0  I guess,                                      1      0
1  because I have thought over that.             1      0
2  Um,                                           1      0
3  That?                                         1      1
4  yeah.                                         1      1
5  I don't always think you're up to something.  0      1
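A self-contained version of the findall + explode approach, runnable end to end. It rebuilds the sample frame, contrasts the lazy pattern with a greedy one on a single string to show why .*? matters, and strips the leading space that findall leaves on every fragment after the first (the space that follows each punctuation mark becomes the first character of the next match).

```python
import re
import pandas as pd

df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
    ],
    "col_a": [1, 1, 0],
    "col_b": [0, 1, 1],
})

pattern = r"(.*?[.,?!:;])"  # lazy run of characters up to the nearest punctuation mark

# Lazy vs greedy on one string: the greedy version runs to the LAST
# punctuation mark, so nothing gets split.
print(re.findall(pattern, "That? yeah."))          # ['That?', ' yeah.']
print(re.findall(r"(.*[.,?!:;])", "That? yeah."))  # ['That? yeah.']

out = (df.assign(words=df["words"].str.findall(pattern))
         .explode("words")  # one row per list element; col_a / col_b are repeated
         # drop the leading space carried over from after each punctuation mark
         .assign(words=lambda d: d["words"].str.strip())
         .reset_index(drop=True))
print(out)
```

One caveat: a cell containing no punctuation at all produces an empty list, which explode turns into a single NaN row.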
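The findall approach only keeps text that ends in a punctuation mark, so anything after the last mark in a cell is silently dropped (for example, "Hi, there" yields only "Hi,"). If that text must be kept, a different technique, splitting instead of matching, avoids the problem. This is a sketch of that alternative, not the approach above: it splits each cell with Python's re.split on whitespace preceded by one of .,?!:; (a lookbehind keeps the mark attached to the fragment before it), then explodes as before. The fourth row is a hypothetical example added only to exercise the trailing-text case.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
        "Well, no trailing punctuation here",  # hypothetical extra row
    ],
    "col_a": [1, 1, 0, 0],
    "col_b": [0, 1, 1, 0],
})

# Split on whitespace that directly follows a punctuation mark; the
# zero-width lookbehind leaves the mark on the preceding fragment.
splitter = re.compile(r"(?<=[.,?!:;])\s+")

out = (df.assign(words=df["words"].map(splitter.split))
         .explode("words")
         .reset_index(drop=True))
print(out)
```

Because the split happens only where punctuation is followed by whitespace, strings like 3.5 or a,b stay in one piece, whereas the findall pattern would cut them after the mark.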