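I have a DataFrame with a column of boolean expressions, and I want to create another column that holds just the list of elements (operands) in each expression.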
Example:
Name Exp
A DDDD | LLLL & AAAA
D HHHH | DDDD | JJJJ
O UUUU & FFFF & RRRR
Resulting df:
Name Exp Exp List
A DDDD | LLLL & AAAA ['DDDD','LLLL','AAAA']
D HHHH | DDDD | JJJJ ['HHHH','DDDD','JJJJ']
O UUUU & FFFF & RRRR ['UUUU','FFFF','RRRR']
Use Series.str.findall with the regex [a-zA-Z]+ to extract the words:
df['Exp List'] = df['Exp'].str.findall(r'[a-zA-Z]+')
# alternative: \w+ also matches digits and underscores
#df['Exp List'] = df['Exp'].str.findall(r'\w+')
print (df)
Name Exp Exp List
0 A DDDD | LLLL & AAAA [DDDD, LLLL, AAAA]
1 D HHHH | DDDD | JJJJ [HHHH, DDDD, JJJJ]
2 O UUUU & FFFF & RRRR [UUUU, FFFF, RRRR]
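A minimal self-contained sketch of the findall approach, rebuilding the sample frame from the question. The second part uses hypothetical operand names (not from the question) to show where the two patterns diverge: [a-zA-Z]+ stops at digits and underscores, while \w+ keeps them.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["A", "D", "O"],
    "Exp": ["DDDD | LLLL & AAAA", "HHHH | DDDD | JJJJ", "UUUU & FFFF & RRRR"],
})

# findall returns, for every row, a list of all non-overlapping regex matches
df["Exp List"] = df["Exp"].str.findall(r"[a-zA-Z]+")
print(df["Exp List"].tolist())
# [['DDDD', 'LLLL', 'AAAA'], ['HHHH', 'DDDD', 'JJJJ'], ['UUUU', 'FFFF', 'RRRR']]

# Hypothetical operands containing digits/underscores: the patterns now differ
s = pd.Series(["VAR_1 | X2 & Y"])
print(s.str.findall(r"[a-zA-Z]+").tolist())  # [['VAR', 'X', 'Y']]
print(s.str.findall(r"\w+").tolist())        # [['VAR_1', 'X2', 'Y']]
```

If your real operand names can contain digits or underscores, \w+ is the safer choice.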
A solution using Series.str.split, splitting on the escaped separators | and & with optional surrounding whitespace:
df['Exp List'] = df['Exp'].str.split(r'\s*\|\s*|\s*&\s*')
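A self-contained sketch of the split variant on the same sample data. Because the pattern is longer than one character, pandas treats it as a regular expression. The second example uses a hypothetical expression with parentheses (not from the question) to show one behavioural difference: split keeps everything between operators, so brackets stay attached to the operands, whereas findall with [a-zA-Z]+ would drop them.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["A", "D", "O"],
    "Exp": ["DDDD | LLLL & AAAA", "HHHH | DDDD | JJJJ", "UUUU & FFFF & RRRR"],
})

# Split on "|" or "&", consuming any whitespace around the operator
pattern = r"\s*\|\s*|\s*&\s*"
df["Exp List"] = df["Exp"].str.split(pattern)
print(df["Exp List"].tolist())
# [['DDDD', 'LLLL', 'AAAA'], ['HHHH', 'DDDD', 'JJJJ'], ['UUUU', 'FFFF', 'RRRR']]

# Hypothetical input with parentheses: split leaves them on the operands
s = pd.Series(["(AAAA | BBBB) & CCCC"])
print(s.str.split(pattern).tolist())  # [['(AAAA', 'BBBB)', 'CCCC']]
```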