Pandas: Selecting rows from a column using a regular expression

Question

I want to extract the rows where the value in the feccandid column starts with H or S:

         cid  amount  date catcode  feccandid
0  N00031317    1000  2010   B2000  H0FL19080
1  N00027464    5000  2009   B1000  H6IA01098
2  N00024875    1000  2009   A5200  S2IL08088
3  N00030957    2000  2010   J2200  S0TN04195
4  N00026591    1000  2009   F3300  S4KY06072
5  N00031317    1000  2010   B2000  P0FL19080
6  N00027464    5000  2009   B1000  P6IA01098
7  N00024875    1000  2009   A5200  S2IL08088
8  N00030957    2000  2010   J2200  H0TN04195
9  N00026591    1000  2009   F3300  H4KY06072

I am using this code:

campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]

It returns the error: ValueError: pattern contains no capture groups

Does anyone with regex experience know what I am doing wrong?

Answer 1

小智 5

Why not just use str.match instead of extracting and negating?

i.e. df[df['col'].str.match(r'^(S|H)')]

(I came here looking for the same answer, but the use of extract seemed odd, so I found str.ops.)
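As a minimal sketch of the str.match approach, the snippet below builds a small frame from a few of the question's feccandid values. The missing value in the last row and the na=False argument are assumptions added for illustration: without na=False, a missing value would produce NaN in the mask rather than False.

```python
import pandas as pd

# Sample rows based on the question; the None entry is an illustrative assumption
df = pd.DataFrame({
    "cid": ["N00031317", "N00024875", "N00031317", "N00030957"],
    "feccandid": ["H0FL19080", "S2IL08088", "P0FL19080", None],
})

# str.match only matches at the start of each string, so no ^ anchor is needed.
# na=False treats missing values as non-matches, keeping the mask boolean.
mask = df["feccandid"].str.match(r"[SH]", na=False)

selected = df[mask]
print(selected)
```

The character class [SH] is equivalent to (S|H) here and avoids creating a capture group that str.match does not need.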

Answer 2

Ami*_*ory 2

For something this simple, you can bypass regular expressions:

relevant = campaign_contributions.feccandid.str.startswith('H') | \
    campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]

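However, if you want to use a regular expression, you can change it to the following (expand=False makes extract return a Series, so the result can be used directly as a row mask):

relevant = ~campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).isnull()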

Note that astype is redundant, and that extract is enough.
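To make the cause of the original error concrete, here is a hedged sketch on a three-row sample taken from the question's data. It assumes the error comes from (?:...) being a non-capturing group, which leaves extract and extractall with nothing to return; the variable names first_letter and selected are illustrative only.

```python
import pandas as pd

# Three rows taken from the question's data
campaign_contributions = pd.DataFrame({
    "amount": [1000, 1000, 1000],
    "feccandid": ["H0FL19080", "S2IL08088", "P0FL19080"],
})
col = campaign_contributions["feccandid"]

# (?:S|H) is non-capturing, so extractall raises the ValueError from the question
error_message = None
try:
    col.str.extractall(r"^(?:S|H)")
except ValueError as exc:
    error_message = str(exc)

# With a capturing group and expand=False, extract returns a Series holding
# the matched letter where the pattern matches and NaN where it does not.
first_letter = col.str.extract(r"^(S|H)", expand=False)
relevant = first_letter.notna()

selected = campaign_contributions[relevant]
print(selected)
```

The boolean Series relevant can then be passed to loc or used for plain bracket indexing, as in the startswith version above.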
