How to invert a regular expression in the pandas filter function

Question

I have the following pandas DataFrame df (actually just the last lines of a much larger one):

                           count
gene                            
WBGene00236788                56
WBGene00236807                 3
WBGene00249816                12
WBGene00249825                20
WBGene00255543                 6
__no_feature            11697881
__ambiguous                 1353
__too_low_aQual                0
__not_aligned                  0
__alignment_not_unique         0

I can use filter's regex option to get only the rows that start with two underscores:

df.filter(regex="^__", axis=0)

This returns the following:

                           count
gene                            
__no_feature            11697881
__ambiguous                 1353
__too_low_aQual                0
__not_aligned                  0
__alignment_not_unique         0

What I actually want is the complement: only the rows that do not start with two underscores.

I can do it with another regular expression: df.filter(regex="^[^_][^_]", axis=0).
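
One caveat: this pattern is not an exact complement of "^__". It requires two leading characters that are both non-underscores, so it also rejects labels that start with a single underscore and labels that are only one character long. A minimal sketch using only the re module; the labels "_single" and "x" are hypothetical and do not appear in the data above:

```python
import re

# "_single" and "x" are hypothetical labels, added only to show the difference.
labels = ["WBGene00236788", "__no_feature", "_single", "x"]

pattern = re.compile(r"^[^_][^_]")

# Labels kept by the hand-written "two non-underscore characters" pattern.
kept = [s for s in labels if pattern.search(s)]

# Labels that really do not start with two underscores.
complement = [s for s in labels if not s.startswith("__")]

print(kept)
print(complement)
```

For the gene identifiers shown in the question both selections agree, so the difference only matters if such edge-case labels can occur.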

Is there a simpler way to specify that I want the inverse of the initial regular expression?

Is this kind of regex-based filtering efficient?

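Edit: testing some of the proposed solutions

df.filter(regex="(?!^__)", axis=0) and df.filter(regex="^\w+", axis=0) both return all rows.

According to the re module documentation, the \w special character actually includes the underscore, which explains the behaviour of the second expression.

I guess the first one does not work because (?!...) applies to what follows a pattern. Here, "^" should be placed outside, as in the solution proposed below:

df.filter(regex="^(?!__).*?$", axis=0) works.

So does df.filter(regex="^(?!__)", axis=0).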
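
These results can be reproduced with the re module alone. The sketch below relies on the documented behaviour that filter keeps a label when re.search(regex, label) finds a match; the helper name kept is made up for illustration:

```python
import re

labels = ["WBGene00236788", "__no_feature", "__ambiguous"]


def kept(regex):
    """Return the labels for which re.search finds a match (hypothetical helper)."""
    return [s for s in labels if re.search(regex, s)]


# re.search tries every position. From position 1 onward "^" cannot match,
# so the negative lookahead succeeds there and every label is kept.
all_rows = kept(r"(?!^__)")

# \w includes the underscore, so "^\w+" also matches every label.
word_rows = kept(r"^\w+")

# Anchoring first and looking ahead second rejects labels starting with "__".
wanted = kept(r"^(?!__)")

print(all_rows)
print(word_rows)
print(wanted)
```

With the anchor outside the lookahead, the only position where a match can start is the beginning of the label, which is why the last pattern behaves as intended.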

Answer 1

小智 13

I had the same problem, but I wanted to filter columns. So I am using axis=1, but the concept should be similar.

df.drop(df.filter(regex='my_expression').columns,axis=1)
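
Applied to rows, as in the question, the same idea might look like the sketch below. It assumes a shortened copy of the question's df (four of the ten rows), selects the unwanted rows with the original regex and drops them by index label:

```python
import pandas as pd

# Shortened reconstruction of the data frame from the question.
df = pd.DataFrame(
    {"count": [56, 3, 11697881, 1353]},
    index=pd.Index(
        ["WBGene00236788", "WBGene00236807", "__no_feature", "__ambiguous"],
        name="gene",
    ),
)

# Select the rows matching the original regex, then drop them by label.
unwanted = df.filter(regex="^__", axis=0).index
result = df.drop(unwanted, axis=0)

print(result)
```

This keeps the original regular expression unchanged, at the cost of building an intermediate frame.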

Answer 2

Rob*_*och 6

Match all rows that do not have two leading underscores:

^(?!__)

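^ matches the beginning of the line. (?!__) makes sure that the line (what follows the preceding ^ match) does not begin with two underscores.

Edit: removed .*?$ since it is not needed to filter the rows.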
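
A minimal sketch of this pattern in use, on a shortened copy of the question's df. For comparison it also builds a boolean mask with the index string accessor; that variant is not proposed in this thread and is shown only as a regex-free alternative, with no claim about which is faster:

```python
import pandas as pd

# Shortened reconstruction of the data frame from the question.
df = pd.DataFrame(
    {"count": [56, 3, 11697881, 1353]},
    index=pd.Index(
        ["WBGene00236788", "WBGene00236807", "__no_feature", "__ambiguous"],
        name="gene",
    ),
)

# Negative lookahead: keep labels that do not start with two underscores.
by_regex = df.filter(regex="^(?!__)", axis=0)

# Alternative (not from the thread): invert a boolean mask on the index.
by_mask = df[~df.index.str.startswith("__")]

print(by_regex)
print(by_mask)
```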
