Fastest way to filter out pandas DataFrame rows that contain special characters

mua*_*aiz 4 python dataframe pandas

I have a list of special characters, for example:

BAD_CHARS = ['.', '&', '\(', '\)', ';', '-']

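I want to remove every row of a pandas DataFrame whose column value contains any of these special characters. Currently I am doing the following: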

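import io

import pandas as pd

# Sample data, parsed from whitespace-separated text into a DataFrame
data = '''
        words  frequency
            &         11
    CONDUCTED          3
       (E.G.,          5
   EXPERIMENT          6
         (VS.          5
        (WARD          3
            -         14
        2006;          3
           3D          5
         ABLE          5
     ABSTRACT          3
  ACCOMPANIED          5
     ACTIVITY         11
           AD          5
       ADULTS          6
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# Drop rows whose 'words' value contains each bad character, one at a time
for char in BAD_CHARS:
    df = df[~df['words'].str.contains(char)]

# Expected result:
#         words  frequency
#     CONDUCTED          3
#    EXPERIMENT          6
#            3D          5
#          ABLE          5
#      ABSTRACT          3
#   ACCOMPANIED          5
#      ACTIVITY         11
#            AD          5
#        ADULTS          6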

First, it doesn't work, and second, I suspect it isn't fast. How can I do this faster? Thanks.

jez*_*ael 5

I believe you need to escape the values first and then join them with `|`. As @cᴏʟᴅsᴘᴇᴇᴅ pointed out, the `\` should also be removed from the values in BAD_CHARS:

import re

BAD_CHARS = ['.', '&', '(', ')', ';', '-']
pat = '|'.join(['({})'.format(re.escape(c)) for c in BAD_CHARS])

df = df[~df['words'].str.contains(pat)]
print(df)
          words  frequency
1     CONDUCTED          3
3    EXPERIMENT          6
8            3D          5
9          ABLE          5
10     ABSTRACT          3
11  ACCOMPANIED          5
12     ACTIVITY         11
13           AD          5
14       ADULTS          6
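A variant worth trying (a sketch, not part of the original answer): since every entry in BAD_CHARS is a single character, the escaped characters can go into one regex character class instead of an alternation of capturing groups. This keeps the pattern short and avoids capturing groups, which pandas may warn about in `str.contains`. The DataFrame below is rebuilt from the question's sample data so the snippet runs on its own.

```python
import re

import pandas as pd

BAD_CHARS = ['.', '&', '(', ')', ';', '-']

# Sample data from the question
df = pd.DataFrame({
    'words': ['&', 'CONDUCTED', '(E.G.,', 'EXPERIMENT', '(VS.', '(WARD', '-',
              '2006;', '3D', 'ABLE', 'ABSTRACT', 'ACCOMPANIED', 'ACTIVITY',
              'AD', 'ADULTS'],
    'frequency': [11, 3, 5, 6, 5, 3, 14, 3, 5, 5, 3, 5, 11, 5, 6],
})

# Build one character class such as [\.\&\(\);\-]; re.escape makes every
# character literal inside the brackets (assumes single-character entries).
pat = '[' + re.escape(''.join(BAD_CHARS)) + ']'

# Keep only rows whose 'words' value matches none of the characters
result = df[~df['words'].str.contains(pat)]
print(result)
```

Both patterns should select the same rows; which one is faster on a large frame is something to measure rather than assume.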

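The escaping matters because the unescaped pattern returns an empty DataFrame (an unescaped `.` is a regex wildcard that matches any character, so every row matches):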

df[~df['words'].str.contains('|'.join(BAD_CHARS))]
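If regular expressions are not needed at all, the problem can also be avoided by testing characters literally. The sketch below (an alternative, not from the original answer) first confirms the wildcard behaviour of `.`, then shows two literal approaches: the question's loop with `regex=False`, and a plain-Python set test. The set test assumes every BAD_CHARS entry is a single character. Whether either beats the single vectorized regex depends on the data, so time them on your own frame.

```python
import pandas as pd

BAD_CHARS = ['.', '&', '(', ')', ';', '-']

# Sample data from the question
df = pd.DataFrame({
    'words': ['&', 'CONDUCTED', '(E.G.,', 'EXPERIMENT', '(VS.', '(WARD', '-',
              '2006;', '3D', 'ABLE', 'ABSTRACT', 'ACCOMPANIED', 'ACTIVITY',
              'AD', 'ADULTS'],
    'frequency': [11, 3, 5, 6, 5, 3, 14, 3, 5, 5, 3, 5, 11, 5, 6],
})

# As a regex, '.' matches any character, so every non-empty word "contains" it
print(df['words'].str.contains('.').all())

# The question's loop works once each test is literal (regex=False)
looped = df
for char in BAD_CHARS:
    looped = looped[~looped['words'].str.contains(char, regex=False)]

# Set-based test: keep words that share no character with BAD_CHARS
bad = set(BAD_CHARS)
mask = [bad.isdisjoint(word) for word in df['words']]
result = df[mask]
print(result)
```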