mua*_*aiz 4 python dataframe pandas
I have a list of special characters. For example:
BAD_CHARS = ['.', '&', '\(', '\)', ';', '-']
I want to remove all rows of a pandas DataFrame whose column value contains any of these special characters. Currently I am doing the following:
df = '''
words frequency
& 11
CONDUCTED 3
(E.G., 5
EXPERIMENT 6
(VS. 5
(WARD 3
- 14
2006; 3
3D 5
ABLE 5
ABSTRACT 3
ACCOMPANIED 5
ACTIVITY 11
AD 5
ADULTS 6
'''
for char in BAD_CHARS:
    df = df[~df['word'].str.contains(char)]
# Expected Result
words frequency
CONDUCTED 3
EXPERIMENT 6
3D 5
ABLE 5
ABSTRACT 3
ACCOMPANIED 5
ACTIVITY 11
AD 5
ADULTS 6
First, it does not work, and second, I suspect it is not fast. How can I do this in a faster way? Thanks.
I believe you need to escape the values first and then join them with |. Thanks to @cᴏʟᴅsᴘᴇᴇᴅ for pointing out that the \\ should be removed from the values in BAD_CHARS:
import re

BAD_CHARS = ['.', '&', '(', ')', ';', '-']
pat = '|'.join(['({})'.format(re.escape(c)) for c in BAD_CHARS])

df = df[~df['words'].str.contains(pat)]
print(df)
          words  frequency
1     CONDUCTED          3
3    EXPERIMENT          6
8            3D          5
9          ABLE          5
10     ABSTRACT          3
11  ACCOMPANIED          5
12     ACTIVITY         11
13           AD          5
14       ADULTS          6

Escaping is needed because the following returns an empty frame (an unescaped . matches any character, so every row counts as a match and is removed):

df[~df['words'].str.contains('|'.join(BAD_CHARS))]
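
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the sample from the question is rebuilt as a real DataFrame, and `drop_rows_with_chars` is a hypothetical helper name, not a pandas function. It also swaps the `|`-joined groups for a single character class of escaped characters. That should select the same rows, and it has no capture groups, so `str.contains` has nothing to warn about. The last lines reproduce the empty frame using the question's original unescaped list.

```python
import re

import pandas as pd

BAD_CHARS = ['.', '&', '(', ')', ';', '-']

# Sample data from the question, rebuilt as a DataFrame (assumption:
# the question's triple-quoted string stands for a frame like this).
df = pd.DataFrame({
    'words': ['&', 'CONDUCTED', '(E.G.,', 'EXPERIMENT', '(VS.', '(WARD', '-',
              '2006;', '3D', 'ABLE', 'ABSTRACT', 'ACCOMPANIED', 'ACTIVITY',
              'AD', 'ADULTS'],
    'frequency': [11, 3, 5, 6, 5, 3, 14, 3, 5, 5, 3, 5, 11, 5, 6],
})


def drop_rows_with_chars(frame, column, chars):
    """Hypothetical helper: keep rows whose `column` contains none of `chars`."""
    # Escape each character and put them all in one character class,
    # so the whole column is scanned once with a single regex.
    pattern = '[' + ''.join(re.escape(c) for c in chars) + ']'
    return frame[~frame[column].str.contains(pattern, regex=True)]


clean = drop_rows_with_chars(df, 'words', BAD_CHARS)
print(clean)

# The question's original list, joined without escaping: the bare '.'
# matches any character, so every row is flagged and the result is empty.
QUESTION_BAD_CHARS = ['.', '&', r'\(', r'\)', ';', '-']
naive = df[~df['words'].str.contains('|'.join(QUESTION_BAD_CHARS))]
print(len(naive))
```

Either pattern filters the column in one vectorized pass instead of one pass per character, which is where any speed gain over the loop in the question would come from.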