use*_*556 1 python regex string list-comprehension
In Python, I want to remove from a list any string that contains a substring found in a so-called "blacklist".
For example, suppose list A is as follows:
A = [ 'cat', 'doXXXg', 'monkey', 'hoBBBrse', 'fish', 'snake']
List B is:
B = ['XXX', 'BBB']
How can I get list C:
C = [ 'cat', 'monkey', 'fish', 'snake']
I have played around with various combinations of regular expressions and list comprehensions, but I can't seem to get it to work.
>>> A = [ 'cat', 'doXXXg', 'monkey', 'hoBBBrse', 'fish', 'snake']
>>> B = ['XXX', 'BBB']
The following list comprehension will work:
>>> [word for word in A if not any(bad in word for bad in B)]
['cat', 'monkey', 'fish', 'snake']
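As a small sketch, the same comprehension can be wrapped in a reusable function. The name `remove_blacklisted` is made up for illustration and is not part of the original answer.

```python
def remove_blacklisted(words, blacklist):
    """Return the words that contain none of the blacklisted substrings."""
    # any() short-circuits on the first blacklisted substring found in a word.
    return [word for word in words if not any(bad in word for bad in blacklist)]


A = ['cat', 'doXXXg', 'monkey', 'hoBBBrse', 'fish', 'snake']
B = ['XXX', 'BBB']
C = remove_blacklisted(A, B)
print(C)  # ['cat', 'monkey', 'fish', 'snake']
```

With an empty blacklist, `any()` over nothing is `False`, so every word is kept.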
You can join the blacklist into a single expression:
import re
blacklist = re.compile('|'.join([re.escape(word) for word in B]))
Then filter out the words that match:
C = [word for word in A if not blacklist.search(word)]
The words in the pattern are escaped (so that . and other metacharacters are treated as literal characters rather than as regex syntax) and joined into a series of | alternatives:
>>> '|'.join([re.escape(word) for word in B])
'XXX|BBB'
Demo:
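To see why the escaping matters, here is a minimal sketch using a hypothetical blacklist entry `'a.c'` that contains a metacharacter (the example data is invented for illustration and does not come from the question).

```python
import re

B = ['a.c']                      # hypothetical entry containing a metacharacter
words = ['abc', 'xa.cx', 'dog']  # hypothetical input

unescaped = re.compile('|'.join(B))
escaped = re.compile('|'.join(re.escape(bad) for bad in B))

# Unescaped, '.' matches any character, so 'abc' is wrongly filtered out too.
kept_unescaped = [w for w in words if not unescaped.search(w)]
# Escaped, only words containing the literal substring 'a.c' are rejected.
kept_escaped = [w for w in words if not escaped.search(w)]

print(kept_unescaped)  # ['dog']
print(kept_escaped)    # ['abc', 'dog']
```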
>>> import re
>>> A = [ 'cat', 'doXXXg', 'monkey', 'hoBBBrse', 'fish', 'snake']
>>> B = ['XXX', 'BBB']
>>> blacklist = re.compile('|'.join([re.escape(word) for word in B]))
>>> [word for word in A if not blacklist.search(word)]
['cat', 'monkey', 'fish', 'snake']
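One edge case worth guarding against if this is turned into a helper: joining an empty blacklist yields the empty pattern, which matches every string, so every word would be dropped. The sketch below is one possible way to handle that; the function name and the early return are assumptions, not part of the original answer.

```python
import re


def regex_filter(words, blacklist):
    """Drop every word that contains any blacklisted substring."""
    if not blacklist:
        # An empty alternation compiles to the empty pattern, which matches
        # every string, so without this guard all words would be dropped.
        return list(words)
    pattern = re.compile('|'.join(re.escape(bad) for bad in blacklist))
    return [word for word in words if not pattern.search(word)]


A = ['cat', 'doXXXg', 'monkey', 'hoBBBrse', 'fish', 'snake']
print(regex_filter(A, ['XXX', 'BBB']))  # ['cat', 'monkey', 'fish', 'snake']
print(regex_filter(A, []) == A)         # True
```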
This should outperform any explicit membership test, especially as the number of words in the blacklist grows:
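>>> import re, string, random, timeit
>>> def regex_filter(words, blacklist):
...     # blacklist is a compiled pattern here
...     return [word for word in words if not blacklist.search(word)]
...
>>> def any_filter(words, blacklist):
...     # blacklist is a plain list of substrings here
...     return [word for word in words if not any(bad in word for bad in blacklist)]
...
>>> words = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(3, 20))])
...          for _ in range(1000)]
>>> blacklist = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(2, 5))])
...              for _ in range(10)]
>>> timeit.timeit('any_filter(words, blacklist)', 'from __main__ import any_filter, words, blacklist', number=100000)
0.36232495307922363
>>> timeit.timeit('regex_filter(words, blacklist)', "from __main__ import re, regex_filter, words, blacklist; blacklist = re.compile('|'.join([re.escape(word) for word in blacklist]))", number=100000)
0.2499098777770996
The test above is set up to check 10 random short blacklisted words (2 to 5 characters) against a list of 1000 random words (3 to 20 characters long). In the timings as originally reported (0.36 s for any() versus 0.25 s for the regex), the regex version took roughly 30% less time. Those figures were recorded with filter functions that read the six-word globals A and B instead of their arguments, so rerun the functions shown here to get numbers for the full 1000-word list.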
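For rerunning the comparison, here is a self-contained sketch of the same benchmark as a script. The helper `random_word`, the fixed seed, and the run count of 200 are assumptions chosen to keep the run short; the results depend on the machine, the Python version, and the data, so neither approach should be assumed faster without measuring.

```python
import random
import re
import string
import timeit


def any_filter(words, blacklist):
    # blacklist is a plain list of substrings.
    return [w for w in words if not any(bad in w for bad in blacklist)]


def regex_filter(words, pattern):
    # pattern is a compiled regex built from the escaped blacklist.
    return [w for w in words if not pattern.search(w)]


def random_word(low, high):
    # Hypothetical helper: a random ASCII word of length low..high.
    length = random.randint(low, high)
    return ''.join(random.choice(string.ascii_letters) for _ in range(length))


random.seed(0)  # fixed seed so repeated runs use the same data
words = [random_word(3, 20) for _ in range(1000)]
blacklist = [random_word(2, 5) for _ in range(10)]
pattern = re.compile('|'.join(re.escape(bad) for bad in blacklist))

# Both approaches must agree before their timings are worth comparing.
assert any_filter(words, blacklist) == regex_filter(words, pattern)

runs = 200
t_any = timeit.timeit(lambda: any_filter(words, blacklist), number=runs)
t_regex = timeit.timeit(lambda: regex_filter(words, pattern), number=runs)
print('any():  %.4f s for %d runs' % (t_any, runs))
print('regex:  %.4f s for %d runs' % (t_regex, runs))
```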