alv*_*vas 10 python string punctuation digit
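I just need to filter out strings that contain only digits and/or a set of punctuation characters.
I have tried checking each character and then summing the boolean results to see whether the total equals len(str). Is there a more Pythonic way to do this?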
>>> import string
>>> x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
>>> [i for i in x if [True if j.isdigit() else False for j in i] ]
['12,523', '3.46', 'this is not', 'foo bar 42', '23fa']
>>> [i for i in x if sum([True if j.isdigit() or j in string.punctuation else False for j in i]) == len(i)]
['12,523', '3.46']
Using all() with a generator expression, you don't need to count matches and compare against the length:
>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)]
['12,523', '3.46']
BTW, both the code above and the OP's code will also accept strings that contain only punctuation:
>>> x = [',,,', '...', '123', 'not number']
>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)]
[',,,', '...', '123']
To handle that, add one more condition requiring at least one digit:
>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i) and any(j.isdigit() for j in i)]
['123']
You can make it a little faster by storing the characters of string.punctuation in a set, so each membership test is a hash lookup:
>>> puncs = set(string.punctuation)
>>> [i for i in x if all(j.isdigit() or j in puncs for j in i) and any(j.isdigit() for j in i)]
['123']
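The two conditions above (only digits or punctuation, and at least one digit) can also be checked in a single pass over each string. The following is only a sketch under that assumption; the names PUNCTUATION and is_digits_and_punctuation are hypothetical and not part of the original answer.

```python
import string

# Hypothetical module-level constant: a set gives O(1) membership tests.
PUNCTUATION = set(string.punctuation)


def is_digits_and_punctuation(text):
    """Return True if text has only digits/punctuation and at least one digit."""
    has_digit = False
    for ch in text:
        if ch.isdigit():
            has_digit = True
        elif ch not in PUNCTUATION:
            # Any other character (letter, space, ...) disqualifies the string.
            return False
    # Empty or punctuation-only strings end up here with has_digit == False.
    return has_digit


x = ['12,523', '3.46', 'this is not', 'foo bar 42', '23fa', ',,,', '...', '123']
filtered = [s for s in x if is_digits_and_punctuation(s)]
print(filtered)
```

Wrapping the check in a named function mainly helps readability when the same filter is used in several places; the list comprehension from the answer is equivalent for one-off use.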
You can use a precompiled regular expression to check this.
import re, string
# Raw string so that \d is passed to the regex engine unchanged
pattern = re.compile(r"[\d{}]+$".format(re.escape(string.punctuation)))
x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
print([item for item in x if pattern.match(item)])
Output:
['12,523', '3.46']
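Like the all()-only version, a character-class pattern of this kind also accepts strings made purely of punctuation. One possible way to require at least one digit is a lookahead combined with fullmatch() (available since Python 3.4). This is a sketch rather than part of the original answer, and the names allowed and strict_pattern are hypothetical.

```python
import re
import string

# Character class of digits plus every (escaped) punctuation character.
allowed = r"[\d" + re.escape(string.punctuation) + "]"

# (?=.*\d) is a lookahead that requires at least one digit somewhere;
# fullmatch() then requires the whole string to consist of allowed characters.
strict_pattern = re.compile(r"(?=.*\d)" + allowed + "+")

x = ['12,523', '3.46', ',,,', '...', '123', 'not number', 'foo bar 42', '23fa']
strict = [item for item in x if strict_pattern.fullmatch(item)]
print(strict)
```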
Some timing comparisons between @falsetru's solution and mine:
import re, string
punct = string.punctuation
# Raw string so that \d is passed to the regex engine unchanged
pattern = re.compile(r"[\d{}]+$".format(re.escape(string.punctuation)))
x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
from timeit import timeit
print(timeit("[item for item in x if pattern.match(item)]", "from __main__ import pattern, x"))
print(timeit("[i for i in x if all(j.isdigit() or j in punct for j in i)]", "from __main__ import x, punct"))
Output on my machine (seconds):
2.03506183624
4.28856396675
So, in this test, the precompiled regex approach is roughly twice as fast as the all and any approach.
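To re-run a comparison like this on a current interpreter, something along these lines may work. It is a sketch, not the original benchmark: it times callables instead of statement strings (which adds function-call overhead to both sides), the function names with_regex and with_all and the iteration count are assumptions, and the absolute numbers will differ by machine and Python version.

```python
import re
import string
from timeit import timeit

x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
punct = string.punctuation
pattern = re.compile(r"[\d{}]+$".format(re.escape(string.punctuation)))


def with_regex():
    # Precompiled regex approach
    return [item for item in x if pattern.match(item)]


def with_all():
    # all() with a generator expression
    return [i for i in x if all(j.isdigit() or j in punct for j in i)]


# Sanity check: both approaches should select the same strings.
assert with_regex() == with_all()

# Hypothetical iteration count, chosen only to keep the run short.
timings = {
    "regex": timeit(with_regex, number=100000),
    "all": timeit(with_all, number=100000),
}
for name, seconds in timings.items():
    print(name, round(seconds, 3))
```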