alv*_*vas 16 python reduce dictionary filter nested-loops
I have a series of for loops that work on an original list of strings and then gradually filter the list, for example:
import re

# Regex to check that a cap exists in the string.
pattern1 = re.compile(r'\d.*?[A-Z].*?[a-z]')
vocab = ['dog', 'lazy', 'the', 'fly']  # Imagine it's a longer list.

def check_no_caps(s):
    return None if re.match(pattern1, s) else s

def check_nomorethan_five(s):
    return s if len(s) <= 5 else None

def check_in_vocab_plus_x(s, x):
    # s and x are both str.
    return None if s not in vocab else s + x

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# filter with check_no_caps
slist = [check_no_caps(s) for s in slist]
# filter no more than 5.
slist = [check_nomorethan_five(s) for s in slist if s is not None]
# filter in vocab
slist = [check_in_vocab_plus_x(s, str(i)) for i, s in enumerate(slist) if s is not None]
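The above is just an example. In practice the functions that manipulate my strings are more complicated, but they do return either the original string or a manipulated one.
I could use generators instead of lists and do something like this: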
slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# filter with check_no_caps and no more than 5.
slist = (check_nomorethan_five(s1)
         for s1 in (check_no_caps(s0) for s0 in slist)
         if s1 is not None)
# filter in vocab
slist = [check_in_vocab_plus_x(s, str(i)) for i, s in enumerate(slist) if s is not None]
Or in one crazy nested generator:
slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
slist = (check_in_vocab_plus_x(s2, str(i))
         for i, s2 in enumerate(check_nomorethan_five(s1)
                                for s1 in map(check_no_caps, slist)
                                if s1 is not None)
         if s2 is not None)
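There must be a better way. Is there a way to make the chain of for loops faster?
Is there a way to do it with map, reduce and filter? Would that be faster?
Imagine that my original slist is very, very large, on the order of billions of items. And my functions are not as simple as the ones above: they do some computation and manage roughly 1,000 calls per second.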
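First, consider the overall process you apply to the strings. You take some strings, apply certain functions to each of them, and then clean up the list. Let's assume for a moment that every function you apply to a string runs in constant time (that isn't true, but it doesn't matter for now). In your solution you iterate through the list applying one function (that is O(N)). Then you take the next function and iterate again (another O(N)), and so on. So the obvious way to speed things up is to reduce the number of loops. That is not difficult.
The next thing to do is to try to optimize your functions. For example, you use a regexp to check whether the string contains capital letters, but there is str.islower (it returns True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise).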
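As a small illustration of the str.islower suggestion, the sketch below compares it with a simple "no capital letter" regexp. The helper names and the pattern `[A-Z]` are made up for this example and are not the pattern from the question. Note that the two checks are not interchangeable for strings without any letters: str.islower requires at least one cased character.

```python
import re

# Hypothetical pattern and helper names, used only for this illustration.
UPPER = re.compile(r'[A-Z]')

def has_no_caps_regex(s):
    # True when the string contains no ASCII capital letter.
    return UPPER.search(s) is None

def has_no_caps_islower(s):
    # True when all cased characters are lowercase
    # and there is at least one cased character.
    return s.islower()

samples = ['dog', 'Dog', 'dog1', '123', '']
for s in samples:
    # '123' and '' are where the two checks disagree.
    print(repr(s), has_no_caps_regex(s), has_no_caps_islower(s))
```

If digit-only or empty strings can occur in your data, decide which behaviour you want before swapping one check for the other.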
So, here is a first attempt at simplifying and speeding up the code:
vocab = ['dog', 'lazy', 'the', 'fly']  # Imagine it's a longer list.

# note that first two functions can be combined in one
def no_caps_and_length(s):
    return s if s.islower() and len(s) <= 5 else None

# this one is more complicated and cannot be merged with first two
# (not really, but as you say, some functions are rather complicated)
def check_in_vocab_plus_x(s, x):
    # s and x are both str.
    return None if s not in vocab else s + x

# now let's introduce a function that would pipe a string through all functions you need
def pipe_through_funcs(s, x=''):
    # yeah, here we have only two, but could be more
    # x is forwarded to check_in_vocab_plus_x; the default '' appends nothing
    funcs = [no_caps_and_length, lambda t: check_in_vocab_plus_x(t, x)]
    for func in funcs:
        if s is None:
            return s
        s = func(s)
    return s

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# final step: pipe each string with its original index, then drop the None results
results = (pipe_through_funcs(s, str(i)) for i, s in enumerate(slist))
slist = list(filter(lambda a: a is not None, results))
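There may be one more thing to improve. Currently you iterate over the list modifying elements and only then filter them out. But it may be faster to filter first and modify afterwards. Like this: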
vocab = ['dog', 'lazy', 'the', 'fly']  # Imagine it's a longer list.

# make a function that does all the checks for filtering
# you can make a big expression and return its result,
# or a sequence of ifs, or anything in-between,
# it won't affect performance,
# but make sure you put cheaper checks first
def my_filter(s):
    if len(s) > 5:
        return False
    if not s.islower():
        return False
    if s not in vocab:
        return False
    # maybe more checks here
    return True

# now we need a modifying function
# there is a concern: if you need indices as they were in original list
# you might need to think of some way to pass them here
# as you iterate through the filtered list
def modify(s, x):
    s += x
    # maybe more actions
    return s

slist = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
# final step: filter first, then modify
# (here x is the position in the filtered sequence, not in the original list)
slist = [modify(s, str(i)) for i, s in enumerate(filter(my_filter, slist))]
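Also note that in some cases generators, map and the like can be faster, but not always. I believe that if the number of items you filter out is large, a for loop with append may be faster. I can't guarantee that it will be faster, but you could try something like this: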
initial_list = ['the', 'dog', 'jumps', 'over', 'the', 'fly']
new_list = []
for s in initial_list:
    processed = pipe_through_funcs(s)
    if processed is not None:
        new_list.append(processed)
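Since it is hard to predict which variant wins, one option is simply to measure. The following is a minimal sketch using timeit. The `process` function is a hypothetical stand-in for the real per-string work, the data is the toy list repeated, and vocab is assumed to be a set here. Timings depend on the machine and on the real functions, so no numbers are claimed.

```python
import timeit

vocab = {'dog', 'lazy', 'the', 'fly'}  # assumption: a set, for fast lookups

def process(s):
    # Hypothetical stand-in for the real work: returns the string or None.
    if len(s) > 5 or not s.islower() or s not in vocab:
        return None
    return s

data = ['the', 'dog', 'jumps', 'over', 'the', 'fly'] * 1000

def with_map_filter():
    return list(filter(lambda a: a is not None, map(process, data)))

def with_append_loop():
    out = []
    for s in data:
        processed = process(s)
        if processed is not None:
            out.append(processed)
    return out

def with_comprehension():
    return [r for r in map(process, data) if r is not None]

# All three variants must produce the same result before timing them.
assert with_map_filter() == with_append_loop() == with_comprehension()

for fn in (with_map_filter, with_append_loop, with_comprehension):
    print(fn.__name__, timeit.timeit(fn, number=20))
```

Run it with your own functions and a representative sample of your data before committing to one style.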
If you give your transformation functions a uniform signature, you can do something like this:
import random

slist = []
for i in range(0, 100):
    slist.append(random.randint(0, 1000))

# Unified functions which have the same function description
# x is the value
# i is the counter from enumerate
def add(x, i):
    return x + 2

def replace(x, i):
    return int(str(x).replace('2', str(i)))

# Specifying your pipelines as a list of tuples
# Where tuple is (filter function, transformer function)
_pipeline = [
    (lambda s: True, add),
    (lambda s: s % 2 == 0, replace),
]

# Execute your pipeline
for _filter, _fn in _pipeline:
    # enumerate yields (counter, value), so swap them for fn(x, i);
    # fn=_fn binds the current transformer, otherwise every lazy stage
    # would end up calling the last one in Python 3
    slist = map(lambda item, fn=_fn: fn(item[1], item[0]),
                enumerate(filter(_filter, slist)))
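This code works in both Python 2 and Python 3. The difference is that in Python 3 map and filter return lazy iterators, so the work is executed only when it is needed. As a result you effectively iterate over your list once.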
print(slist)
<map object at 0x7f92b8315fd0>
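The lazy behaviour in Python 3 can be seen with a small, self-contained sketch. The functions `double` and `is_even` and the `calls` list are made up for this illustration. Nothing runs until the map object is consumed, and it can be consumed only once.

```python
calls = []

def double(x):
    # Record each call so we can see when the work actually happens.
    calls.append(x)
    return x * 2

def is_even(x):
    return x % 2 == 0

lazy = map(double, filter(is_even, range(6)))
print(calls)         # [] : nothing has been executed yet

result = list(lazy)  # consuming the iterator triggers filter and map together
print(result)        # [0, 4, 8]
print(calls)         # [0, 2, 4]

print(list(lazy))    # [] : the iterator is already exhausted
```

So if you need the results more than once, store them in a list (or rebuild the pipeline) rather than reusing the map object.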
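However, as long as everything fits in memory, iterating once or several times will not make much of a difference, because the same number of transformations and filter checks has to be performed whichever way you iterate. So, to improve your code, try to make your filter and transformation functions as fast as possible.
For example, as @Rawing mentioned, having vocab as a set instead of a list will make a big difference, especially with a large number of items.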
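To see why the set matters: `in` on a list scans the elements one by one, while `in` on a set is a hash lookup. A rough sketch of how to check this yourself follows. The vocabulary here is generated and purely hypothetical, and the timings will vary by machine.

```python
import timeit

# Hypothetical vocabulary of 10,000 made-up words.
words = ['w{}'.format(i) for i in range(10000)]
vocab_list = words
vocab_set = set(words)

probe = 'w9999'  # last element: the worst case for the list scan

# Both containers give the same answer; only the lookup cost differs.
assert (probe in vocab_list) == (probe in vocab_set)

print('list:', timeit.timeit(lambda: probe in vocab_list, number=1000))
print('set: ', timeit.timeit(lambda: probe in vocab_set, number=1000))
```

In the code above this amounts to building vocab once as a set, for example `vocab = set(vocab)`, and leaving the `s not in vocab` checks unchanged.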