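From both an aesthetic and a performance perspective, what's the best way to split a list of items into multiple lists based on a condition? The equivalent of: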
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
Is there a more elegant way to do this?
Update: here's the actual use case, to better explain what I'm trying to do:
# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f[2].lower() not in IMAGE_TYPES]
Joh*_*ooy 196
good, bad = [], []
for x in mylist:
    (bad, good)[x in goodvals].append(x)
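The indexing works because bool is a subclass of int. The test x in goodvals yields False (0) or True (1), which selects bad or good from the tuple. Below is a minimal runnable sketch; the values of mylist and goodvals are made up for illustration:

```python
mylist = [1, 2, 3, 4, 5, 6, 7]
goodvals = {1, 3, 7, 8, 9}  # hypothetical sample data; a set makes `in` O(1) on average

good, bad = [], []
for x in mylist:
    # False == 0 selects `bad`, True == 1 selects `good`
    (bad, good)[x in goodvals].append(x)

print(good)  # [1, 3, 7]
print(bad)   # [2, 4, 5, 6]
```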
dbr*_*dbr 103
> good = [x for x in mylist if x in goodvals]
> bad = [x for x in mylist if x not in goodvals]
> Is there a more elegant way to do this?
That code is perfectly readable and extremely clear!
# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f[2].lower() not in IMAGE_TYPES]
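Again, this is fine!
There might be slight performance improvements from using sets, but the difference is trivial. I find the list comprehension far easier to read, and you don't have to worry about the order being messed up, duplicates being removed, and so on.
In fact, I might go another step "backward" and just use a simple for loop: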
images, anims = [], []
for f in files:
    if f[2].lower() in IMAGE_TYPES:  # f[2] is the file extension
        images.append(f)
    else:
        anims.append(f)
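A list comprehension or set() works fine until you need to add some other check or another bit of logic. Say you want to remove all 0-byte jpegs; with the loop you just add something like...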
if f[1] == 0:
    continue
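Combining the loop with the zero-byte check might look like the sketch below. It assumes the question's (name, size, extension) tuple layout and uses made-up sample files. As written, it drops every zero-byte file, not only JPEGs:

```python
IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')

# Hypothetical sample data in the question's (name, size, extension) layout
files = [
    ('file1.jpg', 33, '.jpg'),
    ('empty.jpg', 0, '.jpg'),
    ('file2.avi', 999, '.avi'),
]

images, anims = [], []
for f in files:
    if f[1] == 0:
        continue  # skip zero-byte files before classifying them
    if f[2].lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)
```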
Ant*_*sma 100
Here's the lazy iterator approach:
from itertools import tee
def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)
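It evaluates the condition once per item and returns two generators. The first yields the values from the sequence for which the condition is true; the other yields those for which it is false.
Because it's lazy, you can use it on any iterator, even an infinite one: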
from itertools import count, islice
def is_prime(n):
    return n > 1 and all(n % i for i in xrange(2, n))
primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))
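Here is a Python 3 transcription of the lazy split. It adds a hypothetical counting predicate to check that the condition runs only once per item. Note that tee has to buffer any (flag, item) pairs that one generator has consumed and the other has not. Exhausting one output first therefore keeps the rest in memory:

```python
from itertools import tee


def split_on_condition(seq, condition):
    # Compute (flag, item) once per item, then share that stream between two consumers
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)


calls = []


def is_even(n):
    calls.append(n)  # record every evaluation of the condition
    return n % 2 == 0


evens, odds = split_on_condition(range(10), is_even)
evens_list = list(evens)  # tee now buffers the pairs that odds has not read yet
odds_list = list(odds)
print(evens_list, odds_list, len(calls))  # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9] 10
```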
Usually, though, the non-lazy, list-returning approach is better:
def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b
Edit: for your more specific use case of splitting items into different lists by some key, here's a generic function that does that:
DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
"""Split a sequence into lists based on a key function.
seq - input sequence
resultmapping - a dictionary that maps from target lists to keys that go to that list
keyfunc - function to calculate the key of an input value
default - the target where items that don't have a corresponding key go, by default they are dropped
"""
result_lists = dict((key, []) for key in resultmapping)
appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)
if default is not DROP_VALUE:
result_lists.setdefault(default, [])
default_action = result_lists[default].append
else:
default_action = DROP_VALUE
for item in seq:
appenders.get(keyfunc(item), default_action)(item)
return result_lists
Usage:
def file_extension(f):
    return f[2].lower()
split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']
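Apart from the Python 2 print statements, split_by_key runs unchanged on Python 3. The sketch below uses made-up file tuples. It also shows that items with no matching key are dropped when no default is given:

```python
DROP_VALUE = lambda _: _  # identity function, used as the "discard this item" sentinel


def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append)
                     for target, keys in resultmapping.items() for key in keys)
    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE  # calling it just returns the item, so it is discarded
    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)
    return result_lists


IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')
# Hypothetical sample data in the question's (name, size, extension) layout
files = [('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi'), ('file3.GIF', 123, '.GIF')]


def file_extension(f):
    return f[2].lower()


split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
dropped = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension)  # no default
```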
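Anonymous 24
The problem with all the proposed solutions is that they scan the list and apply the filtering function twice. I'd write a simple little function like this: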
def SplitIntoTwoLists(l, f):
    a = []
    b = []
    for i in l:
        if f(i):
            a.append(i)
        else:
            b.append(i)
    return (a,b)
This way you don't process anything twice, and you don't repeat code either.
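Applied to the question's use case, the helper might be used as follows. The files list is hypothetical sample data:

```python
def SplitIntoTwoLists(l, f):
    a = []
    b = []
    for i in l:
        if f(i):
            a.append(i)  # predicate is true
        else:
            b.append(i)  # predicate is false
    return (a, b)


IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')
files = [('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]  # hypothetical sample data
images, anims = SplitIntoTwoLists(files, lambda f: f[2].lower() in IMAGE_TYPES)
```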
sas*_*nin 19
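Here's my take. I propose a lazy, single-pass partition function that preserves the relative order of elements in the output subsequences.
I think the requirements are:
- keep the relative order of elements;
- evaluate the condition only once per element (hence not using (i)filter or groupby);
- allow lazy consumption of either output sequence.
The split library: my partition function (introduced below) and other similar functions have been packaged into a small library. It can be installed from PyPI as usual: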
pip install --user split
To split a list based on a condition, use the partition function:
>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]
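How the partition function works: internally we need to build two subsequences at once, so consuming only one output sequence forces the other to be computed as well. We also need to keep state between user requests, storing elements that have been processed but not yet requested. To keep that state, I use two double-ended queues (deques):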
from collections import deque
The SplitSeq class takes care of the housekeeping:
class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)
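The magic happens in its .getNext() method. It is almost like an iterator's .next(), but it lets us specify which kind of element we want this time. Behind the scenes, it doesn't discard the rejected elements; instead, it puts them into one of the two queues: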
    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while 1: # exit on StopIteration
                n = self.seq.next()
                if cond(n):
                    return n
                else:
                    those.append(n)
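The end user should use the partition function. It takes a condition function and a sequence (just like map or filter) and returns two generators. The first generator builds the subsequence of elements for which the condition holds; the second builds the complementary subsequence. Iterators and generators allow lazy splitting of even long or infinite sequences.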
def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition == None
    ss = SplitSeq(cond, sequence)
    def goods():
        while 1:
            yield ss.getNext(getGood=True)
    def bads():
        while 1:
            yield ss.getNext(getGood=False)
    return goods(), bads()
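I chose to make the test function the first argument to facilitate partial application in the future (similar to how map and filter take the test function as their first argument).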
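The code above targets Python 2 (self.seq.next()). A Python 3 sketch of the same two-deque design needs two changes. First, call next(self.seq) instead of .next(). Second, return explicitly when the input runs out: under PEP 479 (the default since Python 3.7), a StopIteration that escapes a generator body is turned into a RuntimeError. This is not the published split package. The single take helper that replaces goods()/bads() is my own simplification:

```python
from collections import deque
from itertools import count, islice


class SplitSeq:
    """Shared state for the two output generators of partition()."""

    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque()  # matching items produced but not yet requested
        self.bads = deque()   # non-matching items produced but not yet requested
        self.seq = iter(sequence)

    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        while True:
            n = next(self.seq)  # raises StopIteration once the input is exhausted
            if cond(n):
                return n
            those.append(n)  # park it for the other generator


def partition(condition, sequence):
    cond = condition if condition else bool  # None means "test truthiness"
    ss = SplitSeq(cond, sequence)

    def take(get_good):
        while True:
            try:
                item = ss.getNext(getGood=get_good)
            except StopIteration:
                return  # end the generator explicitly instead of leaking StopIteration
            yield item

    return take(True), take(False)


evens, odds = partition(lambda n: n % 2 == 0, range(10))
odds_list = list(odds)    # consuming odds first parks the evens in a deque
evens_list = list(evens)

multiples_of_3, _others = partition(lambda n: n % 3 == 0, count())
first_multiples = list(islice(multiples_of_3, 4))
```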
Anonymous 14
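I basically like Anders' approach, as it's very general. Here's a version that puts the categorizer first (to match the filter syntax) and uses a defaultdict from the collections module.
from collections import defaultdict

def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d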
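Here is a usage sketch for the question's files, with made-up data. Because the result is a defaultdict, looking up a category that never occurred simply returns an empty list (and adds that key):

```python
from collections import defaultdict


def categorize(func, seq):
    """Return mapping from categories to lists of categorized items."""
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d


IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')
# Hypothetical sample data in the question's (name, size, extension) layout
files = [('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi'), ('file3.gif', 123, '.gif')]

groups = categorize(lambda f: f[2].lower() in IMAGE_TYPES, files)
images, anims = groups[True], groups[False]
```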
Ric*_*dle 13
First go (before the OP's edit): use sets:
mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]
myset = set(mylist)
goodset = set(goodvals)
print list(myset.intersection(goodset)) # [1, 3, 7]
print list(myset.difference(goodset)) # [2, 4, 5, 6]
That's good for both readability (IMHO) and performance.
Second go (after the OP's edit):
Create your list of good extensions as a set:
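Other answers point out a trade-off with this approach, and the sketch below shows it using a hypothetical list that contains a duplicate. The set operations collapse the duplicate and promise no particular order. Comprehensions that test membership against a set keep both the order and the duplicate:

```python
mylist = [7, 1, 2, 7, 3, 4]  # hypothetical data with a repeated 7
goodvals = [1, 3, 7, 8, 9]
goodset = set(goodvals)

# Set operations: duplicates collapse and the result order is not guaranteed
good_set_result = set(mylist) & goodset
bad_set_result = set(mylist) - goodset

# List comprehensions that look items up in the set keep order and duplicates
good = [x for x in mylist if x in goodset]
bad = [x for x in mylist if x not in goodset]
```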
IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])
That will improve performance. Otherwise, what you have looks fine to me.
Joh*_*n D 10
good.append(x) if x in goodvals else bad.append(x)
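This elegant and concise answer by @dansalmo was buried in the comments, so I'm reposting it here as an answer so that it gets the prominence it deserves, especially for new readers.
Full example: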
good, bad = [], []
for x in my_list:
    good.append(x) if x in goodvals else bad.append(x)
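itertools.groupby almost does what you want. However, it requires the items to be sorted so that you get a single contiguous range, so you need to sort by your key first (otherwise you'll get multiple interleaved groups for each type). For example: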
import itertools

def is_good(f):
    return f[2].lower() in IMAGE_TYPES

files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]
for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
    print key, list(group)
Output:
False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]
As with the other solutions, the key function can be defined to divide the items into as many groups as you want.
Anonymous 8
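Below is a Python 3 sketch of the same idea with made-up data; it collects the groups into a dict. It also shows the interleaved groups you get when the input is not sorted first:

```python
from itertools import groupby

IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')
# Hypothetical sample data in the question's (name, size, extension) layout
files = [('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi'), ('file3.gif', 123, '.gif')]


def is_good(f):
    return f[2].lower() in IMAGE_TYPES


# groupby only merges *adjacent* items, so sort by the same key first
groups = {key: list(group) for key, group in groupby(sorted(files, key=is_good), key=is_good)}
images, anims = groups.get(True, []), groups.get(False, [])

# Without sorting, the True items are split into two separate groups
unsorted_keys = [key for key, _ in groupby(files, key=is_good)]
```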
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]
append returns None, so this works.
Personally, I like the version you cited, assuming you already have a list of goodvals hanging around. If not, something like:
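Here it is spelled out with hypothetical data. For a bad x, the membership test is False, so the or goes on to evaluate bad.append(x). That call returns None, which is falsy, so x is left out of good. It works, but it hides a side effect inside a comprehension:

```python
mylist = [1, 2, 3, 4, 5, 6, 7]
goodvals = {1, 3, 7}  # hypothetical sample data

bad = []
# Good items short-circuit the `or`; bad items trigger bad.append(x) -> None (falsy)
good = [x for x in mylist if x in goodvals or bad.append(x)]
```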
good = filter(lambda x: is_good(x), mylist)
bad = filter(lambda x: not is_good(x), mylist)
Of course, that's very similar to using a list comprehension like you originally did, but with a function instead of a lookup:
good = [x for x in mylist if is_good(x)]
bad = [x for x in mylist if not is_good(x)]
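In general, I find the aesthetics of list comprehensions very pleasing. Of course, if you don't actually need to preserve ordering and don't need duplicates, the intersection and difference methods on sets would work well too.
If you want to do it in FP style: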
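On Python 3, filter returns a lazy iterator rather than a list, so the results need to be wrapped in list(). itertools.filterfalse gives the complement without needing a negating lambda. Here is a sketch with a hypothetical is_good predicate:

```python
from itertools import filterfalse


def is_good(x):
    return x % 3 == 0  # hypothetical predicate: multiples of 3 are "good"


mylist = list(range(10))
good = list(filter(is_good, mylist))      # filter() is lazy on Python 3
bad = list(filterfalse(is_good, mylist))  # items for which is_good(x) is falsy
```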
good, bad = [ sum(x, []) for x in zip(*(([y], []) if y in goodvals else ([], [y])
for y in mylist)) ]
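It's not the most readable solution, but at least it iterates through mylist only once.
Sometimes it looks like a list comprehension is not the best thing to use!
I made a little test based on the answers people gave to this topic, run on a randomly generated list. Here is how the list is generated (there's probably a better way to do it, but that's not the point):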
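Below is the same single-pass idea, using itertools.chain.from_iterable in place of sum(x, []). Summing lists copies the growing accumulator on every addition, so it is quadratic in the list length. As with the original, an empty mylist makes zip produce nothing, and the two-name unpacking raises ValueError. The sample data is made up:

```python
from itertools import chain

mylist = [1, 2, 3, 4, 5, 6, 7]
goodvals = {1, 3, 7}  # hypothetical sample data

# Each item becomes ([y], []) or ([], [y]); zip(*...) then gathers the good halves and the bad halves
pairs = (([y], []) if y in goodvals else ([], [y]) for y in mylist)
# chain.from_iterable flattens each half in linear time
good, bad = [list(chain.from_iterable(x)) for x in zip(*pairs)]
```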
good_list = ('.jpg','.jpeg','.gif','.bmp','.png')
import random
import string
my_origin_list = []
for i in xrange(10000):
    fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(good_list)
    else:
        fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))
    my_origin_list.append((fname + fext, random.randrange(1000), fext))
And here we go:
from itertools import tee  # needed by f4

# Parand
def f1():
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def f2():
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def f3():
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# Ants Aasma
def f4():
    l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
    return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def f5():
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def f6():
    return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)
     Rate    f1    f6    f3    f4    f5    f2
f1  204/s    --   -5%  -14%  -15%  -20%  -26%
f6  215/s    6%    --   -9%  -11%  -16%  -22%
f3  237/s   16%   10%    --   -2%   -7%  -14%
f4  240/s   18%   12%    2%    --   -6%  -13%
f5  255/s   25%   18%    8%    6%    --   -8%
f2  277/s   36%   29%   17%   15%    9%    --
Inspired by DanSalmo's comment, here is a solution that is concise and elegant, and at the same time one of the fastest.
good_set = set(goodvals)  # build the set once, not on every iteration
good, bad = [], []
for item in my_list:
    good.append(item) if item in good_set else bad.append(item)
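Tip: turning goodvals into a set gives us an easy speed boost.
For maximum speed, we take the fastest answer and turbocharge it by turning good_list into a set. That alone gives us a 40%+ speed boost, and we end up with a solution that is more than 5.5 times as fast as the slowest solution while remaining readable.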
good_list_set = set(good_list)  # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
    if item[2] in good_list_set:  # item[2] is the file extension
        good.append(item)
    else:
        bad.append(item)
Here's a more concise version of the previous snippet.
good_list_set = set(good_list)  # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
    out = good if item[2] in good_list_set else bad  # item[2] is the file extension
    out.append(item)
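Elegance can be somewhat subjective, but some of the cute and clever Rube Goldberg-style solutions are quite concerning. They should not be used in production code in any language, let alone Python, which is elegant at heart.
Benchmark results (passes per second, followed by how much faster (+) or slower (-) each row's function is than each column's; the columns are in the same order as the rows):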
filter_BJHomer             80/s       --  -32.65%  -53.12%  -59.00%  -62.62%  -72.73%  -73.63%  -80.51%  -81.62%  -82.44%
zip_Funky                 118/s   48.48%       --  -30.40%  -39.13%  -44.50%  -59.51%  -60.85%  -71.06%  -72.71%  -73.93%
two_lst_tuple_JohnLaRoy   170/s  113.32%   43.67%       --  -12.54%  -20.26%  -41.82%  -43.75%  -58.42%  -60.79%  -62.54%
if_else_DBR               195/s  143.92%   64.28%   14.34%       --   -8.82%  -33.48%  -35.68%  -52.46%  -55.16%  -57.17%
two_lst_compr_Parand      213/s  167.50%   80.16%   25.40%    9.67%       --  -27.05%  -29.46%  -47.86%  -50.83%  -53.03%
if_else_1_line_DanSalmo   292/s  266.68%  146.96%   71.89%   50.33%   37.07%       --   -3.31%  -28.53%  -32.60%  -35.62%
tuple_if_else             302/s  279.23%  155.42%   77.78%   55.48%   41.77%    3.43%       --  -26.09%  -30.29%  -33.41%
set_1_line                409/s  413.08%  245.56%  140.53%  110.35%   91.81%   39.93%   35.29%       --   -5.69%   -9.91%
set_shorter               434/s  444.01%  266.40%  155.03%  123.03%  103.37%   48.36%   43.45%    6.03%       --   -4.48%
set_if_else               454/s  469.52%  283.58%  166.99%  133.49%  112.90%   55.32%   50.18%   11.00%    4.69%       --
Full benchmark code for Python 3.7 (modified from FunkySayu's):
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']

import random
import string
my_origin_list = []
for i in range(10000):
    fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(list(good_list))
    else:
        fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
    my_origin_list.append((fname + fext, random.randrange(1000), fext))

# Parand
def two_lst_compr_Parand(*_):
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def if_else_DBR(*_):
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# # Ants Aasma
# def f4():
#     l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
#     return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def zip_Funky(*_):
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def filter_BJHomer(*_):
    return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))

# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list else bad.append(e)
    return good, bad

# ChaimG's answer; as a set.
def set_1_line(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list_set else bad.append(e)
    return good, bad

# ChaimG set and if else list.
def set_shorter(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        out = good if e[2] in good_list_set else bad
        out.append(e)
    return good, bad

# ChaimG's best answer; if else as a set.
def set_if_else(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_set:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

# ChaimG's best answer; if else as a tuple.
def tuple_if_else(*_):
    good_list_tuple = tuple(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_tuple:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

def cmpthese(n=0, functions=None):
    results = {}
    for func_name in functions:
        args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
        t = Timer(*args)
        results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec

    functions_sorted = sorted(functions, key=results.__getitem__)
    for f in functions_sorted:
        diff = []
        for func in functions_sorted:
            if func == f:
                diff.append("--")
            else:
                # the '%' format spec already multiplies by 100
                diff.append(f"{results[f]/results[func] - 1:5.0%}")
        diffs = " ".join(f'{x:>8s}' for x in diff)

        print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")

if __name__=='__main__':
    from timeit import Timer
    cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))
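Python's '%' presentation type in a format spec already multiplies the value by 100 before appending the percent sign. A relative-speed column should therefore be formatted from the plain ratio minus one. The small sketch below contrasts the two expressions using the 454/s and 302/s rates from the results above; the helper names are hypothetical:

```python
def pct_diff(rate_a, rate_b):
    # The '%' presentation type multiplies by 100 itself, so pass the plain ratio minus 1
    return f"{rate_a / rate_b - 1:5.0%}"


def buggy_pct_diff(rate_a, rate_b):
    # Scaling by 100 before '%' scales again inflates the result 100-fold
    return f"{rate_a / rate_b * 100 - 100:5.0%}"


print(pct_diff(454, 302), buggy_pct_diff(454, 302))
```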
Views: 144326