Creating lists within a list in Python

Question


I have a list named values that contains several runs of numbers:

values = [0, 1, 2, 3, 4, 5, ... , 351, 0, 1, 2, 3, 4, 5, 6, ... , 750, 0, 1, 2, 3, 4, 5, ... , 559]


I want to create a new list containing sub-lists, each holding the elements from 0 up to some number.

For example:

new_values = [[0, 1, 2, ... , 351], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]


The code I wrote is:

start = 0
new_values = []
for i,val in enumerate(values): 
    if(val == 0):
        new_values.append(values[start:i]) 
        start = i


However, it returns:

new_values = [[], [0, 1, 2, ... , 750], [0, 1, 2, ... , 559]]


How can I fix my code? Any help would be much appreciated.

Answer 1

Sha*_*ger 5

So the problem with the code you wrote is that it includes an empty list at the beginning and omits the final sub-list. The minimal fix is:

Change the test to avoid appending the first list (when i is 0), e.g. if val == 0 and i != 0:
Append the final group after the loop exits

Combining the two fixes, you get:

start = 0
new_values = []
for i,val in enumerate(values): 
    if val == 0 and i != 0:  # Avoid adding empty list
        new_values.append(values[start:i]) 
        start = i
if values:  # Handle edgecase for empty values where nothing to add
    new_values.append(values[start:])  # Add final list

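As a quick sanity check, the fixed loop can be wrapped in a function and run on a small input. This is only a sketch: the function name split_at_zeros and the sample data are made up for illustration and do not appear in the question.

```python
def split_at_zeros(values):
    """Hypothetical wrapper around the fixed loop; expects a list starting with 0."""
    start = 0
    new_values = []
    for i, val in enumerate(values):
        if val == 0 and i != 0:  # skip the very first 0 to avoid an empty list
            new_values.append(values[start:i])
            start = i
    if values:  # nothing to add when the input is empty
        new_values.append(values[start:])  # the final run
    return new_values


sample = [0, 1, 2, 3, 0, 1, 2, 0, 1]  # made-up sample data
result = split_at_zeros(sample)
print(result)  # [[0, 1, 2, 3], [0, 1, 2], [0, 1]]
```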

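I was going to add the cleaner groupby solution, which avoids the special cases for the beginning and end of the list, but Chris_Rands already covered that, so I'll refer you to his answer.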

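Somewhat surprisingly, this actually seems to be the fastest solution, asymptotically, at the cost of requiring the input to be a list (whereas some of the other solutions can accept arbitrary iterables, including pure iterators that cannot be indexed).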
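To make that trade-off concrete, here is a minimal sketch (the sample data and variable names are made up): a plain iterator cannot be sliced, so index-based code fails on it, while a single groupby pass still works.

```python
from itertools import groupby

data = [0, 1, 2, 0, 1]  # made-up sample data

# Slicing works on a list, but a bare iterator does not support indexing.
try:
    iter(data)[0:2]
    can_slice_iterator = True
except TypeError:
    can_slice_iterator = False

# groupby only needs to iterate once, so an iterator is fine as input.
# Each truthy run is prefixed with the 0 that preceded it.
grouped = [[0, *run] for is_nonzero, run in groupby(iter(data), bool) if is_nonzero]
print(can_slice_iterator)  # False
print(grouped)             # [[0, 1, 2], [0, 1]]
```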

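For comparison (using the Python 3.5 additional unpacking generalizations, both for brevity and for best performance on modern Python, and relying on the implicit truthiness of int instead of comparing to 0, since the two are equivalent for int inputs but the implicit truth test is meaningfully faster):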

from itertools import groupby, zip_longest

# truth is the same as bool, but unlike the bool constructor, it requires
# exactly one positional argument, which makes a *major* difference
# on runtime when it's in a hot code path
from operator import truth

def method1(values):
    # Optimized/correct OP's code
    # Only works on list inputs, and requires non-empty values to begin with 0,
    # but handles repeated 0s as separate groups properly
    new_values = []
    start = None
    for i, val in enumerate(values):
        if not val and i:
            new_values.append(values[start:i])
            start = i
    if values:
        new_values.append(values[start:])
    return new_values

def method2(values):
    # Works with arbitrary iterables and iterators, but doesn't handle
    # repeated 0s or non-empty values that don't begin with 0
    return [[0, *g] for k, g in groupby(values, truth) if k]

def method3(values):
    # Same behaviors and limitations as method1, but without verbose
    # special casing for begin and end
    start_indices = [i for i, val in enumerate(values) if not val]

    # End indices for all but terminal slice are previous start index
    # so make iterator and discard first value to pair properly
    end_indices = iter(start_indices)
    next(end_indices, None)

    # Pairing with zip_longest avoids need to explicitly pad end_indices
    return [values[s:e] for s, e in zip_longest(start_indices, end_indices)]

def method4(values):
    # Requires any non-empty values to begin with 0
    # but otherwise handles runs of 0s and arbitrary iterables (including iterators)
    new_values = []
    for val in values:
        if not val:
            curlist = [val]
            new_values.append(curlist)
            # Use pre-bound method in local name for speed
            curlist_append = curlist.append
        else:
            curlist_append(val)
    return new_values

def method5(values):
    # Most flexible solution; similar to method2, but handles all inputs, empty, non-empty,
    # with or without leading 0, with or without runs of repeated 0s
    new_values = []
    for nonzero, grp in groupby(values, truth):
        if nonzero:
            try:
                new_values[-1] += grp
            except IndexError:
                new_values.append([*grp])  # Only happens when values begins with nonzero
        else:
            new_values += [[0] for _ in grp]
    return new_values

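The comments above describe how the two groupby variants differ on awkward inputs. The sketch below copies method2 and method5 so it runs on its own, then feeds them made-up inputs with a repeated 0 and with a leading nonzero value.

```python
from itertools import groupby
from operator import truth


def method2(values):
    # Copied from above: assumes a leading 0 and no repeated 0s
    return [[0, *g] for k, g in groupby(values, truth) if k]


def method5(values):
    # Copied from above: handles repeated 0s and a missing leading 0
    new_values = []
    for nonzero, grp in groupby(values, truth):
        if nonzero:
            try:
                new_values[-1] += grp
            except IndexError:
                new_values.append([*grp])
        else:
            new_values += [[0] for _ in grp]
    return new_values


repeated = [0, 0, 1, 2, 0, 3]  # made-up input with a run of two 0s
leading = [5, 6, 0, 1]         # made-up input that does not start with 0

print(method2(repeated))  # [[0, 1, 2], [0, 3]]       (the lone 0 group is lost)
print(method5(repeated))  # [[0], [0, 1, 2], [0, 3]]
print(method2(leading))   # [[0, 5, 6], [0, 1]]       (a 0 is invented)
print(method5(leading))   # [[5, 6], [0, 1]]
```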

Timings on Python 3.6, Linux x64, using IPython 6.1's %timeit magic:

>>> values = [*range(100), *range(50), *range(150)]
>>> %timeit -r5 method1(values)
12.5 µs ± 50.6 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)

>>> %timeit -r5 method2(values)
16.9 µs ± 54.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)

>>> %timeit -r5 method3(values)
13 µs ± 18.9 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)

>>> %timeit -r5 method4(values)
16.7 µs ± 9.51 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)

>>> %timeit -r5 method5(values)
18.2 µs ± 25.2 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)


Summary:

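Solutions that slice out runs in bulk (method1, method3) are the fastest, but they depend on the input being a sequence (and if the return type must be list, the input must also be a list, or a conversion must be added).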
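The point about return types follows from how slicing works: a slice of a tuple is a tuple. A small sketch, using a copy of the method1 logic under the hypothetical name split_by_slicing and made-up data:

```python
def split_by_slicing(values):
    # Same logic as method1 above; works on any sliceable sequence
    new_values = []
    start = None
    for i, val in enumerate(values):
        if not val and i:
            new_values.append(values[start:i])
            start = i
    if values:
        new_values.append(values[start:])
    return new_values


as_tuples = split_by_slicing((0, 1, 2, 0, 1))  # tuple in, tuples out
as_lists = [list(run) for run in as_tuples]    # explicit conversion if lists are required

print(as_tuples)  # [(0, 1, 2), (0, 1)]
print(as_lists)   # [[0, 1, 2], [0, 1]]
```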

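The groupby solutions (method2, method5) are a bit slower, but usually quite succinct (handling all the edge cases, as method5 does, requires neither extreme verbosity nor explicit test-and-check LBYL patterns). They also don't need much hackery to run as fast as possible, apart from using operator.truth in place of bool. That substitution is necessary because CPython's bool constructor is very slow, thanks to some odd implementation details (bool must accept full varargs, including keywords, and dispatch through the object construction machinery, which costs far more than operator.truth, which takes exactly one positional argument on a low-overhead path and bypasses the object construction machinery); if bool is used as the key function instead of operator.truth, the runtimes more than double (to 36.8 µs and 38.8 µs for method2 and method5 respectively).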
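One way to check the bool versus operator.truth gap on your own interpreter is a short timeit script like the sketch below. The helper name split_with and the repetition count are arbitrary choices, and the figures quoted above were measured on Python 3.6, so other versions may show a different gap.

```python
from itertools import groupby
from operator import truth
from timeit import timeit

values = [*range(100), *range(50), *range(150)]


def split_with(key):
    # method2 from above, parameterized on the key function
    return [[0, *g] for k, g in groupby(values, key) if k]


# Both key functions must produce identical groupings
same_result = split_with(bool) == split_with(truth)

# Total seconds for 2000 calls each; absolute numbers depend on the machine
t_bool = timeit(lambda: split_with(bool), number=2000)
t_truth = timeit(lambda: split_with(truth), number=2000)
print(f"bool: {t_bool:.4f} s, operator.truth: {t_truth:.4f} s")
```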

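In between is the slower but more flexible approach (it handles arbitrary input iterables, including iterators, handles runs of 0s without special casing, and so on) that uses item-by-item appends (method4). The problem is that getting maximum performance requires much more verbose code (because of the need to avoid repeated indexing and method binding); if the loop in method4 is changed to the much more succinct: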

for val in values:
    if not val:
        new_values.append([])
    new_values[-1].append(val)


the runtime more than doubles (to ~34.4 µs), thanks to the cost of repeatedly indexing new_values and binding the append method.
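The two loop styles can be compared side by side with a sketch like the following. The function names are made up, the pre-bound version is a copy of method4, and no particular timing is claimed for your machine or Python version.

```python
from timeit import timeit

values = [*range(100), *range(50), *range(150)]


def append_prebound(values):
    # Copy of method4: caches the bound append method of the current sub-list
    new_values = []
    for val in values:
        if not val:
            curlist = [val]
            new_values.append(curlist)
            curlist_append = curlist.append
        else:
            curlist_append(val)
    return new_values


def append_concise(values):
    # The shorter loop: indexes new_values[-1] and rebinds append on every item
    new_values = []
    for val in values:
        if not val:
            new_values.append([])
        new_values[-1].append(val)
    return new_values


same_result = append_prebound(values) == append_concise(values)

# Total seconds for 2000 calls each; compare the two on your own setup
t_prebound = timeit(lambda: append_prebound(values), number=2000)
t_concise = timeit(lambda: append_concise(values), number=2000)
print(f"pre-bound: {t_prebound:.4f} s, concise: {t_concise:.4f} s")
```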

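In any case, personally, if performance weren't absolutely critical, I'd use one of the groupby solutions with bool as the key, just to avoid the import and the uncommon API. If performance mattered more, I'd probably still use groupby, but swap in operator.truth as the key function; sure, it's not as fast as the spelled-out versions, but for people who know groupby it's easy enough to follow, and it's generally the most succinct solution for any given level of edge-case handling.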

Archived:	8 years, 2 months ago
Views:	488
Last recorded:	8 years, 2 months ago