给定一个数据列表,我正在尝试创建一个新列表,其中位置的值i是从i原始列表中的位置开始的最长运行的长度.例如,给定
x_list = [1, 1, 2, 3, 3, 3]
Run Code Online (Sandbox Code Playgroud)
应该返回:
run_list = [2, 1, 1, 3, 2, 1]
Run Code Online (Sandbox Code Playgroud)
我的解决方案
freq_list = []
current = x_list[0]
count = 0
for num in x_list:
if num == current:
count += 1
else:
freq_list.append((current,count))
current = num
count = 1
freq_list.append((current,count))
run_list = []
for i in freq_list:
z = i[1]
while z > 0:
run_list.append(z)
z -= 1
Run Code Online (Sandbox Code Playgroud)
首先,我创建一个freq_list元组列表,其中每个元组的第一个元素是元素x_list,其中第二个元素是总运行的数量.
在这种情况下:
freq_list = [(1, 2), (2, 1), (3, 3)]
Run Code Online (Sandbox Code Playgroud)
有了这个,我创建一个新列表并附加适当的值.
但是,我想知道是否有更短的方式/另一种方式来做到这一点?
Ara*_*Fey 26
这是一个简单的解决方案,它向后迭代列表并在每次重复数字时递增计数器:
last_num = None
result = []
for num in reversed(x_list):
if num != last_num:
# if the number changed, reset the counter to 1
counter = 1
last_num = num
else:
# if the number is the same, increment the counter
counter += 1
result.append(counter)
# reverse the result
result = list(reversed(result))
Run Code Online (Sandbox Code Playgroud)
结果:
[2, 1, 1, 3, 2, 1]
Run Code Online (Sandbox Code Playgroud)
这可以使用itertools:
from itertools import groupby, chain
x_list = [1, 1, 2, 3, 3, 3]
gen = (range(len(list(j)), 0, -1) for _, j in groupby(x_list))
res = list(chain.from_iterable(gen))
Run Code Online (Sandbox Code Playgroud)
结果
[2, 1, 1, 3, 2, 1]
Run Code Online (Sandbox Code Playgroud)
说明
itertools.groupby对列表中的相同项目进行分组.groupby,创建一个range对象,该对象从连续项目数量的长度向后计数为1.itertools.chain从发电机到链的范围.表现说明
性能会逊色于@阿兰-伊的解决方案.虽然itertools.groupby是O(n),但它大量使用昂贵的__next__电话.这些不像在简单for循环中的迭代那样扩展.见itertools文档的groupby伪代码.
如果性能是您主要关注的问题,请坚持使用for循环.
您正在对连续组执行反向累积计数.我们可以创建一个Numpy累积计数函数
import numpy as np
def cumcount(a):
a = np.asarray(a)
b = np.append(False, a[:-1] != a[1:])
c = b.cumsum()
r = np.arange(len(a))
return r - np.append(0, np.flatnonzero(b))[c] + 1
Run Code Online (Sandbox Code Playgroud)
然后生成我们的结果
a = np.array(x_list)
cumcount(a[::-1])[::-1]
array([2, 1, 1, 3, 2, 1])
Run Code Online (Sandbox Code Playgroud)
我会使用生成器来完成这种任务,因为它可以避免逐步构建结果列表,如果需要,可以懒得使用:
def gen(iterable): # you have to think about a better name :-)
iterable = iter(iterable)
# Get the first element, in case that fails
# we can stop right now.
try:
last_seen = next(iterable)
except StopIteration:
return
count = 1
# Go through the remaining items
for item in iterable:
if item == last_seen:
count += 1
else:
# The consecutive run finished, return the
# desired values for the run and then reset
# counter and the new item for the next run.
yield from range(count, 0, -1)
count = 1
last_seen = item
# Return the result for the last run
yield from range(count, 0, -1)
Run Code Online (Sandbox Code Playgroud)
如果输入不能reversed(某些生成器/迭代器无法反转),这也将起作用:
>>> x_list = (i for i in range(10)) # it's a generator despite the variable name :-)
>>> ... arans solution ...
TypeError: 'generator' object is not reversible
>>> list(gen((i for i in range(10))))
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Run Code Online (Sandbox Code Playgroud)
它适用于您的输入:
>>> x_list = [1, 1, 2, 3, 3, 3]
>>> list(gen(x_list))
[2, 1, 1, 3, 2, 1]
Run Code Online (Sandbox Code Playgroud)
通过使用itertools.groupby以下方法,实际上可以更简单:
import itertools
def gen(iterable):
for _, group in itertools.groupby(iterable):
length = sum(1 for _ in group) # or len(list(group))
yield from range(length, 0, -1)
>>> x_list = [1, 1, 2, 3, 3, 3]
>>> list(gen(x_list))
[2, 1, 1, 3, 2, 1]
Run Code Online (Sandbox Code Playgroud)
我也做了一些基准测试,根据这些Aran-Feys解决方案是最快的,除了piRSquareds解决方案获胜的长列表:
如果您想确认结果,这是我的基准测试设置:
from itertools import groupby, chain
import numpy as np
def gen1(iterable):
iterable = iter(iterable)
try:
last_seen = next(iterable)
except StopIteration:
return
count = 1
for item in iterable:
if item == last_seen:
count += 1
else:
yield from range(count, 0, -1)
count = 1
last_seen = item
yield from range(count, 0, -1)
def gen2(iterable):
for _, group in groupby(iterable):
length = sum(1 for _ in group)
yield from range(length, 0, -1)
def mseifert1(iterable):
return list(gen1(iterable))
def mseifert2(iterable):
return list(gen2(iterable))
def aran(x_list):
last_num = None
result = []
for num in reversed(x_list):
if num != last_num:
counter = 1
last_num = num
else:
counter += 1
result.append(counter)
return list(reversed(result))
def jpp(x_list):
gen = (range(len(list(j)), 0, -1) for _, j in groupby(x_list))
res = list(chain.from_iterable(gen))
return res
def cumcount(a):
a = np.asarray(a)
b = np.append(False, a[:-1] != a[1:])
c = b.cumsum()
r = np.arange(len(a))
return r - np.append(0, np.flatnonzero(b))[c] + 1
def pirsquared(x_list):
a = np.array(x_list)
return cumcount(a[::-1])[::-1]
from simple_benchmark import benchmark
import random
funcs = [mseifert1, mseifert2, aran, jpp, pirsquared]
args = {2**i: [random.randint(0, 5) for _ in range(2**i)] for i in range(1, 20)}
bench = benchmark(funcs, args, "list size")
%matplotlib notebook
bench.plot()
Run Code Online (Sandbox Code Playgroud)
Python 3.6.5,NumPy 1.14