Python: creating a new list using a "template list"

Question

Suppose I have:

x1 = [1, 3, 2, 4]

and:

x2 = [0, 1, 1, 0]

with the same shape.

Now I want to "put x2 on top of x1" and sum up all the numbers of x1 that share the same number in x2.

So the end result is:

end = [1+4, 3+2]  # end[0] is the sum of all numbers of x1 where a 0 was in x2

Here is a naive implementation using lists to further clarify the question:

x1 = [1, 3, 2, 4]

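The listing above shows only its first line. The following is one possible naive version, written as an assumption about what was intended rather than recovered from the original post; the helper name group_sum_naive is hypothetical.

```python
from collections import defaultdict

def group_sum_naive(values, labels):
    """Sum the entries of `values` that share the same entry in `labels`."""
    totals = defaultdict(int)
    for value, label in zip(values, labels):
        totals[label] += value
    # Return the sums ordered by label, so index 0 holds the smallest label.
    return [totals[label] for label in sorted(totals)]

x1 = [1, 3, 2, 4]
x2 = [0, 1, 1, 0]
end = group_sum_naive(x1, x2)
print(end)  # [5, 5]
```

A dict keyed by label is used so that labels do not have to be consecutive integers starting at 0.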

So my question is: is there a way to implement this in numpy without loops, or in a way that is generally faster?

Answer 1

Pie*_*e D 7

In this particular instance (and, in general, for unique, duplicated and groupby kinds of operations), pandas is faster than a pure numpy solution.

A pandas way, using Series (credit: very similar to @mcsoini's answer):

import pandas as pd

def pd_group_sum(x1, x2):
    # label each value with its group, then sum per label
    return pd.Series(x1, index=x2).groupby(x2).sum()

A pure numpy way, using np.unique and some fancy indexing:

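import numpy as np

def np_group_sum(a, groups):
    # rix maps each element to the position of its label among the unique labels
    _, ix, rix = np.unique(groups, return_index=True, return_inverse=True)
    # one row per label: keep the values that belong to it, zero elsewhere, then sum
    return np.where(np.arange(len(ix))[:, None] == rix, a, 0).sum(axis=1)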

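To make the broadcasting step in np_group_sum easier to follow, here is a small walk-through on the five-element example used later in this answer. It is only an illustrative sketch; the intermediate names labels, mask and sums are not from the original code.

```python
import numpy as np

a = np.array([1, 3, 2, 4, 8])
groups = np.array([0, 1, 1, 0, 7])

# rix[i] is the position of groups[i] within the sorted unique labels.
labels, rix = np.unique(groups, return_inverse=True)
# labels holds 0, 1, 7 and rix holds 0, 1, 1, 0, 2

# One row per unique label; True where an element belongs to that label.
mask = np.arange(len(labels))[:, None] == rix
# Keep the matching values in each row, zero elsewhere, then sum each row.
sums = np.where(mask, a, 0).sum(axis=1)
print(sums)  # [5 5 8]
```

Note that mask has shape (number of labels, number of elements), so this variant builds an intermediate array that grows with both sizes.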

Note: a better pure numpy way, inspired by @Woodford's answer:

def selsum(a, g, e):
    return a[g==e].sum()

vselsum = np.vectorize(selsum, signature='(n),(n),()->()')

def np_group_sum2(a, groups):
    return vselsum(a, groups, np.unique(groups))

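As a rough illustration of what the signature argument does here, the sketch below compares the vectorized call with an explicit loop over the unique labels. The variable names are illustrative only. According to the NumPy documentation, np.vectorize is provided for convenience rather than performance and is essentially a for loop.

```python
import numpy as np

def selsum(a, g, e):
    # Sum the entries of `a` whose label in `g` equals the scalar `e`.
    return a[g == e].sum()

# '(n),(n),()->()': two 1-D inputs of equal length and one scalar give a scalar.
vselsum = np.vectorize(selsum, signature="(n),(n),()->()")

a = np.array([1, 3, 2, 4, 8])
groups = np.array([0, 1, 1, 0, 7])
labels = np.unique(groups)

# Passing the 1-D `labels` where a scalar is expected makes np.vectorize
# call selsum once per label, with `a` and `groups` passed whole each time.
vectorized = vselsum(a, groups, labels)
# The same computation written as an explicit Python loop.
looped = np.array([selsum(a, groups, e) for e in labels])

print(vectorized, looped)  # [5 5 8] [5 5 8]
```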

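Another pure numpy way is inspired by @mapf's comment about using argsort(). That by itself already takes 45 ms, but we can try something based on np.argpartition(x2, len(x2)-1) instead, since that by itself takes only 7.5 ms in the benchmark below: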

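def np_group_sum3(a, groups):
    # Caution: np.argpartition only guarantees the position of the k-th element;
    # the order within each partition is undefined, so equal labels are not
    # guaranteed to end up adjacent. np.argsort(groups) is the safe (slower) choice.
    ix = np.argpartition(groups, len(groups)-1)
    ends = np.nonzero(np.diff(np.r_[groups[ix], groups.max() + 1]))[0]
    return np.diff(np.r_[0, a[ix].cumsum()[ends]])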

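The idea behind this variant is to bring equal labels next to each other and then sum each contiguous run. Below is a minimal sketch of that idea using a full stable argsort and np.add.reduceat instead of argpartition and cumsum. It is a swapped-in formulation, not the answer's code; group_sum_sorted is a hypothetical name, non-empty input is assumed, and no timing is claimed for it.

```python
import numpy as np

def group_sum_sorted(a, groups):
    # Hypothetical helper, not one of the answer's functions.
    # Assumes non-empty 1-D inputs of equal length.
    a = np.asarray(a)
    groups = np.asarray(groups)
    # A stable full sort brings equal labels next to each other.
    order = np.argsort(groups, kind="stable")
    sorted_groups = groups[order]
    # Index where each run of equal labels starts in the sorted order.
    starts = np.r_[0, np.nonzero(np.diff(sorted_groups))[0] + 1]
    # Sum each run: reduceat adds a[order][starts[i]:starts[i+1]].
    return np.add.reduceat(a[order], starts)

x1 = np.array([1, 3, 2, 4, 8])
x2 = np.array([0, 1, 1, 0, 7])
print(group_sum_sorted(x1, x2))  # [5 5 8]
```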

(Slightly modified) example

x1 = np.array([1, 3, 2, 4, 8])  # I added a group for sake of generality
x2 = np.array([0, 1, 1, 0, 7])

>>> pd_group_sum(x1, x2)
0    5
1    5
7    8

>>> np_group_sum(x1, x2)  # and all the np_group_sum() variants
array([5, 5, 8])

Speed

n = 1_000_000
x1 = np.random.randint(0, 20, n)
x2 = np.random.randint(0, 20, n)

%timeit pd_group_sum(x1, x2)
# 13.9 ms ± 65.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np_group_sum(x1, x2)
# 171 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum2(x1, x2)
# 66.7 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum3(x1, x2)
# 25.6 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Going through pandas is faster, in part because of numpy issue 11136.

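@mapf: on the "benchmark" above, just "%timeit np.argsort(x2)" already takes 43 ms, and "np.unique(x2)" takes 31 ms. So any approach based on those will lose to "pandas". Following that idea, though, I tried a version using "np.argpartition()", since that takes only 7.5 ms. I updated my answer to include an approach based on it, but it still takes 26 ms. (2 upvotes)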
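
When the labels are small non-negative integers, as in the benchmark above (values 0 to 19), sorting can be avoided entirely. The following sketch uses np.bincount with weights; it is not part of either answer, group_sum_bincount is a hypothetical name, and it has not been timed against the variants above.

```python
import numpy as np

def group_sum_bincount(a, groups):
    # Hypothetical helper; assumes `groups` holds small non-negative integers
    # and `a` holds integers whose sums are exactly representable as floats.
    a = np.asarray(a)
    groups = np.asarray(groups)
    # bincount adds up the weights falling into each bin 0..groups.max().
    sums = np.bincount(groups, weights=a)
    # Drop bins for labels that never occur, to match the other variants.
    present = np.bincount(groups) > 0
    # With weights, bincount returns floats; cast back to the input dtype.
    return sums[present].astype(a.dtype)

x1 = np.array([1, 3, 2, 4, 8])
x2 = np.array([0, 1, 1, 0, 7])
print(group_sum_bincount(x1, x2))  # [5 5 8]
```

The intermediate array has groups.max() + 1 entries, so this is only sensible when the largest label is not much bigger than the number of distinct labels.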

Answer 2

Woo*_*ord 5

>>> x1 = np.array([1, 3, 2, 7])
>>> x2 = np.array([0, 1, 1, 0])
>>> for index in np.unique(x2):
...     print(f'{index}: {x1[x2==index].sum()}')
0: 8
1: 5
>>> # or in one line
>>> [(index, x1[x2==index].sum()) for index in np.unique(x2)]
[(0, 8), (1, 5)]

Archived:	5 years, 1 month ago
Views:	136
Last activity:	5 years, 1 month ago