Efficiently compute ordered permutations in a numpy array

Question

Joe*_*ger 5 python arrays performance numpy

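I have a numpy array. What is the fastest way to compute all of its ordered permutations?

What I mean is: given the first element of the array, I want a list of all the elements that follow it, in order. Then, given the second element, a list of all the elements that follow it, and so on.

So, given my list: b, c and d follow a; c and d follow b; and d follows c.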

import numpy as np

x = np.array(["a", "b", "c", "d"])

So a possible output would look like this:

[
    ["a","b"],
    ["a","c"],
    ["a","d"],

    ["b","c"],
    ["b","d"],

    ["c","d"],
]

I will need to do this several million times, so I am looking for an efficient solution.

I tried something like:

im = np.vstack([x]*len(x))
a = np.vstack(([im], [im.T])).T
results = a[np.triu_indices(len(x),1)]

But it is actually slower than a loop...
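The loop being compared against is not shown in the question. As a point of reference, here is a minimal sketch of what such a baseline might look like, assuming a plain nested loop over index pairs; the function name `ordered_pairs_loop` is hypothetical and not from the original post.

```python
import numpy as np

def ordered_pairs_loop(x):
    """Hypothetical baseline: pair each element with every later element."""
    pairs = []
    n = len(x)
    for i in range(n):
        # Only look at positions after i, so order is preserved.
        for j in range(i + 1, n):
            pairs.append((x[i], x[j]))
    # reshape(-1, 2) keeps a (0, 2) shape when there are no pairs.
    return np.array(pairs, dtype=x.dtype).reshape(-1, 2)

x = np.array(["a", "b", "c", "d"])
pairs = ordered_pairs_loop(x)
print(pairs)
```

For an input of length n this produces n*(n-1)/2 rows, which is the output size any faster approach also has to fill.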

Answer 1

Ash*_*ary 4

You can use itertools functions such as chain.from_iterable and combinations together with np.fromiter for this. It involves no explicit loop in Python, but it is still not a pure NumPy solution:

>>> import numpy as np
>>> from itertools import combinations, chain
>>> arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
>>> arr.reshape(arr.size // 2, 2)  # integer division: the shape must be an int
array([['a', 'b'],
       ['a', 'c'],
       ['a', 'd'],
       ..., 
       ['b', 'c'],
       ['b', 'd'],
       ['c', 'd']], 
      dtype='|S1')
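To spell out what the one-liner above is doing, here is a step-by-step sketch of the same idea. The `count` argument to `np.fromiter` is an addition of mine, not part of the original answer; it tells NumPy the output length up front, on the assumption that the number of elements is n*(n-1) (two per pair).

```python
import numpy as np
from itertools import chain, combinations

x = np.array(["a", "b", "c", "d"])
n = len(x)

# combinations(x, 2) yields (x[i], x[j]) for every i < j, in input order;
# chain.from_iterable flattens those tuples into one stream of elements.
flat = chain.from_iterable(combinations(x, 2))

# There are n*(n-1)/2 pairs, hence n*(n-1) elements in the flat stream.
arr = np.fromiter(flat, dtype=x.dtype, count=n * (n - 1))

# Regroup the flat array into rows of two.
pairs_fromiter = arr.reshape(-1, 2)
print(pairs_fromiter)
```

Whether passing `count` makes a measurable difference would need to be timed on the actual data.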

Timing comparison:

>>> x = np.array(["a", "b", "c", "d"]*100)
>>> %%timeit
    im = np.vstack([x]*len(x))
    a = np.vstack(([im], [im.T])).T
    results = a[np.triu_indices(len(x),1)]
... 
10 loops, best of 3: 29.2 ms per loop
>>> %%timeit
    arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    arr.reshape(arr.size // 2, 2)
... 
100 loops, best of 3: 6.63 ms per loop
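For comparison, a NumPy-only variant is also possible by indexing `x` directly with the upper-triangle indices instead of building the full n-by-n grid first. This is a sketch of an alternative technique, not something from the original answer, and it was not part of the timings above; the name `pairs_indexed` is mine.

```python
import numpy as np

x = np.array(["a", "b", "c", "d"])

# Row and column indices of the strict upper triangle (i < j), row by row.
i, j = np.triu_indices(len(x), k=1)

# Pick the first and second member of each pair and put them side by side.
pairs_indexed = np.column_stack((x[i], x[j]))
print(pairs_indexed)
```

If the arrays all have the same length, `i` and `j` could be computed once and reused across calls, which may matter when this runs millions of times.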

Asked:	10 years, 11 months ago
Viewed:	509 times
Last active:	10 years, 11 months ago