在迭代部分更新的数组中重复查找 K 个最大值的索引的最快方法

Question

在迭代部分更新的数组中重复查找 K 个最大值的索引的最快方法

bpr*_*auf 5 python arrays performance numpy max

a在包含元素的复值数组中nsel = ~750000，我重复（>~10^6迭代）更新nchange < ~1000元素。每次迭代之后，在绝对平方实值数组中，我需要找到最大值b的索引（可以假设很小，当然，实际上很可能）。索引不需要排序。KKK <= ~50K <= ~10K

\n

a更新的值及其索引在每次迭代中都会发生变化，并且它们取决于对应于的最大值及其索引的（先验）未知元素b。尽管如此，让我们假设它们本质上是随机的，除了一个特定元素（通常是最大值（一个或多个））始终包含在更新值中。重要提示：更新后，新的最大值可能位于未更新的元素中。

\n

下面是一个最小的例子。为简单起见，它仅演示了 10^6（循环）迭代之一。我们可以使用(for ) 或(任意, 一般情况，请参阅/sf/answers/1661400681/K ) 找到最大值的索引。然而，由于( ) 的尺寸很大，遍历整个数组来查找最大值的索引非常慢。与大量迭代相结合，这形成了我正在使用的较大代码（非线性反卷积算法 CLEAN）的瓶颈，该代码嵌入了该步骤。b.argmax()K = 1b.argpartition()Kbnsel

\n

我已经问过如何最有效地找到最大值（大小写K = 1）的问题，请参阅Python 最有效的方法在部分更改的数组中查找最大值的索引。可接受的解决方案仅依赖于b通过将数据分割成块并（重新）计算仅更新某些元素的块的最大值来进行部分访问。> 7x从而实现了加速。

\n

根据作者@J\xc3\xa9r\xc3\xb4me Richard 的说法（感谢您的帮助！），不幸的是，这个解决方案不能轻易推广到K > 1. 正如他所建议的，一个可能的替代方案可能是二叉搜索树。现在我的

\n

问题：这样的二叉树在实践中是如何实现的，以及我们如何最有效地（如果可能的话，很容易地）找到最大值的索引？您是否有其他解决方案来以最快的方式重复查找K部分更新的数组中最大值的索引？

\n

注意：在每次迭代中，我稍后将需要b（或其副本）作为 numpy 数组。如果可能，解决方案应该主要基于 python，从 python 调用 C 或使用 Cython 或都numba可以。我目前使用python 3.7.6, numpy 1.21.2.

\n

import numpy as np\n\n# some array shapes (\'nnu_use\' and \'nm\'), number of total values (\'nvals\'), number of selected values (\'nsel\';\n# here \'nsel\' == \'nvals\'; in general \'nsel\' <= \'nvals\') and number of values to be changed (\'nchange\' << \'nsel\')\nnnu_use, nm = 10418//2 + 1, 144\nnvals = nnu_use * nm\nnsel = nvals\nnchange = 1000\n\n# number of largest peaks to be found\nK = 10\n\n# fix random seed, generate random 2D \'Fourier transform\' (\'a\', complex-valued), compute power (\'b\', real-valued),\n# and two 2D arrays for indices of axes 0 and 1\nnp.random.seed(100)\na = np.random.rand(nsel) + 1j * np.random.rand(nsel)\nb = a.real ** 2 + a.imag ** 2\ninu_2d = np.tile(np.arange(nnu_use)[:,None], (1,nm))\nim_2d = np.tile(np.arange(nm)[None,:], (nnu_use,1))\n\n# select \'nsel\' random indices and get 1D arrays of the selected 2D indices\nisel = np.random.choice(nvals, nsel, replace=False)\ninu_sel, im_sel = inu_2d.flatten()[isel], im_2d.flatten()[isel]\n\ndef do_update_iter(a, b):\n    # find index of maximum, choose \'nchange\' indices of which \'nchange - 1\' are random and the remaining one is the\n    # index of the maximum, generate random complex numbers, update \'a\' and compute updated \'b\'\n    imax = b.argmax()\n    ichange = np.concatenate(([imax],np.random.choice(nsel, nchange-1, replace=False)))\n    a_change = np.random.rand(nchange) + 1j*np.random.rand(nchange)\n    a[ichange] = a_change\n    b[ichange] = a_change.real ** 2 + a_change.imag ** 2\n    return a, b, ichange\n\n# do an update iteration on \'a\' and \'b\'\na, b, ichange = do_update_iter(a, b)\n\n# find indices of largest K values\nilarge = b.argpartition(-K)[-K:]\n

Run Code Online (Sandbox Code Playgroud)\n

Answer 1

Jér*_*ard 1

我尝试实现基于 C++ 容器的 Cython 解决方案（对于 64 位浮点值）。好消息是它比 naive 更快np.argpartition。坏消息是它相当复杂，而且速度也快不了多少：快了 3~4 倍。

一个主要问题是 Cython 没有实现std::multimap最有用的容器。可以使用类型来实现此容器std::map<Key, std::vector<Value>>，但它会使代码变得更加复杂，而且效率也较低（由于内存中存在额外的缓存不友好的间接寻址）。如果可以保证中没有重复项b，那么性能会明显更好（最高可达 x2），因为std::map可以使用来代替。此外，Cython 似乎不接受最新的 C++11/C++17/C++20 功能，使得代码读/写更加麻烦。这是可悲的，因为[一些功能如extract和右值引用]可以使代码更快。

另一个主要问题是执行时间受到缓存未命中的限制（在我的机器上>75%），因为二进制 RB 树不是缓存友好的。问题是整体数据结构很可能比 CPU 缓存大。事实上，750_000*(8*2+4) = 15_000_000 bytes至少需要存储键值，更不用说需要类似数量的内存来存储树数据结构的节点指针，并且大多数处理器缓存都小于 30 MB。这主要是更新期间由于随机访问而出现的问题：每次查找/插入都需要log2(nsel)在 RAM 中进行提取，并且 RAM 的延迟通常为几十纳秒。此外，(C++) RB 树不支持密钥更新，因此需要删除+插入。我尝试使用并行预取方法来缓解这个问题。不幸的是，在实践中它通常比较慢......

实际上，提取最大 K 个项目的速度非常快（树中 1000 个项目和 750_000 个值大约需要几微秒），而更新大约需要 1.0-1.5 毫秒。同时，大约np.argpartition需要 4.5 毫秒。

有些人报告（例如此处）std::map当项目数量很大时实际上非常慢。因此，使用另一个非标准 C++ 实现可能是个好主意。我希望在这种情况下 B 树会更快。Google Abseil库包含这样的容器，而且它们的速度肯定要快得多。话虽这么说，它肯定需要包装一些代码，这可能很乏味。或者，可以编写完整的 C++ 类并从 Cython 调用它。

执行

这是实现（以及最后的使用示例）：

最大树.pyx：

# distutils: language = c++

import numpy as np
cimport numpy as np
cimport cython

# See: https://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html
from libcpp.vector cimport vector
from libcpp.map cimport map
from libcpp.pair cimport pair
from cython.operator cimport dereference as deref, preincrement as inc


@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing
cdef class MaxTree:
    cdef map[double, vector[int]] data
    cdef int itemCount

    # Build a tree from `b`
    def __init__(self, double[::1] b):
        cdef map[double, vector[int]].iterator it
        cdef pair[double, vector[int]] node
        cdef double val
        cdef int i

        # Temporary node used to ease insertion
        node.second.resize(1)

        # Iterate over `b` items so to add them in the tree
        for i in range(b.size):
            val = b[i]
            it = self.data.find(val)

            if it == self.data.end():
                # Value not found: add a new node
                node.first = val
                node.second[0] = i
                self.data.insert(node)
            else:
                # Value found: adds a new duplicate in an existing node
                deref(it).second.push_back(i)

        self.itemCount = b.size

    def size(self):
        return self.itemCount

    # Get the index (in the original `b` array) of the K-largest values
    def getKlargest(self, int count):
        cdef map[double, vector[int]].reverse_iterator rit
        cdef int vecSize
        cdef int* vecData
        cdef int i, j
        cdef int[::1] resultView

        if count > self.itemCount:
            count = self.itemCount

        result = np.empty(count, dtype=np.int32)
        resultView = result
        i = 0

        rit = self.data.rbegin()
        while rit != self.data.rend():
            vecSize = deref(rit).second.size()
            vecData = deref(rit).second.data()
            # Note: indices are not always sorted here due to the update
            for j in range(vecSize-1, -1, -1):
                resultView[i] = vecData[j]
                i += 1
                count -= 1
                if count <= 0:
                    return resultView
            inc(rit)

        return result

    # Set the values of `b` at the index `index` to `values` and update the tree accordingly
    def update(self, double[::1] b, int[::1] index, double[::1] values):
        cdef map[double, vector[int]].iterator it
        cdef pair[double, vector[int]] node
        #cdef pair[map[double, vector[int]].iterator, bool] infos
        cdef int idx, i, j, vecSize, indexSize
        cdef double oldValue, newValue
        cdef int* vecData

        assert b.size == self.itemCount
        assert index.size == values.size
        assert np.min(index) >= 0 and np.max(index) < b.size

        # Temporary node used to ease insertion
        node.second.resize(1)

        for i in range(index.size):
            idx = index[i]
            oldValue = b[idx]
            newValue = values[i]

            it = self.data.find(oldValue)
            assert it != self.data.end()

            # Update the tree
            if deref(it).second.size() == 1:
                # Remove the node from the tree and add a new one because keys are immutable
                # Assume `index` is correct/coherent and the tree is correctly updated for sake of performance
                #assert deref(it).second[0] == idx
                self.data.erase(it)
                node.first = newValue
                node.second[0] = idx
                infos = self.data.insert(node)
                inserted = infos.second
                if not inserted:
                    # Duplicate
                    it = infos.first
                    deref(it).second.push_back(idx)
            else:
                # Tricky case due to duplicates (untested)
                vecData = deref(it).second.data()
                vecSize = deref(it).second.size()
                # Search the element and remove it
                for j in range(vecSize):
                    if vecData[j] == idx:
                        vecData[j] = vecData[vecSize-1]
                        deref(it).second.pop_back()
                        break

            # Update `b`
            b[idx] = values[i]

Run Code Online (Sandbox Code Playgroud)

设置.py：

# setup.py

from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("maxtree.pyx"))

Run Code Online (Sandbox Code Playgroud)

主要.py:

# Usage:

import numpy as np
import maxtree
np.random.seed(0)
b = np.random.rand(750_000)
nchange = 1_000
ichange = np.random.randint(0, b.size, nchange).astype(np.int32)

tree = maxtree.MaxTree(b)
tree.getKlargest(nchange)
tree.update(b, ichange, b[ichange]*0.999)

Run Code Online (Sandbox Code Playgroud)

运行命令：python3 setup.py build_ext --inplace -q

归档时间：	3 年，3 月前
查看次数：	380 次
最近记录：	3 年，3 月前