使用提示将已排序的范围插入到std :: set中

Question

使用提示将已排序的范围插入到std :: set中

假设我有一个std::set(根据定义排序),并且我有另一系列的排序元素(为了简单起见,在另一个std::set对象中).此外,我保证第二组中的所有值都大于第一组中的所有值.

我知道我可以有效地插入一个元素std::set- 如果我传递一个正确的hint,这将是O(1).我知道我可以插入任何范围std::set,但是当没有hint通过时,这将是O(k logN)(其中k是新元素的数量,N个旧元素的数量).

我可以在a中插入一个范围std::set并提供一个hint？我能想到的唯一方法是使用a进行单个插入hint,这确实将我的情况下插入操作的复杂性降低到O(k):

std::set <int> bigSet{1,2,5,7,10,15,18};
std::set <int> biggerSet{50,60,70};  

for(auto bigElem : biggerSet)
    bigSet.insert(bigSet.end(), bigElem);

Run Code Online (Sandbox Code Playgroud)

Answer 1

Jer*_*fin 2

首先，要进行您正在讨论的合并，您可能想要使用set(ormap的)merge成员函数，这将允许您将一些现有的合并map到这个函数中。这样做的优点（以及您可能不希望这样做的原因，具体取决于您的使用模式）是要合并的项目实际上从一组移动到另一组，因此您不必分配新节点（这可以节省相当多的时间）。缺点是节点随后会从源集中消失，因此如果您需要每个局部直方图在合并到全局直方图中后保持完整，您不希望这样做。

在搜索排序向量时，通常可以比 O(log N) 做得更好。假设分布可合理预测，您可以使用插值搜索（通常）在 O(log log N) 左右进行搜索，通常称为“伪常数”复杂度。

鉴于您只相对不频繁地进行插入，您也可以考虑混合结构。这从一小块未排序的数据开始。当达到其大小的上限时，对其进行排序并将其插入到已排序的向量中。然后您返回将项目添加到未排序的区域。当达到限制时，再次排序并将其与现有排序数据合并。

假设你将未排序的 chunk 限制为不大于 log(N)，搜索复杂度仍然是 O(log N)——对已排序的 chunk 进行一次 log(n) 二分搜索或 log log N 插值搜索，以及一次 log(n)对未排序块的线性搜索。一旦您确认某个项目尚不存在，添加它就具有恒定的复杂性（只需将其添加到未排序块的末尾）。最大的优点是，这仍然可以轻松地使用连续的结构，例如向量，因此它比典型的树结构对缓存更友好。

Since your global histogram is (apparently) only ever populated with data coming from the local histograms, it might be worth considering just keeping it in a vector, and when you need to merge in the data from one of the local chunks, just use std::merge to take the existing global histogram and the local histogram, and merge them together into a new global histogram. This has O(N + M) complexity (N = size of global histogram, M = size of local histogram). Depending on the typical size of a local histogram, this could pretty easily work out as a win.

归档时间：	7 年，7 月前
查看次数：	485 次
最近记录：	7 年，7 月前