反复去除最大平均子数组

Question

反复去除最大平均子数组

kcs*_*red 29 python arrays algorithm data-structures

我有一个正整数数组。例如：

[1, 7, 8, 4, 2, 1, 4]

Run Code Online (Sandbox Code Playgroud)

“归约操作”找到具有最高平均值的数组前缀，并将其删除。这里，数组前缀表示一个连续的子数组，其左端是数组的开头，例如[1]or[1, 7]或[1, 7, 8]上面。通过使用更长的前缀来打破联系。

Original array:  [  1,   7,   8,   4,   2,   1,   4]

Prefix averages: [1.0, 4.0, 5.3, 5.0, 4.4, 3.8, 3.9]

-> Delete [1, 7, 8], with maximum average 5.3
-> New array -> [4, 2, 1, 4]

Run Code Online (Sandbox Code Playgroud)

我将重复归约操作，直到数组为空：

[1, 7, 8, 4, 2, 1, 4]
^       ^
[4, 2, 1, 4]
^ ^
[2, 1, 4]
^       ^
[]

Run Code Online (Sandbox Code Playgroud)

现在，实际上不需要执行这些数组修改；我只是在寻找将由此过程删除的前缀长度列表，例如[3, 1, 3]上面的。

计算这些前缀长度的有效算法是什么？

最简单的方法是在算法的每次迭代中从头开始重新计算所有总和和平均值——O(n^2)我在下面附上了为此目的的 Python 代码。我正在寻找这种方法的任何改进 - 最理想的是下面的任何解决方案O(n^2)，但是具有相同复杂性但更好的常数因子的算法也会有所帮助。

以下是我尝试过的一些方法（没有成功）：

动态维护前缀和，例如使用二叉索引树。虽然我可以轻松地更新前缀和或及时找到最大前缀和O(log n)，但我还没有找到任何可以更新平均值的数据结构，因为平均值中的分母正在变化。
重用以前的前缀平均值“排名”——这些排名可能会发生变化，例如在某些数组中，以索引结尾的前缀5可能比以索引结尾的前缀具有更大的平均值6，但在删除前 3 个元素后，现在以索引结尾的前缀具有更大的平均值索引处的平均值2可能比结尾处的平均值小3。
寻找前缀结束的模式；例如，任何最大平均前缀的最右边元素始终是数组中的局部最大值，但尚不清楚这有多大帮助。

这是简单的二次方法的 Python 实现：

[1, 7, 8, 4, 2, 1, 4]

Run Code Online (Sandbox Code Playgroud)

编辑：最初发布的代码存在一个罕见的错误，即使用math.isclose()带有默认参数的 Python 进行浮点比较而不是真分数比较时出现大量输入。这已在当前代码中修复。如果您好奇的话，可以在此在线尝试链接中找到该错误的示例，以及准确解释导致此错误的原因的前言。

Answer 1

Mat*_*ans 34

这个问题有一个有趣的 O(n) 解决方案。

如果绘制累积和与指数的图表，则：

子数组中任意两个索引之间的平均值是图表上这些点之间的直线的斜率。

第一个最高平均前缀将在与 0 形成最大角度的点处结束。然后，下一个最高平均前缀必须具有较小的平均值，并且它将在与第一个结束处形成最大角度的点处结束。继续查看数组的末尾，我们发现......

这些最高平均值的线段正是累积和图的上凸包中的线段。

使用单调链算法找到这些段。由于点已经排序，因此需要 O(n) 时间。

# Lengths of the segments in the upper convex hull
# of the cumulative sum graph
def upperSumHullLengths(arr):
    if len(arr) < 2:
        if len(arr) < 1:
            return []
        else:
            return [1]
    
    hull = [(0, 0),(1, arr[0])]
    for x in range(2, len(arr)+1):
        # this has x coordinate x-1
        prevPoint = hull[len(hull) - 1]
        # next point in cumulative sum
        point = (x, prevPoint[1] + arr[x-1])
        # remove points not on the convex hull
        while len(hull) >= 2:
            p0 = hull[len(hull)-2]
            dx0 = prevPoint[0] - p0[0]
            dy0 = prevPoint[1] - p0[1]
            dx1 = x - prevPoint[0]
            dy1 = point[1] - prevPoint[1]
            if dy1*dx0 < dy0*dx1:
                break
            hull.pop()
            prevPoint = p0
        hull.append(point)
    
    return [hull[i+1][0] - hull[i][0] for i in range(0, len(hull)-1)]


print(upperSumHullLengths([  1,   7,   8,   4,   2,   1,   4]))

Run Code Online (Sandbox Code Playgroud)

印刷：

[3, 1, 3]

Run Code Online (Sandbox Code Playgroud)

很不错！链接处的动画是一张价值1000字的图片。 (3认同)
极好的！这个问题最初来自进程调度问题，因此与计算几何的联系令人惊讶且相当美观。 (3认同)
@kcsquared 不，它选择最长的前缀。在 == 情况（并列）中，它继续删除中间点而不是打破循环。 (2认同)
是的，你是对的，这段代码选择了最长的前缀；我很抱歉。事实上，我的代码是不正确的——由于浮点精度，在大输入上存在罕见的错误，现在已修复。在一个整数约为 10^7 的输入上，我的代码给出了“[221, 1]”，而您的代码正确地给出了“[222]”，我认为这是叉积方程中的错误。这个错误只有在数组长度 * 总和超过 10^9 时才会发生（因为 Python 的 `math.isclose()` 的精度为 `10^-9`），即便如此，随机输入上只有百万分之一的测试失败。 (2认同)
@kcsquared我没有把它写成叉积。这是一个分数比较，避免了除法引起的不准确：a/b < c/d <=> ad < cb。 (2认同)

Answer 2

Pyc*_*ath 8

Matt 和 kcsquared 的解决方案以及一些基准的简化版本：

\n

from itertools import accumulate, pairwise\n\ndef Matt_Pychoed(arr):\n    hull = [(0, 0)]\n    for x, y in enumerate(accumulate(arr), 1):\n        while len(hull) >= 2:\n            (x0, y0), (x1, y1) = hull[-2:]\n            dx0 = x1 - x0\n            dy0 = y1 - y0\n            dx1 = x - x1\n            dy1 = y - y1\n            if dy1*dx0 < dy0*dx1:\n                break\n            hull.pop()\n        hull.append((x, y))\n    return [q[0] - p[0] for p, q in pairwise(hull)]\n

Run Code Online (Sandbox Code Playgroud)\n

from itertools import accumulate, count\nfrom operator import truediv\n\ndef kc_Pychoed_2(nums):\n    removals = []\n    while nums:\n        averages = map(truediv, accumulate(nums), count(1))\n        remove = max(zip(averages, count(1)))[1]\n        removals.append(remove)\n        nums = nums[remove:]\n    return removals\n

Run Code Online (Sandbox Code Playgroud)\n

使用 20 个不同数组（100,000 个从 1 到 1000 的随机整数）进行基准测试：

\n

  min   median   mean     max  \n 65 ms  164 ms  159 ms  249 ms  kc\n 38 ms   98 ms   92 ms  146 ms  kc_Pychoed_1\n 58 ms  127 ms  120 ms  189 ms  kc_Pychoed_2\n134 ms  137 ms  138 ms  157 ms  Matt\n101 ms  102 ms  103 ms  111 ms  Matt_Pychoed\n

Run Code Online (Sandbox Code Playgroud)\n

其中kc_Pychoed_1kcsquared\'s 但有整数running_sum且没有math.isclose。我验证所有解决方案对每个输入计算的结果都是相同的。

\n

对于此类随机数据，kcsquared 似乎介于 O(n) 和 O(n log n) 之间。但如果数组严格递减，它会降级为二次。因为arr = [1000, 999, 998, ..., 2, 1]我得到了：

\n

  min   median   mean     max  \n102 ms  106 ms  107 ms  116 ms  kc\n 60 ms   61 ms   61 ms   62 ms  kc_Pychoed_1\n 76 ms   77 ms   77 ms   86 ms  kc_Pychoed_2\n  0 ms    1 ms    1 ms    1 ms  Matt\n  0 ms    0 ms    0 ms    0 ms  Matt_Pychoed\n

Run Code Online (Sandbox Code Playgroud)\n

基准代码（在线尝试！）：

\n

from timeit import default_timer as timer\nfrom statistics import mean, median\nimport random\nfrom typing import List, Tuple\nimport math\nfrom itertools import accumulate, count\nfrom operator import truediv\n\ndef kc(nums: List[int]) -> List[int]:\n    """Return list of lengths of max average prefix reductions."""\n\n    def max_prefix_avg(arr: List[int]) -> Tuple[float, int]:\n        """Return value and length of max average prefix in arr"""\n        if len(arr) == 0:\n            return (-math.inf, 0)\n        \n        best_length = 1\n        best_average = -math.inf\n        running_sum = 0.0\n\n        for i, x in enumerate(arr, 1):\n            running_sum += x\n            new_average = running_sum / i\n            \n            if (new_average >= best_average\n                or math.isclose(new_average, best_average)):\n                \n                best_average = new_average\n                best_length = i\n\n        return (best_average, best_length)\n\n    removed_lengths = []\n    total_removed = 0\n\n    while total_removed < len(nums):\n        _, new_removal = max_prefix_avg(nums[total_removed:])\n        removed_lengths.append(new_removal)\n        total_removed += new_removal\n\n    return removed_lengths\n\ndef kc_Pychoed_1(nums: List[int]) -> List[int]:\n    """Return list of lengths of max average prefix reductions."""\n\n    def max_prefix_avg(arr: List[int]) -> Tuple[float, int]:\n        """Return value and length of max average prefix in arr"""\n        if len(arr) == 0:\n            return (-math.inf, 0)\n        \n        best_length = 1\n        best_average = -math.inf\n        running_sum = 0\n\n        for i, x in enumerate(arr, 1):\n            running_sum += x\n            new_average = running_sum / i\n            \n            if new_average >= best_average:\n                \n                best_average = new_average\n                best_length = i\n\n        return (best_average, best_length)\n\n    removed_lengths = []\n    total_removed = 0\n\n    while total_removed < len(nums):\n        _, new_removal = max_prefix_avg(nums[total_removed:])\n        removed_lengths.append(new_removal)\n        total_removed += new_removal\n\n    return removed_lengths\n\ndef kc_Pychoed_2(nums):\n    removals = []\n    while nums:\n        averages = map(truediv, accumulate(nums), count(1))\n        remove = max(zip(averages, count(1)))[1]\n        removals.append(remove)\n        nums = nums[remove:]\n    return removals\n\n# Lengths of the segments in the upper convex hull\n# of the cumulative sum graph\ndef Matt(arr):\n    if len(arr) < 2:\n        if len(arr) < 1:\n            return []\n        else:\n            return [1]\n    \n    hull = [(0, 0),(1, arr[0])]\n    for x in range(2, len(arr)+1):\n        # this has x coordinate x-1\n        prevPoint = hull[len(hull) - 1]\n        # next point in cumulative sum\n        point = (x, prevPoint[1] + arr[x-1])\n        # remove points not on the convex hull\n        while len(hull) >= 2:\n            p0 = hull[len(hull)-2]\n            dx0 = prevPoint[0] - p0[0]\n            dy0 = prevPoint[1] - p0[1]\n            dx1 = x - prevPoint[0]\n            dy1 = point[1] - prevPoint[1]\n            if dy1*dx0 < dy0*dx1:\n                break\n            hull.pop()\n            prevPoint = p0\n        hull.append(point)\n    \n    return [hull[i+1][0] - hull[i][0] for i in range(0, len(hull)-1)]\n\ndef pairwise(lst):\n    return zip(lst, lst[1:])\n\ndef Matt_Pychoed(arr):\n    hull = [(0, 0)]\n    for x, y in enumerate(accumulate(arr), 1):\n        while len(hull) >= 2:\n            (x0, y0), (x1, y1) = hull[-2:]\n            dx0 = x1 - x0\n            dy0 = y1 - y0\n            dx1 = x - x1\n            dy1 = y - y1\n            if dy1*dx0 < dy0*dx1:\n                break\n            hull.pop()\n        hull.append((x, y))\n    return [q[0] - p[0] for p, q in pairwise(hull)]\n\nfuncs = kc, kc_Pychoed_1, kc_Pychoed_2, Matt, Matt_Pychoed\nstats = min, median, mean, max\ntss = [[] for _ in funcs]\nfor r in range(1, 21):\n    print(f\'After round {r}:\')\n    arr = random.choices(range(1, 1001), k=100_000)\n    # arr = list(range(1000, 1, -1))\n    expect = None\n    print(*(f\'{stat.__name__:^7}\' for stat in stats))\n    for func, ts in zip(funcs, tss):\n        t0 = timer()\n        result = func(arr)\n        t1 = timer()\n        ts.append(t1 - t0)\n        if expect is None:\n            expect = result\n        assert result == expect\n        print(*(\'%3d ms \' % (stat(ts) * 1e3) for stat in stats), func.__name__)\n    print()\n

Run Code Online (Sandbox Code Playgroud)\n

归档时间：	3 年，11 月前
查看次数：	1790 次
最近记录：	3 年，11 月前