kcs*_*red 29 python arrays algorithm data-structures
我有一个正整数数组。例如:
[1, 7, 8, 4, 2, 1, 4]
Run Code Online (Sandbox Code Playgroud)
“归约操作”找到具有最高平均值的数组前缀,并将其删除。这里,数组前缀表示一个连续的子数组,其左端是数组的开头,例如[1]or[1, 7]或[1, 7, 8]上面。通过使用更长的前缀来打破联系。
Original array: [ 1, 7, 8, 4, 2, 1, 4]
Prefix averages: [1.0, 4.0, 5.3, 5.0, 4.4, 3.8, 3.9]
-> Delete [1, 7, 8], with maximum average 5.3
-> New array -> [4, 2, 1, 4]
Run Code Online (Sandbox Code Playgroud)
我将重复归约操作,直到数组为空:
[1, 7, 8, 4, 2, 1, 4]
^ ^
[4, 2, 1, 4]
^ ^
[2, 1, 4]
^ ^
[]
Run Code Online (Sandbox Code Playgroud)
现在,实际上不需要执行这些数组修改;我只是在寻找将由此过程删除的前缀长度列表,例如[3, 1, 3]上面的。
计算这些前缀长度的有效算法是什么?
最简单的方法是在算法的每次迭代中从头开始重新计算所有总和和平均值——O(n^2)我在下面附上了为此目的的 Python 代码。我正在寻找这种方法的任何改进 - 最理想的是下面的任何解决方案O(n^2),但是具有相同复杂性但更好的常数因子的算法也会有所帮助。
以下是我尝试过的一些方法(没有成功):
O(log n),但我还没有找到任何可以更新平均值的数据结构,因为平均值中的分母正在变化。5可能比以索引结尾的前缀具有更大的平均值6,但在删除前 3 个元素后,现在以索引结尾的前缀具有更大的平均值索引处的平均值2可能比结尾处的平均值小3。这是简单的二次方法的 Python 实现:
[1, 7, 8, 4, 2, 1, 4]
Run Code Online (Sandbox Code Playgroud)
编辑:最初发布的代码存在一个罕见的错误,即使用math.isclose()带有默认参数的 Python 进行浮点比较而不是真分数比较时出现大量输入。这已在当前代码中修复。如果您好奇的话,可以在此在线尝试链接中找到该错误的示例,以及准确解释导致此错误的原因的前言。
Mat*_*ans 34
这个问题有一个有趣的 O(n) 解决方案。
如果绘制累积和与指数的图表,则:
子数组中任意两个索引之间的平均值是图表上这些点之间的直线的斜率。
第一个最高平均前缀将在与 0 形成最大角度的点处结束。然后,下一个最高平均前缀必须具有较小的平均值,并且它将在与第一个结束处形成最大角度的点处结束。继续查看数组的末尾,我们发现......
这些最高平均值的线段正是累积和图的上凸包中的线段。
使用单调链算法找到这些段。由于点已经排序,因此需要 O(n) 时间。
# Lengths of the segments in the upper convex hull
# of the cumulative sum graph
def upperSumHullLengths(arr):
if len(arr) < 2:
if len(arr) < 1:
return []
else:
return [1]
hull = [(0, 0),(1, arr[0])]
for x in range(2, len(arr)+1):
# this has x coordinate x-1
prevPoint = hull[len(hull) - 1]
# next point in cumulative sum
point = (x, prevPoint[1] + arr[x-1])
# remove points not on the convex hull
while len(hull) >= 2:
p0 = hull[len(hull)-2]
dx0 = prevPoint[0] - p0[0]
dy0 = prevPoint[1] - p0[1]
dx1 = x - prevPoint[0]
dy1 = point[1] - prevPoint[1]
if dy1*dx0 < dy0*dx1:
break
hull.pop()
prevPoint = p0
hull.append(point)
return [hull[i+1][0] - hull[i][0] for i in range(0, len(hull)-1)]
print(upperSumHullLengths([ 1, 7, 8, 4, 2, 1, 4]))
Run Code Online (Sandbox Code Playgroud)
印刷:
[3, 1, 3]
Run Code Online (Sandbox Code Playgroud)
Matt 和 kcsquared 的解决方案以及一些基准的简化版本:
\nfrom itertools import accumulate, pairwise\n\ndef Matt_Pychoed(arr):\n hull = [(0, 0)]\n for x, y in enumerate(accumulate(arr), 1):\n while len(hull) >= 2:\n (x0, y0), (x1, y1) = hull[-2:]\n dx0 = x1 - x0\n dy0 = y1 - y0\n dx1 = x - x1\n dy1 = y - y1\n if dy1*dx0 < dy0*dx1:\n break\n hull.pop()\n hull.append((x, y))\n return [q[0] - p[0] for p, q in pairwise(hull)]\nRun Code Online (Sandbox Code Playgroud)\nfrom itertools import accumulate, count\nfrom operator import truediv\n\ndef kc_Pychoed_2(nums):\n removals = []\n while nums:\n averages = map(truediv, accumulate(nums), count(1))\n remove = max(zip(averages, count(1)))[1]\n removals.append(remove)\n nums = nums[remove:]\n return removals\nRun Code Online (Sandbox Code Playgroud)\n使用 20 个不同数组(100,000 个从 1 到 1000 的随机整数)进行基准测试:
\n min median mean max \n 65 ms 164 ms 159 ms 249 ms kc\n 38 ms 98 ms 92 ms 146 ms kc_Pychoed_1\n 58 ms 127 ms 120 ms 189 ms kc_Pychoed_2\n134 ms 137 ms 138 ms 157 ms Matt\n101 ms 102 ms 103 ms 111 ms Matt_Pychoed\nRun Code Online (Sandbox Code Playgroud)\n其中kc_Pychoed_1kcsquared\'s 但有整数running_sum且没有math.isclose。我验证所有解决方案对每个输入计算的结果都是相同的。
对于此类随机数据,kcsquared 似乎介于 O(n) 和 O(n log n) 之间。但如果数组严格递减,它会降级为二次。因为arr = [1000, 999, 998, ..., 2, 1]我得到了:
min median mean max \n102 ms 106 ms 107 ms 116 ms kc\n 60 ms 61 ms 61 ms 62 ms kc_Pychoed_1\n 76 ms 77 ms 77 ms 86 ms kc_Pychoed_2\n 0 ms 1 ms 1 ms 1 ms Matt\n 0 ms 0 ms 0 ms 0 ms Matt_Pychoed\nRun Code Online (Sandbox Code Playgroud)\n基准代码(在线尝试!):
\nfrom timeit import default_timer as timer\nfrom statistics import mean, median\nimport random\nfrom typing import List, Tuple\nimport math\nfrom itertools import accumulate, count\nfrom operator import truediv\n\ndef kc(nums: List[int]) -> List[int]:\n """Return list of lengths of max average prefix reductions."""\n\n def max_prefix_avg(arr: List[int]) -> Tuple[float, int]:\n """Return value and length of max average prefix in arr"""\n if len(arr) == 0:\n return (-math.inf, 0)\n \n best_length = 1\n best_average = -math.inf\n running_sum = 0.0\n\n for i, x in enumerate(arr, 1):\n running_sum += x\n new_average = running_sum / i\n \n if (new_average >= best_average\n or math.isclose(new_average, best_average)):\n \n best_average = new_average\n best_length = i\n\n return (best_average, best_length)\n\n removed_lengths = []\n total_removed = 0\n\n while total_removed < len(nums):\n _, new_removal = max_prefix_avg(nums[total_removed:])\n removed_lengths.append(new_removal)\n total_removed += new_removal\n\n return removed_lengths\n\ndef kc_Pychoed_1(nums: List[int]) -> List[int]:\n """Return list of lengths of max average prefix reductions."""\n\n def max_prefix_avg(arr: List[int]) -> Tuple[float, int]:\n """Return value and length of max average prefix in arr"""\n if len(arr) == 0:\n return (-math.inf, 0)\n \n best_length = 1\n best_average = -math.inf\n running_sum = 0\n\n for i, x in enumerate(arr, 1):\n running_sum += x\n new_average = running_sum / i\n \n if new_average >= best_average:\n \n best_average = new_average\n best_length = i\n\n return (best_average, best_length)\n\n removed_lengths = []\n total_removed = 0\n\n while total_removed < len(nums):\n _, new_removal = max_prefix_avg(nums[total_removed:])\n removed_lengths.append(new_removal)\n total_removed += new_removal\n\n return removed_lengths\n\ndef kc_Pychoed_2(nums):\n removals = []\n while nums:\n averages = map(truediv, accumulate(nums), count(1))\n remove = max(zip(averages, count(1)))[1]\n removals.append(remove)\n nums = nums[remove:]\n return removals\n\n# Lengths of the segments in the upper convex hull\n# of the cumulative sum graph\ndef Matt(arr):\n if len(arr) < 2:\n if len(arr) < 1:\n return []\n else:\n return [1]\n \n hull = [(0, 0),(1, arr[0])]\n for x in range(2, len(arr)+1):\n # this has x coordinate x-1\n prevPoint = hull[len(hull) - 1]\n # next point in cumulative sum\n point = (x, prevPoint[1] + arr[x-1])\n # remove points not on the convex hull\n while len(hull) >= 2:\n p0 = hull[len(hull)-2]\n dx0 = prevPoint[0] - p0[0]\n dy0 = prevPoint[1] - p0[1]\n dx1 = x - prevPoint[0]\n dy1 = point[1] - prevPoint[1]\n if dy1*dx0 < dy0*dx1:\n break\n hull.pop()\n prevPoint = p0\n hull.append(point)\n \n return [hull[i+1][0] - hull[i][0] for i in range(0, len(hull)-1)]\n\ndef pairwise(lst):\n return zip(lst, lst[1:])\n\ndef Matt_Pychoed(arr):\n hull = [(0, 0)]\n for x, y in enumerate(accumulate(arr), 1):\n while len(hull) >= 2:\n (x0, y0), (x1, y1) = hull[-2:]\n dx0 = x1 - x0\n dy0 = y1 - y0\n dx1 = x - x1\n dy1 = y - y1\n if dy1*dx0 < dy0*dx1:\n break\n hull.pop()\n hull.append((x, y))\n return [q[0] - p[0] for p, q in pairwise(hull)]\n\nfuncs = kc, kc_Pychoed_1, kc_Pychoed_2, Matt, Matt_Pychoed\nstats = min, median, mean, max\ntss = [[] for _ in funcs]\nfor r in range(1, 21):\n print(f\'After round {r}:\')\n arr = random.choices(range(1, 1001), k=100_000)\n # arr = list(range(1000, 1, -1))\n expect = None\n print(*(f\'{stat.__name__:^7}\' for stat in stats))\n for func, ts in zip(funcs, tss):\n t0 = timer()\n result = func(arr)\n t1 = timer()\n ts.append(t1 - t0)\n if expect is None:\n expect = result\n assert result == expect\n print(*(\'%3d ms \' % (stat(ts) * 1e3) for stat in stats), func.__name__)\n print()\nRun Code Online (Sandbox Code Playgroud)\n