如何对齐两个数字列表

Question

如何对齐两个数字列表

我有两个排序号码清单A，并B用B至少是只要A。说：

A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]

Run Code Online (Sandbox Code Playgroud)

我想按保留顺序将每个数字A与另一个数字相关联B。对于任何此类映射，我们将总距离定义为映射数字之间的平方距离之和。

例如：

如果我们将1.1映射到0 0，那么2.3可以映射到1.9以后的任何数字。但是，如果我们已将1.1映射到2.7，则2.3只能从8.4映射到B中的数字。

假设我们映射1.1-> 0、2.3-> 1.9、5.6-> 8.4、5.7-> 9.1、10.1-> 10.7。这是有效的映射，并且具有距离（1.1 ^ 2 + 0.4 ^ 2 + 2.8 ^ 2 + 3.4 ^ 2 + 0.6 ^ 2）。

另一个显示贪婪方法的示例将不起作用：

 A = [1, 2]
 B = [0, 1, 10000]

Run Code Online (Sandbox Code Playgroud)

如果我们映射1-> 1，那么我们必须映射2-> 10000，这很糟糕。

任务是找到具有最小总距离的有效映射。

很难吗？我对列表数不胜数的快速方法感兴趣。

Answer 1

bti*_*lly 5

这是一个O(n)解决方案！（这是原始尝试，请参见下面的固定版本。）

这个想法如下。我们首先解决所有其他元素的问题，将其变成一个非常接近的解决方案，然后使用动态编程找到真正的解决方案。这样解决的问题是尺寸缩小一半，然后是O(n)工作。利用事实证明x + x/2 + x/4 + ... = 2x这是O(n)可行的。

这非常非常需要排序列表。而且，跨度为5的乐队太过分了，看起来跨度为3的乐队总是可以给出正确的答案，但是我对此没有足够的信心。

def improve_matching (list1, list2, matching):
    # We do DP forward, trying a band that is 5 across, building up our
    # answer as a linked list.  If our answer changed by no more than 1
    # anywhere, we are done.  Else we recursively improve again.
    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        best_j = None
        best_cost = None
        this = {}
        for delta in (-2, 2, -1, 1, 0):
            j = matching[i] + delta
            # Bounds sanity checks.
            if j < 0:
                continue
            elif len(list2) <= j:
                continue

            j_prev = best_j_last
            if j <= j_prev:
                if j-1 in last:
                    j_prev = j-1
                else:
                    # Can't push back this far.
                    continue

            cost = last[j_prev][0] + (list1[i] - list2[j])**2
            this[j] = (cost, [j, last[j_prev][1]])
            if (best_j is None) or cost <= best_cost:
                best_j = j
                best_cost = cost

        best_j_last = best_j
        last = this

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        matching_rev.append( linked_list[0])
        linked_list = linked_list[1]
    matching_new = [x for x in reversed(matching_rev)]
    for i in range(len(matching_new)):
        if 1 < abs(matching[i] - matching_new[i]):
            print "Improving further" # Does this ever happen?
            return improve_matching(list1, list2, matching_new)

    return matching_new

def match_lists (list1, list2):
    if 0 == len(list1):
        return []
    elif 1 == len(list1):
        best_j = 0
        best_cost = (list1[0] - list2[0])**2
        for j in range(1, len(list2)):
            cost = (list1[0] - list2[j])**2
            if cost < best_cost:
                best_cost = cost
                best_j = j
        return [best_j]
    elif 1 < len(list1):
        # Solve a smaller problem first.
        list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
        list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
        matching_smaller = match_lists(list1_smaller, list2_smaller)

        # Start with that matching.
        matching = [None] * len(list1)
        for i in range(len(matching_smaller)):
            matching[2*i] = 2*matching_smaller[i]

        # Fill in the holes between
        for i in range(len(matching) - 1):
            if matching[i] is None:
                best_j = matching[i-1] + 1
                best_cost = (list1[i] - list2[best_j])**2
                for j in range(best_j+1, matching[i+1]):
                    cost = (list1[i] - list2[j])**2
                    if cost < best_cost:
                        best_cost = cost
                        best_j = j
                matching[i] = best_j

        # And fill in the last one if needed
        if matching[-1] is None:
            if matching[-2] + 1 == len(list2):
                # This will be an invalid matching, but improve will fix that.
                matching[-1] = matching[-2]
            else:
                best_j = matching[-2] + 1
                best_cost = (list1[-2] - list2[best_j])**2
                for j in range(best_j+1, len(list2)):
                    cost = (list1[-1] - list2[j])**2
                    if cost < best_cost:
                        best_cost = cost
                        best_j = j
                matching[-1] = best_j

        # And now improve.
        return improve_matching(list1, list2, matching)

def best_matching (list1, list2):
    matching = match_lists(list1, list2)
    cost = 0.0
    result = []
    for i in range(len(matching)):
        pair = (list1[i], list2[matching[i]])
        result.append(pair)
        cost = cost + (pair[0] - pair[1])**2
    return (cost, result)

Run Code Online (Sandbox Code Playgroud)

更新

上面有一个错误。可以用证明match_lists([1, 3], [0, 0, 0, 0, 0, 1, 3])。但是下面的解决方案也是O(n)，我相信没有错误。区别在于，我不是在寻找固定宽度的带子，而是在寻找由先前匹配动态确定的宽度的带子。由于在任何给定位置上查找的条目都不超过5个，因此O(n)该数组和几何递归递归调用再次结束。但是，长时间延伸相同的值不会引起问题。

def match_lists (list1, list2):
    prev_matching = []

    if 0 == len(list1):
        # Trivial match
        return prev_matching
    elif 1 < len(list1):
        # Solve a smaller problem first.
        list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
        list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
        prev_matching = match_lists(list1_smaller, list2_smaller)

    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        lowest_j = 0
        highest_j = len(list2) - 1
        if 3 < i:
            lowest_j = 2 * prev_matching[i//2 - 2]
        if i + 4 < len(list1):
            highest_j = 2 * prev_matching[i//2 + 2]

        if best_j_last == highest_j:
            # Have to push it back.
            best_j_last = best_j_last - 1

        best_cost = last[best_j_last][0] + (list1[i] - list2[highest_j])**2
        best_j = highest_j
        this = {best_j: (best_cost, [best_j, last[best_j_last][1]])}

        # Now try the others.
        for j in range(lowest_j, highest_j):
            prev_j = best_j_last
            if j <= prev_j:
                prev_j = j - 1

            if prev_j not in last:
                continue
            else:
                cost = last[prev_j][0] + (list1[i] - list2[j])**2
                this[j] = (cost, [j, last[prev_j][1]])
                if cost < best_cost:
                    best_cost = cost
                    best_j = j

        last = this
        best_j_last = best_j

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        matching_rev.append( linked_list[0])
        linked_list = linked_list[1]
    matching_new = [x for x in reversed(matching_rev)]

    return matching_new

def best_matching (list1, list2):
    matching = match_lists(list1, list2)
    cost = 0.0
    result = []
    for i in range(len(matching)):
        pair = (list1[i], list2[matching[i]])
        result.append(pair)
        cost = cost + (pair[0] - pair[1])**2
    return (cost, result)

Run Code Online (Sandbox Code Playgroud)

注意

我被要求解释为什么这行得通。

这是我的启发式理解。在算法中，我们解决了半问题。然后，我们必须解决整个问题。

问题是，对于一个完整的问题，最佳解决方案可以从最佳解决方案强制到一半问题吗？我们通过使list2不在一半问题中的每个元素都尽可能大来将其向右推，并且使不在一半问题中的每个元素都list1尽可能小。但是，如果我们将半个问题中的那些推到右边，然后将重复元素放在它们的模边界效应处，那么我们就得到了半个问题的2个最优解，没有什么比下一个元素更正确地移动了有一半的问题。类似的推理适用于试图迫使解决方案离开。

现在让我们讨论这些边界效应。这些边界效果以1个元素结尾。因此，当我们尝试将元素推到最后时，我们不可能总是如此。通过查看2个元素而不是1个元素，我们添加了足够的摆动空间来说明这一点。

因此，必须有一个最佳解决方案，该解决方案非常接近以明显方式翻倍的一半问题。可能还有其他，但至少有一个。DP步骤将找到它。

我需要做一些工作才能将这种直觉转化为正式的证明，但我相信可以做到。

归档时间：	6 年，7 月前
查看次数：	531 次
最近记录：	6 年，7 月前