Python - Removing overlapping lists

use*_*068 13 python algorithm list

Say I have a list of lists containing indices, [[start, end], [start1, end1], [start2, end2]].

For example:

[[0, 133], [78, 100], [25, 30]].

How can I check for overlaps between the lists and, each time, remove the longer list? So:

>>> list = [[0, 133], [78, 100], [25, 30]]
>>> foo(list)
[[78, 100], [25, 30]]

Here is what I have tried so far:

def cleanup_list(list):
    i = 0
    c = 0
    x = list[:]
    end = len(x)
    while i < end-1:
        for n in range(x[i][0], x[i][1]):
            if n in range(x[i+1][0], x[i+1][1]):
                list.remove(max(x[i], x[i+1]))
        i +=1
    return list

But besides being convoluted, it doesn't work properly:

 >>>cleanup_list([[0,100],[9,10],[12,90]])
 [[0, 100], [12, 90]]

Any help would be appreciated!

EDIT:

Other examples would be:

>>>a = [[0, 100], [4, 20], [30, 35], [30, 78]]
>>>foo(a)
[[4, 20], [30, 35]]

>>>b = [[30, 70], [25, 40]]
>>>foo(b)
[[25, 40]]

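I'm basically trying to remove all of the longest lists that overlap with another list. In this case I don't have to worry about lists being of equal length.

Thanks!!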

jfs*_*jfs 10

To remove the minimal number of intervals from the list such that the remaining intervals do not overlap, an O(n*log n) algorithm exists:

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point
    L = sorted(intervals, key=lambda x: (x[1], x[1] - x[0]),
               reverse=True) # O(n*logn)
    iv = build_interval_tree(intervals) # O(n*log n)
    result = []
    while L: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(L.pop()) # O(1)
        # remove intervals that overlap with the popped interval
        overlapping_intervals = iv.pop(result[-1]) # O(log n + m)
        remove(overlapping_intervals, from_=L) 
    return result

It should produce the following results:

f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]

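The algorithm requires a data structure that can find, in O(log n + m) time, all intervals that overlap a given interval, e.g., an IntervalTree. There are implementations that can be used from Python, e.g., quicksect.py; see "Fast interval intersection methodologies" for example usage.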
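To make the role of that data structure concrete, here is a minimal sketch of the same greedy algorithm with the interval-tree query replaced by a plain linear scan. It runs in O(n**2) rather than O(n*log n), and the names `overlaps` and `maximize_nonoverlapping_count_naive` are hypothetical, chosen only for this illustration. It assumes that intervals which merely touch, such as [0, 10] and [10, 20], do not overlap.

```python
def overlaps(a, b):
    """True if [start, end] pairs a and b overlap (touching does not count)."""
    return a[0] < b[1] and b[0] < a[1]


def maximize_nonoverlapping_count_naive(intervals):
    # Sort by end-point; on ties the shorter interval comes first.
    remaining = sorted(intervals, key=lambda iv: (iv[1], iv[1] - iv[0]))
    result = []
    while remaining:
        # The first remaining interval has the smallest end-point: keep it.
        kept = remaining.pop(0)
        result.append(kept)
        # Linear-scan stand-in for the interval-tree query:
        # drop everything that overlaps the interval just kept.
        remaining = [iv for iv in remaining if not overlaps(iv, kept)]
    return result


print(maximize_nonoverlapping_count_naive([[0, 133], [78, 100], [25, 30]]))
# [[25, 30], [78, 100]]
```

An interval tree only speeds up the "drop everything that overlaps" step; the greedy choice itself is unchanged.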


Here's a quicksect-based O(n**2) implementation of the above algorithm:

from functools import reduce # a builtin on Python 2, must be imported on Python 3

from quicksect import IntervalNode

class Interval(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.removed = False

def maximize_nonoverlapping_count(intervals):
    intervals = [Interval(start, end) for start, end in intervals]
    # sort by the end-point
    intervals.sort(key=lambda x: (x.end, (x.end - x.start)))   # O(n*log n)
    tree = build_interval_tree(intervals) # O(n*log n)
    result = []
    for smallest in intervals: # O(n) (without the loop body)
        # pop the interval with the smallest end-point, keep it in the result
        if smallest.removed:
            continue # skip removed nodes
        smallest.removed = True
        result.append([smallest.start, smallest.end]) # O(1)

        # remove (mark) intervals that overlap with the popped interval
        tree.intersect(smallest.start, smallest.end, # O(log n + m)
                       lambda x: setattr(x.other, 'removed', True))
    return result

def build_interval_tree(intervals):
    root = IntervalNode(intervals[0].start, intervals[0].end,
                        other=intervals[0])
    return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
                  intervals[1:], root)

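Note: the worst-case time complexity of this implementation is O(n**2) because the intervals are only marked as removed and never deleted from the tree. For example, imagine an input `intervals` such that len(result) == len(intervals) / 3 and len(intervals) / 2 of the intervals span the whole range. Then tree.intersect() would be called n/3 times and each call would execute x.other.removed = True at least n/2 times, i.e., n*n/6 operations in total: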

n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]]
result = [[0, 10], [10, 20]]
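The n*n/6 estimate can be checked on this small input with a rough simulation. The sketch below does not use quicksect: the hypothetical helper `count_mark_operations` replaces tree.intersect() with a linear scan and only counts how often an interval gets marked, so the exact number may differ from what a real tree reports.

```python
def count_mark_operations(intervals):
    """Run the mark-only greedy pass; return (result, number of markings)."""
    ordered = sorted(intervals, key=lambda iv: (iv[1], iv[1] - iv[0]))
    removed = [False] * len(ordered)
    result, marks = [], 0
    for i, (start, end) in enumerate(ordered):
        if removed[i]:
            continue  # already marked by an earlier kept interval
        result.append([start, end])
        # Stand-in for tree.intersect(): visit every overlapping interval,
        # including ones marked earlier, because nothing is ever deleted.
        for j, (other_start, other_end) in enumerate(ordered):
            if other_start < end and start < other_end:
                removed[j] = True
                marks += 1
    return result, marks


intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]]
n = len(intervals)
result, marks = count_mark_operations(intervals)
print(result)            # [[0, 10], [10, 20]]
print(marks, n * n / 6)  # the count is at least n*n/6
```

The three intervals spanning the whole range are visited again for every interval that is kept, which is where the quadratic term comes from.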

Here's a banyan-based O(n log n) implementation:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point O(n log n)
    sorted_intervals = SortedSet(intervals,
                                 key=lambda x: (x[1], x[1] - x[0]))
    # build "interval" tree O(n log n)
    tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
    result = []
    while sorted_intervals: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(sorted_intervals.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
        sorted_intervals -= overlapping_intervals # O(m log n)
    return result

Note: this implementation considers the intervals [0, 10] and [10, 20] to be overlapping:

f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]
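Whether touching intervals count as overlapping comes down to which comparison the overlap test uses. A minimal sketch of the two conventions follows; the function names are hypothetical and not part of banyan or quicksect.

```python
def overlaps_closed(a, b):
    """Closed intervals: a shared end-point counts as an overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlaps_half_open(a, b):
    """Half-open intervals [start, end): a shared end-point does not count."""
    return a[0] < b[1] and b[0] < a[1]


print(overlaps_closed([0, 10], [10, 20]))     # True
print(overlaps_half_open([0, 10], [10, 20]))  # False
print(overlaps_closed([0, 10], [11, 20]))     # False
```

Under the closed convention [10, 20] is discarded once [0, 10] is kept, which matches the second assertion above.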

sorted_intervals and tree can be merged:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # build "interval" tree sorted by the end-point O(n log n)
    tree = SortedSet(intervals, key=lambda x: (x[1], x[1] - x[0]),
                     updator=OverlappingIntervalsUpdator)
    result = []
    while tree: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(tree.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
    return result
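If no third-party tree is available, the same greedy choice can also be sketched with a sort and a single pass. Because intervals are visited in end-point order, a candidate overlaps some kept interval exactly when it overlaps the most recently kept one. This is a simplified variant rather than the tree-based code above; the name `greedy_nonoverlapping` and the `touching_overlaps` flag are hypothetical.

```python
def greedy_nonoverlapping(intervals, touching_overlaps=False):
    """Keep as many mutually non-overlapping [start, end] intervals as possible."""
    # Sort by end-point; on ties the shorter interval comes first.  O(n log n)
    ordered = sorted(intervals, key=lambda iv: (iv[1], iv[1] - iv[0]))
    result = []
    last_end = None  # end-point of the most recently kept interval
    for start, end in ordered:
        if last_end is None:
            fits = True
        elif touching_overlaps:
            fits = start > last_end   # [0, 10] and [10, 20] overlap
        else:
            fits = start >= last_end  # [0, 10] and [10, 20] do not overlap
        if fits:
            result.append([start, end])
            last_end = end
    return result


print(greedy_nonoverlapping([[0, 100], [4, 20], [30, 35], [30, 78]]))
# [[4, 20], [30, 35]]
```

With touching_overlaps=True the sketch follows the convention noted for the banyan version, where [0, 10] and [10, 20] are treated as overlapping.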