如何将一组重叠范围划分为非重叠范围?

Ron*_*son 10 python algorithm math range rectangles

假设你有一组范围:

  • 0 - 100:'a'
  • 0 - 75:'b'
  • 95 - 150:'c'
  • 120 - 130:'d'

显然,这些范围在某些点重叠.您如何剖析这些范围以生成非重叠范围列表,同时保留与其原始范围相关的信息(在这种情况下,范围后面的字母)?

例如,运行算法后的上述结果将是:

  • 0 - 75:'a','b'
  • 76 - 94:'a'
  • 95 - 100:'a','c'
  • 101 - 119:'c'
  • 120 - 130:'c','d'
  • 131 - 150:'c'

Edm*_*und 12

在编写混合(部分重叠)音频样本的程序时,我遇到了同样的问题.

我所做的是将"开始事件"和"停止事件"(对于每个项目)添加到列表中,按时间点对列表进行排序,然后按顺序处理它.您可以这样做,除了使用整数点而不是时间,而不是混合声音,您将添加符号到对应于范围的集合.无论您是生成空范围还是省略它们都是可选的.

Edit 也许有些代码......

# input = list of (start, stop, symbol) tuples
points = [] # list of (offset, plus/minus, symbol) tuples
for start,stop,symbol in input:
    points.append((start,'+',symbol))
    points.append((stop,'-',symbol))
points.sort()

ranges = [] # output list of (start, stop, symbol_set) tuples
current_set = set()
last_start = None
for offset,pm,symbol in points:
    if pm == '+':
         if last_start is not None:
             #TODO avoid outputting empty or trivial ranges
             ranges.append((last_start,offset-1,current_set))
         current_set.add(symbol)
         last_start = offset
    elif pm == '-':
         # Getting a minus without a last_start is unpossible here, so not handled
         ranges.append((last_start,offset-1,current_set))
         current_set.remove(symbol)
         last_start = offset

# Finish off
if last_start is not None:
    ranges.append((last_start,offset-1,current_set))
Run Code Online (Sandbox Code Playgroud)

显然,完全未经测试.


sou*_*ica 7

与 Edmunds 类似的答案经过测试,包括对 (1,1) 等区间的支持:

class MultiSet(object):
    def __init__(self, intervals):
        self.intervals = intervals
        self.events = None

    def split_ranges(self):
        self.events = []
        for start, stop, symbol in self.intervals:
            self.events.append((start, True, stop, symbol))
            self.events.append((stop, False, start, symbol))

        def event_key(event):
            key_endpoint, key_is_start, key_other, _ = event
            key_order = 0 if key_is_start else 1
            return key_endpoint, key_order, key_other

        self.events.sort(key=event_key)

        current_set = set()
        ranges = []
        current_start = -1

        for endpoint, is_start, other, symbol in self.events:
            if is_start:
                if current_start != -1 and endpoint != current_start and \
                       endpoint - 1 >= current_start and current_set:
                    ranges.append((current_start, endpoint - 1, current_set.copy()))
                current_start = endpoint
                current_set.add(symbol)
            else:
                if current_start != -1 and endpoint >= current_start and current_set:
                    ranges.append((current_start, endpoint, current_set.copy()))
                current_set.remove(symbol)
                current_start = endpoint + 1

        return ranges


if __name__ == '__main__':
    intervals = [
        (0, 100, 'a'), (0, 75, 'b'), (75, 80, 'd'), (95, 150, 'c'), 
        (120, 130, 'd'), (160, 175, 'e'), (165, 180, 'a')
    ]
    multiset = MultiSet(intervals)
    pprint.pprint(multiset.split_ranges())


[(0, 74, {'b', 'a'}),
 (75, 75, {'d', 'b', 'a'}),
 (76, 80, {'d', 'a'}),
 (81, 94, {'a'}),
 (95, 100, {'c', 'a'}),
 (101, 119, {'c'}),
 (120, 130, {'d', 'c'}),
 (131, 150, {'c'}),
 (160, 164, {'e'}),
 (165, 175, {'e', 'a'}),
 (176, 180, {'a'})]
Run Code Online (Sandbox Code Playgroud)