控制改组的距离

ins*_*get 18 python algorithm shuffle constraints python-3.x

我之前试过问这个问题,但从来没有能够正确地说出来.我希望这次我能说得对:

我有一个独特的元素列表.我想将这个列表改组以生成一个新列表.但是,我想约束shuffle,这样每个元素的新位置最多d远离其在列表中的原始位置.

例如:

L = [1,2,3,4]
d = 2
answer = magicFunction(L, d)
Run Code Online (Sandbox Code Playgroud)

现在,一个可能的结果可能是:

>>> print(answer)
[3,1,2,4]
Run Code Online (Sandbox Code Playgroud)

请注意,3已移动了两个索引,1并且2已移动了一个索引,并且4根本没有移动.因此,根据我之前的定义,这是一个有效的洗牌.以下代码片段可用于验证:

old = {e:i for i,e in enumerate(L)}
new = {e:i for i,e in enumerate(answer)}
valid = all(abs(i-new[e])<=d for e,i in old.items())
Run Code Online (Sandbox Code Playgroud)

现在,我可以很容易地生成所有可能的排列L,过滤有效的排列,并随机选择一个.但这似乎并不优雅.有没有人对如何做到这一点有任何其他想法?

use*_*ica 5

这将是漫长而干燥的.

我有一个产生均匀分布的解决方案.它需要O(len(L) * d**d)时间和空间进行预计算,然后在O(len(L)*d)时间1中执行随机播放.如果不需要均匀分布,则不需要预计算,并且O(len(L))由于更快的随机选择,可以减少洗牌时间; 我没有实现非均匀分布.这个算法的两个步骤都比蛮力快得多,但它们仍然没有我想要的那么好.此外,虽然这个概念应该有效,但我没有像我想的那样彻底地测试我的实现.

假设我们L从前面迭代,为每个元素选择一个位置.将滞后定义为下一个要放置的元素与第一个未填充位置之间的距离.每次我们放置一个元素时,滞后最多增加一个,因为下一个元素的索引现在高一个,但第一个未填充位置的索引不能变低.

每当滞后时d,我们就被迫将下一个元素放在第一个未填充的位置,即使在距离之内可能还有其他空白点d.如果我们这样做,滞后不会超出d,我们将总是有一个位置来放置每个元素,我们将生成列表的有效随机播放.因此,我们对如何生成shuffle有一个大概的了解; 但是,如果我们随机统一做出选择,整体分布就不一致.例如,使用len(L) == 3d == 1,有3种可能的混洗(一种用于中间元素的每个位置),但如果我们统一选择第一个元素的位置,则一次混洗的可能性是其他任何一种的两倍.

如果我们想要在有效shuffle上进行均匀分布,我们需要对每个元素的位置进行加权随机选择,其中如果我们选择该位置,则位置的权重基于可能的shuffle的数量.天真地做,这将要求我们生成所有可能的洗牌来计算它们,这需要O(d**len(L))时间.然而,该算法的任何步骤之后剩余的可能洗牌的数量只取决于什么样的顺序,他们填写.对于填充或填充斑点的任何图案的斑点,我们已经充满,不,可能洗牌的数量的总和每个可能放置下一个元素的可能shuffle的数量.在任何步骤中,最多d可能放置下一个元素的位置,并且存在O(d**d)未填充点的可能模式(因为除了d当前元素之后的任何点必须是满的,并且任何点d或更远的前面必须是空的).我们可以使用它来生成一个大小的马尔可夫链O(len(L) * d**d),花O(len(L) * d**d)时间这样做,然后使用这个马尔可夫链来及时进行改组O(len(L)*d).

示例代码(目前不是O(len(L)*d)由于效率低下的马尔可夫链表示):

import random

# states are (k, filled_spots) tuples, where k is the index of the next
# element to place, and filled_spots is a tuple of booleans
# of length 2*d, representing whether each index from k-d to
# k+d-1 has an element in it. We pretend indices outside the array are
# full, for ease of representation.

def _successors(n, d, state):
    '''Yield all legal next filled_spots and the move that takes you there.

    Doesn't handle k=n.'''
    k, filled_spots = state
    next_k = k+1

    # If k+d is a valid index, this represents the empty spot there.
    possible_next_spot = (False,) if k + d < n else (True,)

    if not filled_spots[0]:
        # Must use that position.
        yield k-d, filled_spots[1:] + possible_next_spot
    else:
        # Can fill any empty spot within a distance d.
        shifted_filled_spots = list(filled_spots[1:] + possible_next_spot)
        for i, filled in enumerate(shifted_filled_spots):
            if not filled:
                successor_state = shifted_filled_spots[:]
                successor_state[i] = True
                yield next_k-d+i, tuple(successor_state)
                # next_k instead of k in that index computation, because
                # i is indexing relative to shifted_filled_spots instead
                # of filled_spots


def _markov_chain(n, d):
    '''Precompute a table of weights for generating shuffles.

    _markov_chain(n, d) produces a table that can be fed to
    _distance_limited_shuffle to permute lists of length n in such a way that
    no list element moves a distance of more than d from its initial spot,
    and all permutations satisfying this condition are equally likely.

    This is expensive.

    '''

    if d >= n - 1:
        # We don't need the table, and generating a table for d >= n
        # complicates the indexing a bit. It's too complicated already.
        return None

    table = {}
    termination_state = (n, (d*2 * (True,)))
    table[termination_state] = 1

    def possible_shuffles(state):
        try:
            return table[state]
        except KeyError:
            k, _ = state
            count = table[state] = sum(
                    possible_shuffles((k+1, next_filled_spots))
                    for (_, next_filled_spots) in _successors(n, d, state)
            )
            return count

    initial_state = (0, (d*(True,) + d*(False,)))
    possible_shuffles(initial_state)
    return table

def _distance_limited_shuffle(l, d, table):
    # Generate an index into the set of all permutations, then use the
    # markov chain to efficiently find which permutation we picked.

    n = len(l)

    if d >= n - 1:
        random.shuffle(l)
        return

    permutation = [None]*n
    state = (0, (d*(True,) + d*(False,)))
    permutations_to_skip = random.randrange(table[state])

    for i, item in enumerate(l):
        for placement_index, new_filled_spots in _successors(n, d, state):
            new_state = (i+1, new_filled_spots)
            if table[new_state] <= permutations_to_skip:
                permutations_to_skip -= table[new_state]
            else:
                state = new_state
                permutation[placement_index] = item
                break
    return permutation

class Shuffler(object):
    def __init__(self, n, d):
        self.n = n
        self.d = d
        self.table = _markov_chain(n, d)
    def shuffled(self, l):
        if len(l) != self.n:
            raise ValueError('Wrong input size')
        return _distance_limited_shuffle(l, self.d, self.table)
    __call__ = shuffled
Run Code Online (Sandbox Code Playgroud)

1我们可以使用基于树的加权随机选择算法来改善随机播放时间O(len(L)*log(d)),但由于表格变得如此巨大,即使是中等大小d,这似乎也不值得.此外,d**d界限中的因素是过高估计,但实际因素仍然至少是指数.


kof*_*ein 5

简而言之,应该打乱的列表按索引和随机数之和排序。

import random
xs = range(20) # list that should be shuffled
d = 5          # distance
[x for i,x in sorted(enumerate(xs), key= lambda (i,x): i+(d+1)*random.random())]
Run Code Online (Sandbox Code Playgroud)

出去:

[1, 4, 3, 0, 2, 6, 7, 5, 8, 9, 10, 11, 12, 14, 13, 15, 19, 16, 18, 17]
Run Code Online (Sandbox Code Playgroud)

基本上就是这样。但这看起来有点令人难以承受,因此......

算法更详细

为了更好地理解这一点,请考虑普通随机洗牌的替代实现:

import random
sorted(range(10), key = lambda x: random.random())
Run Code Online (Sandbox Code Playgroud)

出去:

[2, 6, 5, 0, 9, 1, 3, 8, 7, 4]
Run Code Online (Sandbox Code Playgroud)

为了限制距离,我们必须实现一个依赖于元素索引的替代排序键函数。该函数sort_criterion负责此操作。

import random

def exclusive_uniform(a, b):
    "returns a random value in the interval  [a, b)"
    return a+(b-a)*random.random()

def distance_constrained_shuffle(sequence, distance,
                                 randmoveforward = exclusive_uniform):
    def sort_criterion(enumerate_tuple):
        """
        returns the index plus a random offset,
        such that the result can overtake at most 'distance' elements
        """
        indx, value = enumerate_tuple
        return indx + randmoveforward(0, distance+1)

    # get enumerated, shuffled list
    enumerated_result = sorted(enumerate(sequence), key = sort_criterion)
    # remove enumeration
    result = [x for i, x in enumerated_result]
    return result
Run Code Online (Sandbox Code Playgroud)

通过参数,randmoveforward您可以传递具有不同概率密度函数 (pdf) 的随机数生成器来修改距离分布。

剩下的就是距离分布的测试和评估。


测试功能

这是测试功能的实现。该validate函数实际上取自OP,但出于性能原因,我删除了其中一本字典的创建。

def test(num_cases = 10, distance = 3, sequence = range(1000)):
    def validate(d, lst, answer):
        #old = {e:i for i,e in enumerate(lst)}
        new = {e:i for i,e in enumerate(answer)}
        return all(abs(i-new[e])<=d for i,e in enumerate(lst))
        #return all(abs(i-new[e])<=d for e,i in old.iteritems())


    for _ in range(num_cases):
        result = distance_constrained_shuffle(sequence, distance)
        if not validate(distance, sequence, result):
            print "Constraint violated. ", result
            break
    else:
        print "No constraint violations"


test()
Run Code Online (Sandbox Code Playgroud)

出去:

No constraint violations
Run Code Online (Sandbox Code Playgroud)

距离分布

我不确定是否有办法使距离均匀分布,但这里有一个验证分布的函数。

def distance_distribution(maxdistance = 3, sequence = range(3000)):
    from collections import Counter

    def count_distances(lst, answer):
        new = {e:i for i,e in enumerate(answer)}
        return Counter(i-new[e] for i,e in enumerate(lst))    

    answer = distance_constrained_shuffle(sequence, maxdistance)
    counter = count_distances(sequence, answer)

    sequence_length = float(len(sequence))

    distances = range(-maxdistance, maxdistance+1)
    return distances, [counter[d]/sequence_length for d in distances]

distance_distribution()
Run Code Online (Sandbox Code Playgroud)

出去:

([-3, -2, -1, 0, 1, 2, 3],
 [0.01,
  0.076,
  0.22166666666666668,
  0.379,
  0.22933333333333333,
  0.07766666666666666,
  0.006333333333333333])
Run Code Online (Sandbox Code Playgroud)

d=3 时的距离分布/pdf

或者对于最大距离更大的情况:

distance_distribution(maxdistance=9, sequence=range(100*1000))
Run Code Online (Sandbox Code Playgroud)

d=9 的距离分布


Tho*_*hel 1

我不确定它有多好,但也许是这样的:

  • 创建一个与初始列表 L 长度相同的列表;该列表的每个元素应该是允许移动到此处的初始索引的索引列表;例如[[0,1,2],[0,1,2,3],[0,1,2,3],[1,2,3]],如果我正确理解你的例子;
  • 最小的子列表(如果多个列表共享相同的长度,则取任何最小的子列表);
  • 在其中选择一个随机元素random.choice,该元素是初始列表中要映射到当前位置的元素的索引(使用另一个列表来构建新列表);
  • 从所有子列表中删除随机选择的元素

例如:

L = [ "A", "B", "C", "D" ]
i = [[0,1,2],[0,1,2,3],[0,1,2,3],[1,2,3]]
# I take [0,1,2] and pick randomly 1 inside
# I remove the value '1' from all sublists and since
# the first sublist has already been handled I set it to None
# (and my result will look as [ "B", None, None, None ]
i = [None,[0,2,3],[0,2,3],[2,3]]
# I take the last sublist and pick randomly 3 inside
# result will be ["B", None, None, "D" ]
i = [None,[0,2], [0,2], None]
etc.
Run Code Online (Sandbox Code Playgroud)

不过我还没有尝试过。问候。