限制连续重复次数的洗牌算法?

埃博拉*_*博拉酱 5 algorithm shuffle fisher-yates-shuffle

我想要一个采用以下参数的字符串改组算法:

void CRLimitedShuffle(char* Str, uint8_t MaxConsecutiveRepetition);
Run Code Online (Sandbox Code Playgroud)

该函数应对输入字符串执行随机洗牌。然而:

  • 必须遵守限制MaxConsecutiveRepetition,即结果字符串中最多MaxConsecutiveRepetition允许出现连续的相同字符。
  • 所有正确的洗牌结果必须以相同的概率出现。

例如:

Str="aaaaabbbbb";
MaxConsecutiveRepetition=3;
Run Code Online (Sandbox Code Playgroud)

对于上面的参数,“aabbbaaabb”是正确的结果,而“aaabaabbbb”是不正确的,因为它有 4 个连续的“b”,超过MaxConsecutiveRepetition

一个天真的想法是对打乱的结果进行审核,如果不符合要求就重新打乱。然而,对于某些极端情况,性能可能会极低。例如:

Str="aaaabbbbbbbbbb";
MaxConsecutiveRepetition=2;
Run Code Online (Sandbox Code Playgroud)

在这个例子中,只有“bbabbabbabbabbabb”是正确的输出,所有其他随机生成的序列都被丢弃。这样,函数就会不断循环,直到以很小的概率生成唯一的正确字符串。

另一个想法是生成所有可能的排列,然后消除那些错误的排列,并从剩余的正确排列中随机抽取。但该算法的复杂度将是阶乘的。

最终的想法是对 Fisher-Yates 进行数学修改:从随机字符开始,然后每次绘制下一个字符时,将前一个字符被绘制的概率乘以衰减系数。例如,如果前一个字符已连续绘制了MaxConsecutiveRepetition多次,则衰减系数应变为0,以确保下一个字符不能与前一个字符相同。然而,我不知道如何准确计算这个衰减系数以确保以相等的概率生成所有正确的排列。

我需要一个满足上面列出的两个要求的算法。

bti*_*lly 2

面向对象程序

我无意中解决了限制重复次数的问题。而不是连续有多少次重复。所以我解决了错误的问题。

我已经离开了原来的解决方案。但在最后添加了所需的。


这个想法听起来比实际上更简单。

问题是找出有多少排列具有所需的模式。可以根据以下公式递归计算:

  1. 剩余的最大重复次数。
  2. 最后使用的字符有多少个。
  3. 表示频率计数分布的分区。

然后你可以通过记忆来加快速度。该函数如下所示(在 Python 中,并使用元组的元组可以用作字典键的事实):

answers_count = {}
def _count_answers(frequency_count, max_repetition, last_freq):
    partition = []
    for freq, count in sorted(frequency_count.items()):
        if 0 < count:
            partition.append((freq, count))

    if 0 == len(partition):
        if last_freq <= max_repetition + 1:
            return 1
        else:
            return 0

    key = (max_repetition, last_freq, tuple(partition))
    if key not in answers_count:
        # Better safe than sorry, make a copy.
        frequency_count = frequency_count.copy()

        answers = 0
        # Can we repeat here?
        if 1 < max_repetition and 1 < last_freq:
            answers += _count_answers(
                frequency_count, max_repetition - 1, last_freq - 1)

        # We will return this character to the pool.
        return_freq = last_freq - 1
        if 0 < return_freq:
            frequency_count[return_freq] = 1 + frequency_count.get(return_freq, 0)

        # The list is to avoid iterating as we are modifying.
        for freq, count in list(frequency_count.items()):
            # Adjust for one of these being taken
            if count == 1:
                frequency_count.pop(freq)
            else:
                frequency_count[freq] -= 1

            if freq == return_freq:
                if 1 < count:
                    answers += (count - 1) * _count_answers(frequency_count, max_repetition, freq)
            else:
                answers += count * _count_answers(frequency_count, max_repetition, freq)

            # Restore the key.
            frequency_count[freq] = count

        answers_count[key] = answers
    return answers_count[key]
Run Code Online (Sandbox Code Playgroud)

事实证明,从我们如何使用它来从频率到字符的字典开始,它更方便一些。所以我们需要下面这个小帮手。

def count_answers(frequency_chars, max_repetition, last_freq):
    frequency_count = {}
    for freq, chars in frequency_chars.items():
        if 0 < len(chars):
            frequency_count[freq] = len(chars)
    return _count_answers(frequency_count, max_repetition, last_freq)
Run Code Online (Sandbox Code Playgroud)

现在我们实际上可以找到我们的排列。

def shuffle_limited_repetition (string, max_repetition):
    # Calculate the frequency of each characters.
    char_frequency = {}
    for char in string:
        char_frequency[char] = 1 + char_frequency.get(char, 0)

    # Turn that around to get the characters with a given frequency.
    frequency_chars = {}
    for char, freq in char_frequency.items():
        if freq in frequency_chars:
            frequency_chars[freq].append(char)
        else:
            frequency_chars[freq] = [char]

    # Find how many permutations there are and choose one.
    total_answers = count_answers(frequency_chars, max_repetition, 1)
    if total_answers == 0:
        return None
    chosen = random.randrange(total_answers)

    # We will build up an array of characters, and join it to get our answer.
    final_chars = []

    # Our state is:
    # - frequency_chars (which we have)
    # - the last character we used
    # - how many copies there were
    #
    # So initialize the rest of that state with...
    # "We put down nothing, and there are no more copies of it."
    last_char = None
    last_freq = 1

    # Until our permutation is as big as we want...
    while len(final_chars) < len(string):
        # Can we put down last_char again?
        if 0 < max_repetition and 1 < last_freq:
            # Is our choice the one where we put down last_char again?
            if chosen < count_answers(frequency_chars, max_repetition - 1, last_freq - 1):
                # It is, record that, update the state, then skip.
                final_chars.append(last_char)
                max_repetition -= 1
                last_freq -= 1
                continue
            else:
                # Skip past all of teh choices where we put down the last_char again.
                chosen -= count_answers(frequency_chars, max_repetition - 1, last_freq - 1)

        # Return last_char to frequency_chars.
        return_freq = last_freq - 1 # What it had been, minus the one we put down.
        if 0 == return_freq:
            pass # Just kidding, we're done with this character.
        elif return_freq in frequency_chars:
            # Add it to existing frequency.
            frequency_chars[return_freq].append(last_char)
        else:
            # This frequency now exists with 1 char in it.
            frequency_chars[return_freq] = [last_char]

        # The list(...) construct is to avoid modifying frequency_chars
        # as we're iterating over it.
        for freq, chars in list(frequency_chars.items()):
            # NOTE: chars is a reference to frequency_chars[freq]
            # So altering our copy, alters the original.

            # Simulate taking a character, how many solutions is that?
            tmp_char = chars.pop()
            count = count_answers(frequency_chars, max_repetition, freq)
            chars.append(tmp_char)

            # How many characters can we take?
            count_chars = len(chars)
            if freq == return_freq:
                # We can't take the one we just put back!
                count_chars -= 1

            # Are we going to take one of these characters?
            if chosen < count_chars * count:
                # We are, we need to figure out which one.
                i = 0
                while count <= chosen:
                    i += 1
                    chosen -= count
                # Now update our state to indicate our choice.
                last_char = chars.pop(i)
                last_freq = freq

                if 0 == len(chars):
                    # Remove this frequency if we're done with it.
                    frequency_chars.pop(freq)
                final_chars.append(last_char)

                # NOTE: Exit loop here, we made the choice.
                break
            else:
                # Skip past all of these choices.
                chosen -= count_chars * count

    return ''.join(final_chars)
Run Code Online (Sandbox Code Playgroud)

并用行动来证明它。

for i in range(100):
    print(shuffle_limited_repetition("aaaabbbbc", 3))
Run Code Online (Sandbox Code Playgroud)

引用 Jason Calcanis 的话:“我们这样做不是因为它很容易。我们这样做是因为我们认为它会很容易。”

我严重低估了这段代码的复杂程度。然后一旦我要走了,我就想完成......


现在来解决正确的问题。

import random

def count_answers(frequency_chars, max_consecutive, used_freq):
    frequency_count = {}
    for freq, chars in frequency_chars.items():
        if 0 < len(chars):
            frequency_count[freq] = len(chars)
    return _count_answers(frequency_count, max_consecutive, used_freq)

answers_count = {}
def _count_answers(frequency_count, max_consecutive, used_freq):
    partition = []
    for freq, count in sorted(frequency_count.items()):
        if 0 < count:
            partition.append((freq, count))

    if 0 == len(partition):
        if 0 == used_freq:
            return 1
        else:
            return 0

    key = (max_consecutive, used_freq, tuple(partition))
    if key not in answers_count:
        frequency_count = frequency_count.copy()
        answers = 0

        # We will return this character to the pool.
        if 0 < used_freq:
            frequency_count[used_freq] = 1 + frequency_count.get(used_freq, 0)

        # The list is to avoid iterating as we are modifying.
        for freq, count in list(frequency_count.items()):
            # Adjust for one of these being taken
            if count == 1:
                frequency_count.pop(freq)
            else:
                frequency_count[freq] -= 1

            for i in range(1, min(freq, max_consecutive) + 1):

                these_answers = 0
                this_answer_count = _count_answers(frequency_count, max_consecutive, freq - i)
                if freq == used_freq:
                    answers += (count - 1) * this_answer_count
                else:
                    answers += count * this_answer_count

            # replace the taken one.
            frequency_count[freq] = count

        answers_count[key] = answers
    return answers_count[key]

def shuffle_limited_repetition (string, max_consecutive):
    # Calculate the frequency of each characters.
    char_frequency = {}
    for char in string:
        char_frequency[char] = 1 + char_frequency.get(char, 0)

    # Turn that around to get the characters with a given frequency.
    frequency_chars = {}
    for char, freq in char_frequency.items():
        if freq in frequency_chars:
            frequency_chars[freq].append(char)
        else:
            frequency_chars[freq] = [char]

    # Find how many permutations there are and choose one.
    total_answers = count_answers(frequency_chars, max_consecutive, 0)
    #for k, v in answers_count.items():
    #    print(k, v)
    if total_answers == 0:
        return None
    chosen = random.randrange(total_answers)

    # We will build up an array of characters, and join it to get our answer.
    final_chars = []

    # Our state is:
    # - frequency_chars (which we have)
    # - the last character we used
    # - how many copies are left
    #
    # So initialize the rest of that state with...
    # "We put down nothing, and there are no more copies of it."
    used_char = None
    used_freq = 0

    # Until our permutation is as big as we want...
    while len(final_chars) < len(string):
        #print(final_chars, frequency_chars, used_char, used_freq)
        # Return used_char to frequency_chars.
        if 0 == used_freq:
            pass # Just kidding, we're done with this character.
        elif used_freq in frequency_chars:
            # Add it to existing frequency.
            frequency_chars[used_freq].append(used_char)
        else:
            # This frequency now exists with 1 char in it.
            frequency_chars[used_freq] = [used_char]

        # The list(...) construct is to avoid modifying frequency_chars
        # as we're iterating over it.
        for freq, chars in list(frequency_chars.items()):
            is_found = False
            for i in range(1, min(freq, max_consecutive) + 1):
                # NOTE: chars is a reference to frequency_chars[freq]
                # So altering our copy, alters the original.

                # Simulate taking a character, how many solutions is that?
                count = 0
                if 0 < len(chars):
                    tmp_char = chars.pop()
                    count = count_answers(frequency_chars, max_consecutive, freq - i)
                    chars.append(tmp_char)

                # How many characters can we take?
                count_chars = len(chars)
                if freq == used_freq:
                    # We can't take the one we just put back!
                    count_chars -= 1

                # Are we going to take one of these characters?
                if chosen < count_chars * count:
                    # We are, we need to figure out which one.
                    j = 0
                    while count <= chosen:
                        j += 1
                        chosen -= count
                    # Now update our state to indicate our choice.
                    used_char = chars.pop(j)
                    for _ in range(i):
                        final_chars.append(used_char)
                    used_freq = freq - i
                    is_found = True
                    break

                else:
                    # Skip past all of these choices.
                    chosen -= count_chars * count
            if is_found:
                break

    return ''.join(final_chars)

for i in range(100):
    print(shuffle_limited_repetition("aaaabbbb", 2))
Run Code Online (Sandbox Code Playgroud)