修改排列算法以防止重复打印输出的策略

Question

修改排列算法以防止重复打印输出的策略

我一直在审查练习算法,我现在正在研究一种我非常喜欢的置换算法:

void permute(char* set, int begin, int end) {
    int range = end - begin;

    if (range == 1)
        cout << set << endl;
    else {
        for(int i = 0; i < range; ++i) {
            swap(&set[begin], &set[begin+i]);
            permute(set, begin+1, end);
            swap(&set[begin], &set[begin+i]);
        }
    }
}

Run Code Online (Sandbox Code Playgroud)

我实际上想将此应用于会有许多重复字符的情况,所以我需要能够修改它以防止打印重复的排列.

我如何检测我是否生成了重复内容？我知道我可以将它存储在散列或类似的东西中,但这不是最佳解决方案 - 我更喜欢不需要额外存储的解决方案.有人可以给我一个建议吗？

PS:我不想使用STL置换机制,我不想在某处引用另一个"唯一置换算法".我想了解用于防止重复的机制,以便在可能的情况下将其构建到学习中.

Answer 1

Rei*_*ica 7

有没有通用的方法,以防止随意从产生重复的功能.当然,您可以随时过滤掉重复项,但您不希望这样做,并且有很好的理由.因此,您需要一种特殊的方法来仅生成非重复项.

一种方法是以增加的字典顺序生成排列.然后你可以比较一个"新"的permatutation是否与最后一个相同,然后跳过它.它变得更好:在http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order给出的增加字典顺序的排列生成算法甚至根本不生成重复项!

但是,这不是您问题的答案,因为它是一种不同的算法(尽管也基于交换).

那么,让我们看一下你的算法.一个关键的观察是:

一旦角色被交换到位置begin,它将留在那里进行所有嵌套调用permute.

我们将这与以下关于排列的一般观察结合起来:

如果您置换一个字符串s,但只在具有相同字符的位置,s将保持不变.实际上,所有重复排列对于某些字符c的出现具有不同的顺序,其中c出现在相同的位置.

好的,所以我们要做的就是确保每个角色的出现始终与开头的顺序相同.代码如下,但是......我真的不会说C++,所以我会使用Python并希望能够声称它是伪代码.

我们从你的原始算法开始,用'伪代码'重写:

def permute(s, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]

Run Code Online (Sandbox Code Playgroud)

以及一个使调用更容易的辅助函数:

def permutations_w_duplicates(s):
    permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable

Run Code Online (Sandbox Code Playgroud)

现在我们通过一些簿记来扩展permute函数,关于某个字符被交换到该begin位置的次数(即已被修复),并且我们还记住每个字符(char_number)的出现的原始顺序.我们尝试交换到该begin位置的每个字符必须是原始顺序中的下一个字符,即字符的修复数量定义了下一个可以修复此字符的原始出现 - 我称之为next_fixable.

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]: 
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]

                s[begin], s[i] = s[i], s[begin]
                permute2(s, next_fixable, char_number, begin + 1, end)
                s[begin], s[i] = s[i], s[begin]

                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

Run Code Online (Sandbox Code Playgroud)

我们再次使用辅助函数:

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    permute2(list(s), next_fixable, char_number, 0, len(s))

Run Code Online (Sandbox Code Playgroud)

而已!

几乎.如果您愿意,可以在此处停止并使用C++重写,但如果您对某些测试数据感兴趣,请继续阅读.

我使用了稍微不同的代码进行测试,因为我不想打印所有的排列.在Python中,你将把a替换为printa yield,将函数转换为生成函数,其结果可以用for循环迭代,并且只在需要时才计算置换.这是我使用的真实代码和测试:

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        yield "".join(s) # join the characters to form a string
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                for p in permute2(s, next_fixable, char_number, begin + 1, end):
                    yield p
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    for p in permute2(list(s), next_fixable, char_number, 0, len(s)):
        yield p


s = "FOOQUUXFOO"
A = list(permutations_w_duplicates(s))
print("%s has %s permutations (counting duplicates)" % (s, len(A)))
print("permutations of these that are unique: %s" % len(set(A)))
B = list(permutations_wo_duplicates(s))
print("%s has %s unique permutations (directly computed)" % (s, len(B)))

print("The first 10 permutations       :", A[:10])
print("The first 10 unique permutations:", B[:10])

Run Code Online (Sandbox Code Playgroud)

结果如下:

FOOQUUXFOO has 3628800 permutations (counting duplicates)
permutations of these that are unique: 37800
FOOQUUXFOO has 37800 unique permutations (directly computed)
The first 10 permutations       : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX']
The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF']

Run Code Online (Sandbox Code Playgroud)

请注意,排列的计算顺序与原始算法的顺序相同,只是没有重复.注意37800*2!*2!*4!= 3628800,就像你期望的那样.

归档时间：	13 年，9 月前
查看次数：	3449 次
最近记录：	7 年，4 月前