Why do I get a poor random distribution with Perl's List::Util::shuffle?

Question

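I have a collection of several hundred vinyl records, organized alphanumerically by catalog ID string. I wrote a script that picks 20 records at random from my collection by sampling a shuffled array of catalog IDs. However, I find that the records it picks are often poorly distributed. It frequently selects 2 records with sequential catalog IDs, and/or several groups of records that are close to each other. I would expect this to happen rarely when selecting 20 records from 800.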

I store the list of catalog IDs in the array @selection. To draw a random sample of 20 items, I shuffle the array and keep its first 20 items:

@selection = (shuffle @selection)[0 .. 19];

In desperation, I tried this ugly technique to increase the randomness, but it did not seem to make any difference:

@selection = shuffle @selection; sleep 1;
@selection = reverse @selection; sleep 1;
@selection = (shuffle @selection)[0 .. 19];

Answer 1

ike*_*ami 5

There are C(800, 20) = 3.73 × 10³⁹ ways of choosing 20 titles from 800.

There are C(781, 20) = 2.29 × 10³⁹ ways of choosing 20 titles from 800 where no two are adjacent.^[1]
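The count C(781, 20) can be justified with a standard shifting argument, sketched below. The symbols n, k, p_i, and q_i are notation introduced here for illustration and do not appear in the original answer.

```latex
Let $1 \le p_1 < p_2 < \dots < p_k \le n$ be the chosen positions,
with $p_{i+1} - p_i \ge 2$ (no two adjacent).
Define $q_i = p_i - (i - 1)$. Then $q_{i+1} - q_i = p_{i+1} - p_i - 1 \ge 1$, so
$1 \le q_1 < q_2 < \dots < q_k \le n - k + 1$, and the map is a bijection. Hence
\[
  \#\{\text{selections with no two adjacent}\} = \binom{n - k + 1}{k},
  \qquad
  \binom{800 - 20 + 1}{20} = \binom{781}{20}.
\]
```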

There is therefore a (2.29 × 10³⁹) / (3.73 × 10³⁹) = 61.4% chance of picking a set that contains no adjacent titles.

There is therefore a 1 - 61.4% = 38.6% chance of picking a set that contains adjacent titles.
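As a cross-check on the arithmetic above, the ratio C(781, 20) / C(800, 20) can be computed directly as a running product, which avoids forming the two huge binomial coefficients. This is a minimal sketch; the helper name prob_no_adjacent is hypothetical and not part of the original answer. Because it does not round the intermediate counts, its last digit may differ slightly from the figures quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: probability that $k items drawn from $n (in a row)
# contain no two adjacent ones, i.e. C(n-k+1, k) / C(n, k).
# C(m, k) / C(n, k) equals the product of (m - i) / (n - i) for i = 0 .. k-1.
sub prob_no_adjacent {
    my ($n, $k) = @_;
    my $m = $n - $k + 1;
    my $p = 1;
    $p *= ($m - $_) / ($n - $_) for 0 .. $k - 1;
    return $p;
}

my $p = prob_no_adjacent(800, 20);
printf "no adjacent pair: %.1f%%\n", $p * 100;
printf "at least one adjacent pair: %.1f%%\n", (1 - $p) * 100;
```

Using floating-point division at each step keeps every intermediate value between 0 and 1, so no big-number module is needed.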

Now that we know what to expect, let's put shuffle to the test.

Test:

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( shuffle );

my $num_tests = 100_000;
my $N = 800;
my @titles = 0..($N-1);
my $has_adjacent_titles = 0;
for (1..$num_tests) {
   my @shuffled_selection = ( shuffle(@titles) )[0..19];
   my @ordered = sort { $a <=> $b } @shuffled_selection;
   ++$has_adjacent_titles if grep { $ordered[$_-1]+1 == $ordered[$_] } 1..$#ordered;
}

printf "%.1f%%\n", $has_adjacent_titles / $num_tests * 100;

Output:

>a.pl
38.6%

>a.pl
38.8%

>a.pl
38.5%

Seems like shuffle is working quite well.

[1] See "Combinatorial restriction on choosing adjacent objects".
