bla*_*ght 0 arrays perl shuffle
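I have a collection of several hundred vinyl records, organized alphanumerically by catalog ID string. I wrote a script that picks 20 records at random from my collection by sampling from a shuffled array of catalog IDs. However, I find that the records it picks are often poorly distributed: it frequently picks 2 records with consecutive catalog IDs, and/or several groups of records that sit close together. I would expect that to be rare when picking 20 records out of 800.
I store the list of catalog IDs in the @selection array, and to draw a random sample of 20 items from it I take the first 20 items of the shuffled array: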
@selection = (shuffle @selection)[0 .. 19];
In desperation, I tried this ugly technique to try to boost the randomness, but it does not seem to make any difference:
@selection = shuffle @selection; sleep 1;
@selection = reverse @selection; sleep 1;
@selection = (shuffle @selection)[0 .. 19];
There are C(800, 20) = 3.73 × 10^39 ways of choosing 20 titles from 800.
There are C(781, 20) = 2.29 × 10^39 ways of choosing 20 titles from 800 where no two are adjacent.[1]
There is therefore a (2.29 × 10^39) / (3.73 × 10^39) = 61.4% chance of picking a set that contains no adjacent titles.
There is therefore a 1 - 61.4% = 38.6% chance of picking a set that contains adjacent titles.
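The arithmetic above can be reproduced with a short script. This is only a sketch: the helper name choose is made up for illustration, and it relies on the standard result that the number of ways to pick k items out of n in a row with no two adjacent is C(n-k+1, k), which is where C(781, 20) comes from. Because it works with unrounded values, its last digit may differ slightly from the figures quoted above, which were derived from rounded mantissas.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: binomial coefficient C(n, k) as a floating-point number.
# Multiplying term by term keeps intermediate values well within double range.
sub choose {
    my ($n, $k) = @_;
    my $result = 1;
    $result *= ($n - $k + $_) / $_ for 1..$k;
    return $result;
}

my ($n, $k) = (800, 20);
my $all          = choose($n, $k);           # any 20 titles out of 800
my $non_adjacent = choose($n - $k + 1, $k);  # 20 titles, no two adjacent

printf "C(%d, %d) = %.3g\n", $n, $k, $all;
printf "C(%d, %d) = %.3g\n", $n - $k + 1, $k, $non_adjacent;
printf "P(no adjacent titles)        = %.1f%%\n", $non_adjacent / $all * 100;
printf "P(at least one adjacent pair) = %.1f%%\n", (1 - $non_adjacent / $all) * 100;
```

Changing $n and $k shows how quickly the chance of an adjacent pair grows with the sample size, which is why clusters feel more common than intuition suggests.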
Now that we know what to expect, let's put shuffle to the test.
Test:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( shuffle );
my $num_tests = 100_000;
my $N = 800;
my @titles = 0..($N-1);
my $has_adjacent_titles = 0;
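for (1..$num_tests) {
    # Same sampling technique as in the question: first 20 items of a shuffle.
    my @shuffled_selection = ( shuffle(@titles) )[0..19];
    my @ordered = sort { $a <=> $b } @shuffled_selection;
    # Count this trial if any two neighbours in sorted order differ by exactly 1.
    ++$has_adjacent_titles if grep { $ordered[$_-1]+1 == $ordered[$_] } 1..$#ordered;
}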
printf "%.1f%%\n", $has_adjacent_titles / $num_tests * 100;
Output:
>a.pl
38.6%
>a.pl
38.8%
>a.pl
38.5%
Seems like shuffle is working quite well.