Why do the digits 1, 2 and 3 appear so often when using the C rand() function?

Moe*_*oeb 28 c random

What I am trying to do is generate some random numbers (not necessarily single-digit), like

29106
7438
5646
4487
9374
28671
92
13941
25226
10076

and then count how many times each digit appears:

count[0] =       3  Percentage =  6.82
count[1] =       5  Percentage = 11.36
count[2] =       6  Percentage = 13.64
count[3] =       3  Percentage =  6.82
count[4] =       6  Percentage = 13.64
count[5] =       2  Percentage =  4.55
count[6] =       7  Percentage = 15.91
count[7] =       5  Percentage = 11.36
count[8] =       3  Percentage =  6.82
count[9] =       4  Percentage =  9.09

Here is the code I am using:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    int i;
    srand((unsigned)time(NULL));

    /* Write one random number per line. */
    FILE *fp = fopen("random.txt", "w");
    if (fp == NULL)
        return 1;
    // for(i = 0; i < 10; i++)
    for (i = 0; i < 1000000; i++)
        fprintf(fp, "%d\n", rand());
    fclose(fp);

    /* Read the file back one digit at a time and tally each digit. */
    int dummy;
    long count[10] = {0};
    fp = fopen("random.txt", "r");
    if (fp == NULL)
        return 1;
    /* Loop on fscanf's result: testing feof() first counts the last digit twice. */
    while (fscanf(fp, "%1d", &dummy) == 1)
        count[dummy]++;
    fclose(fp);

    long sum = 0;
    for (i = 0; i < 10; i++)
        sum += count[i];

    for (i = 0; i < 10; i++)
        printf("count[%d] = %7ld  Percentage = %5.2f\n",
               i, count[i], 100.0 * count[i] / sum);

    return 0;
}

If I generate a large number of random numbers (1000000), this is the result I get:

count[0] =  387432  Percentage =  8.31
count[1] =  728339  Percentage = 15.63
count[2] =  720880  Percentage = 15.47
count[3] =  475982  Percentage = 10.21
count[4] =  392678  Percentage =  8.43
count[5] =  392683  Percentage =  8.43
count[6] =  392456  Percentage =  8.42
count[7] =  391599  Percentage =  8.40
count[8] =  388795  Percentage =  8.34
count[9] =  389501  Percentage =  8.36

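Note that 1, 2 and 3 get far too many hits. I have tried running this several times, and each time I get very similar results.

I am trying to understand what could cause 1, 2 and 3 to appear so much more often than any other digit.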


Taking the hint from what Matt Joiner and Pascal Cuoq pointed out,

I changed the code to use

for(i = 0; i < 1000000; i++)
    fprintf(fp, "%04d\n", rand() % 10000);
// pretty prints 0
// generates numbers in range 0000 to 9999

and this is what I get (similar results on multiple runs):

count[0] =  422947  Percentage = 10.57
count[1] =  423222  Percentage = 10.58
count[2] =  414699  Percentage = 10.37
count[3] =  391604  Percentage =  9.79
count[4] =  392640  Percentage =  9.82
count[5] =  392928  Percentage =  9.82
count[6] =  392737  Percentage =  9.82
count[7] =  392634  Percentage =  9.82
count[8] =  388238  Percentage =  9.71
count[9] =  388352  Percentage =  9.71

What could be the reason that 0, 1 and 2 are favored?


Thanks, everyone. Using

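int rand2(void) {
    int num;
    /* Keep only 0..29999 (assumes RAND_MAX == 32767); 30000 itself must be rejected too. */
    do {
        num = rand();
    } while (num >= 30000);
    return num;
}

    fprintf(fp, "%04d\n", rand2() % 10000);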

I get

count[0] =  399629  Percentage =  9.99
count[1] =  399897  Percentage = 10.00
count[2] =  400162  Percentage = 10.00
count[3] =  400412  Percentage = 10.01
count[4] =  399863  Percentage = 10.00
count[5] =  400756  Percentage = 10.02
count[6] =  399980  Percentage = 10.00
count[7] =  400055  Percentage = 10.00
count[8] =  399143  Percentage =  9.98
count[9] =  400104  Percentage = 10.00

Mat*_*ner 46

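rand() generates a value from 0 to RAND_MAX. RAND_MAX is implementation-defined and guaranteed to be at least 32767; on many platforms it equals INT_MAX, so typical values are 32767 or 2147483647.

For the example given above, RAND_MAX appears to be 32767. That puts an unusually high frequency of 1, 2 and 3 in the most significant digit for the values from 10000 to 32767. To a lesser degree, digits up to 6 and 7 are also slightly favored.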

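  • More significant than the bias toward six and seven is the bias against zero. 00012 is pretty-printed as "12", but 11112 is pretty-printed as "11112". All the leading zeros that would balance the statistics if the range were a power of 10 are omitted by `printf`. (11 upvotes)
  • For any number >= 32700, the fourth digit can be at most 6. For any number >= 32760, the fifth digit can be at most 7. (4 upvotes)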

ken*_*ytm 20

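Regarding the edited question,

this is because the digits are still not uniformly distributed even after you apply % 10000. Assume RAND_MAX == 32767 and that rand() is perfectly uniform.

For every block of 10,000 numbers counting from 0, all digits appear uniformly (4,000 each). However, the 32,768 possible values (0 to 32,767) are not a multiple of 10,000. The 2,768 values left over (30,000 to 32,767) therefore contribute extra leading 0s, 1s and 2s to the final count.

The exact contribution of these 2,768 numbers is: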

digits count
0      1857
1      1857
2      1625
3      857
4      857
5      857
6      855
7      815
8      746
9      746

Adding 12,000 to each count for the first 30,000 numbers, then dividing by the total number of digits (4×32,768), gives the expected distribution:

number  probability (%)
0       10.5721
1       10.5721
2       10.3951
3        9.80911
4        9.80911
5        9.80911
6        9.80759
7        9.77707
8        9.72443
9        9.72443

This is close to what you got.

If you want a truly uniform digit distribution, you need to reject those 2,768 numbers:

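#include <stdlib.h>

int rand_4digits(void) {
  /* Largest multiple of 10000 not exceeding RAND_MAX (30000 when RAND_MAX == 32767). */
  const int RAND_MAX_4_DIGITS = RAND_MAX - RAND_MAX % 10000;
  int res;
  do {
    res = rand();
  } while (res >= RAND_MAX_4_DIGITS);  /* reject the leftover values at the top */
  return res % 10000;
}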


小智 7

It looks like Benford's law - see http://en.wikipedia.org/wiki/Benford%27s_law - or else a not very good RNG.
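
For reference, Benford's law is a statement about the leading digit only, whereas the tallies in the question count every digit, so any comparison is rough at best. A quick sketch of what the law predicts for a leading digit d from 1 to 9:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad
P(1) \approx 30.1\%, \quad P(2) \approx 17.6\%, \quad P(3) \approx 12.5\%
```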