Random function in DB2 is not evenly distributed

0 sql random db2

I find this completely astounding, but the rand() function in DB2 occasionally returns a value of exactly 1. Consider this select against a table that has about 150K rows in it:

select integer(rand()*10) as Num, count(*) as N
from TabWithAbout150KRows
group by integer(rand()*10)
order by 1 desc;

In most languages and databases, I'd expect this to return 10 rows of data, with the distribution being roughly equal. What I actually get is 11 rows, as in the following:

Num       N
---   -----
10       12 
9     14871 
8     14975 
7     15213 
6     15004 
5     15196 
4     14998 
3     14916 
2     14926 
1     15081 
0     15017 

Shocking! In my use case I'm updating rows in a table and want to assign a random value, but it needs to be randomly distributed as opposed to the horrible situation above.

So I'm currently thinking I'll have to do the update multiple times in a loop, continuing in the 2nd...nth iterations to try again for the rows that were unlucky enough to end up with rand()=1.0

Or, I could use rand()/1.00001, but this is just silly (and not evenly distributed, either)!

Any ideas on a better way to approach this (without, for example, writing UDFs) would be appreciated.

Cha*_*les 5

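I ran into this back in 2008 with DB2/400...

rand() returns a floating-point value in the range [0,1], inclusive
rand() * 10 returns a floating-point value in the range [0,10], inclusive

Then you convert to an integer, and you get the following: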

[0.000, 0.9999] => 0
[1.000, 1.9999] => 1
[2.000, 2.9999] => 2
[3.000, 3.9999] => 3
[4.000, 4.9999] => 4
[5.000, 5.9999] => 5
[6.000, 6.9999] => 6
[7.000, 7.9999] => 7
[8.000, 8.9999] => 8
[9.000, 9.9999] => 9
[10.000, 10.000] => 10

As you can see, you end up with far fewer 10s than any other number.
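The mapping above can be checked with a small model of the arithmetic. This is a Python sketch, not DB2 itself: it assumes a generator whose outputs are the equally spaced grid points k/steps for k = 0..steps, with both 0 and 1 included. The grid size of 1000 and the helper name `bucket_counts` are made up for illustration.

```python
from collections import Counter

def bucket_counts(steps: int, buckets: int) -> Counter:
    """Count how many grid points k/steps (k = 0..steps, both ends included)
    land in each bucket after multiply-then-truncate."""
    counts = Counter()
    for k in range(steps + 1):
        # Integer form of int((k / steps) * buckets); avoids float rounding.
        counts[(k * buckets) // steps] += 1
    return counts

counts = bucket_counts(steps=1000, buckets=10)
for bucket in sorted(counts, reverse=True):
    print(bucket, counts[bucket])
```

On this model grid, buckets 0 through 9 each receive 100 points, while bucket 10 receives only the single point where the generator returns exactly 1. That is the same shape as the 11-row result in the question.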

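The truncation after the multiplication is the problem. Rounding instead of truncating doesn't help, because you'd still have a smaller range of values that result in 0 or 10.

Many rand() functions return values in the range [0,1), which excludes 1. But DB2 returns [0,1].

I've used the following in DB2 to get a random integer between 0 and N: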

floor(rand() * N + 0.99999)

I think the distribution may still be slightly off from "perfect". But it was good enough for me.
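To see where that remaining skew might sit, here is a Python model of the floor expression above. It is a sketch under assumptions, not DB2 output: the generator is modeled as the grid points k/steps for k = 0..steps (both ends included), exact fractions stand in for floating point, and `shifted_floor_counts` is a hypothetical helper name.

```python
import math
from collections import Counter
from fractions import Fraction

OFFSET = Fraction(99999, 100000)  # the 0.99999 constant from the expression

def shifted_floor_counts(steps: int, n: int) -> Counter:
    """Count results of floor(r * n + 0.99999) over r = k/steps, k = 0..steps."""
    counts = Counter()
    for k in range(steps + 1):
        r = Fraction(k, steps)  # modeled rand() value in [0, 1]
        counts[math.floor(r * n + OFFSET)] += 1
    return counts

counts = shifted_floor_counts(steps=1000, n=10)
for value in sorted(counts):
    print(value, counts[value])
```

On this model grid, the values 1 through 10 each appear 100 times and 0 appears once, for the point where the generator returns exactly 0. In other words, the expression behaves like a draw from 1 to N with 0 as a rare outlier, which matters if the target column must never hold 0.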