I find this completely astounding but the rand() function in DB2 occasionally returns a value of one. Consider this select against a table that has about 150K rows in it:
select integer(rand()*10) as Num, count(*) as N
from TabWithAbout150KRows
group by integer(rand()*10)
order by 1 desc;
In most languages/DB's, etc, I'd expect this to return 10 rows of data, with the distribution being roughly equal. What I actually get is 11 rows, as in the following:
Num N
--- -----
10 12
9 14871
8 14975
7 15213
6 15004
5 15196
4 14998
3 14916
2 14926
1 15081
0 15017
Shocking! In my use case I'm updating rows in a table and want to assign a random value, but it needs to be randomly distributed as opposed to the horrible situation above.
So I'm currently thinking I'll have to do the update multiple times in a loop, continuing in the 2nd...nth iterations to try again for the rows that were unlucky enough to end up with rand()=1.0
Or, I could use rand()/1.00001, but this is just silly (and not evenly distributed, either)!
Any ideas on a better way to approach this (without, for example, writing UDFs) would be appreciated.
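I ran into this problem with DB2/400 back in 2008...
rand() returns a floating-point value in the inclusive range [0,1]
rand() * 10 returns a floating-point value in the inclusive range [0,10]
You then convert to an integer, which gives you the following: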
[0.000, 0.9999] => 0
[1.000, 1.9999] => 1
[2.000, 2.9999] => 2
[3.000, 3.9999] => 3
[4.000, 4.9999] => 4
[5.000, 5.9999] => 5
[6.000, 6.9999] => 6
[7.000, 7.9999] => 7
[8.000, 8.9999] => 8
[9.000, 9.9999] => 9
[10.000, 10.000] => 10
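As you can see, you end up with far fewer 10s than any other number.
The truncation after the multiplication is the problem. Rounding instead of truncating doesn't help, because a narrower range of values still maps to 0 or 10.
Many rand() functions return a value in the range [0,1) (1 excluded). But DB2 returns [0,1].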
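To make the bucket sizes concrete, here is a small Python sketch of the mapping described above. It assumes a hypothetical generator that returns the evenly spaced values k/M for k = 0..M. M = 1000 is made up for illustration and is not DB2's actual resolution. The truncation is done with integer arithmetic so floating-point rounding doesn't blur the result.

```python
from collections import Counter

# Hypothetical resolution: the model generator returns k/M for k = 0..M,
# so both 0.0 and 1.0 are possible outputs (closed interval [0,1]).
M = 1000

def bucket(k, n=10):
    """Integer part of (k/M) * n, computed exactly with integers."""
    return (k * n) // M

# Count how many of the M+1 possible outputs land in each bucket.
counts = Counter(bucket(k) for k in range(M + 1))

for value in sorted(counts, reverse=True):
    print(value, counts[value])
```

Under that assumption, each of 0 through 9 receives 100 of the 1001 possible outputs, while 10 receives exactly one (the case rand() = 1.0), which matches the shape of the result in the question.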
I've used the following in DB2 to get a random integer between 0 and N:
floor(rand() * N + 0.99999)
I think the distribution may still be slightly off from "perfect", but it was good enough for me.
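As a rough check of how far off it is, the sketch below evaluates floor(rand() * N + 0.99999) in Python. It assumes a hypothetical generator that returns k/M for k = 0..M (again, M = 1000 and N = 10 are illustrative values only, not DB2's real behavior). Fractions are used so the floor is exact.

```python
import math
from collections import Counter
from fractions import Fraction

M = 1000   # hypothetical generator resolution (assumption)
N = 10     # upper bound used in the formula
OFFSET = Fraction(99999, 100000)  # the 0.99999 constant, held exactly

def shifted_floor(k):
    """floor(rand() * N + 0.99999) for the model value rand() = k/M."""
    return math.floor(Fraction(k, M) * N + OFFSET)

# Count how many of the M+1 possible outputs map to each integer.
counts = Counter(shifted_floor(k) for k in range(M + 1))

for value in sorted(counts):
    print(value, counts[value])
```

In this model, 1 through N each receive 100 of the 1001 possible outputs, and 0 shows up only when rand() is exactly 0. So the rare value moves from the top of the range to the bottom rather than disappearing. A real generator with finer resolution would also return 0 for any rand() * N below 0.00001.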