Sve*_*ach 70
计算列表的离散累积密度函数(CDF) - 或者简单地计算权重的累积和数组.然后生成一个介于0和所有权重之和(在你的情况下可能是1)的范围内的随机数,进行二进制搜索以在离散CDF数组中找到该随机数并获得与该条目对应的值 - 这是你的加权随机数.
小智 13
算法很简单
rand_no = rand(0,1)
for each element in array
if(rand_num < element.probablity)
select and break
rand_num = rand_num - element.probability
Run Code Online (Sandbox Code Playgroud)
我发现这篇文章对于充分理解这个问题最有用. 此stackoverflow问题也可能是您正在寻找的问题.
我相信最佳解决方案是使用Alias方法(维基百科).它需要O(n)时间来初始化,O(1)时间进行选择,以及O(n)存储器.
这是用于生成加权n侧模具的结果的算法(从这里开始,从长度为n的数组中选择一个元素是微不足道的),如本文所述.作者假设您具有滚动公平骰子(floor(random() * n))和翻转偏见硬币(random() < p)的功能.
算法:Vose的别名方法
初始化:
- 创建阵列Alias和Prob,每个大小为n.
- 创建两个工作清单,小型和大型.
- 将每个概率乘以n.
- 对于每个缩放概率p i:
- 如果p i <1,则将i添加到Small.
- 否则(p 我 ≥1 ),添加我到大.
- 虽然小和大不是空的:( 大可能先清空)
- 从Small中删除第一个元素; 叫它l.
- 从Large中删除第一个元素; 叫它g.
- 设置Prob [l] = p l.
- 设置别名[l] = g.
- 设定p g:=(p g + p l)-1.(这是一个更加数字稳定的选项.)
- 如果p g <1,则将g添加到Small.
- 否则(p 克 ≥1 ),添加克到大.
- 虽然大不是空的:
- 从Large中删除第一个元素; 叫它g.
- 设置问题[g] = 1.
- 虽然Small不是空的:这只能由于数值不稳定而成为可能.
- 从Small中删除第一个元素; 叫它l.
- 设置Prob [l] = 1.
代:
- 从n侧模具生成合理的模具辊; 打电话给我.
- 用概率Prob [i]翻转出现在头上的有偏见的硬币.
- 如果硬币出现"头",则返回i.
- 否则,返回Alias [i].
红宝石中的一个例子
#each element is associated with its probability
a = {1 => 0.25 ,2 => 0.5 ,3 => 0.2, 4 => 0.05}
#at some point, convert to ccumulative probability
acc = 0
a.each { |e,w| a[e] = acc+=w }
#to select an element, pick a random between 0 and 1 and find the first
#cummulative probability that's greater than the random number
r = rand
selected = a.find{ |e,w| w>r }
p selected[0]
Run Code Online (Sandbox Code Playgroud)
这可以在每个样品的O(1)预期时间内完成,如下所述.
计算每个元素i的CDF F(i)是小于或等于i的概率之和.
将元素i的范围r(i)定义为区间[F(i-1),F(i)].
对于每个区间[(i-1)/ n,i/n],创建一个桶,其中包含范围与区间重叠的元素列表.只要您非常小心,整个阵列总共花费O(n)时间.
当您随机抽样数组时,您只需计算随机数所在的存储桶,并与列表中的每个元素进行比较,直到找到包含该数据的时间间隔为止.
样本的成本是O(随机选择列表的预期长度)<= 2.
另一个Ruby示例:
def weighted_rand(weights = {})
raise 'Probabilities must sum up to 1' unless weights.values.inject(&:+) == 1.0
raise 'Probabilities must not be negative' unless weights.values.all? { |p| p >= 0 }
# Do more sanity checks depending on the amount of trust in the software component using this method
# E.g. don't allow duplicates, don't allow non-numeric values, etc.
# Ignore elements with probability 0
weights = weights.reject { |k, v| v == 0.0 } # e.g. => {"a"=>0.4, "b"=>0.4, "c"=>0.2}
# Accumulate probabilities and map them to a value
u = 0.0
ranges = weights.map { |v, p| [u += p, v] } # e.g. => [[0.4, "a"], [0.8, "b"], [1.0, "c"]]
# Generate a (pseudo-)random floating point number between 0.0(included) and 1.0(excluded)
u = rand # e.g. => 0.4651073966724186
# Find the first value that has an accumulated probability greater than the random number u
ranges.find { |p, v| p > u }.last # e.g. => "b"
end
Run Code Online (Sandbox Code Playgroud)
如何使用:
weights = {'a' => 0.4, 'b' => 0.4, 'c' => 0.2, 'd' => 0.0}
weighted_rand weights
Run Code Online (Sandbox Code Playgroud)
期待什么:
d = 1000.times.map{ weighted_rand weights }
d.count('a') # 396
d.count('b') # 406
d.count('c') # 198
Run Code Online (Sandbox Code Playgroud)
这是我在生产中使用的 PHP 代码:
/**
* @return \App\Models\CdnServer
*/
protected function selectWeightedServer(Collection $servers)
{
if ($servers->count() == 1) {
return $servers->first();
}
$totalWeight = 0;
foreach ($servers as $server) {
$totalWeight += $server->getWeight();
}
// Select a random server using weighted choice
$randWeight = mt_rand(1, $totalWeight);
$accWeight = 0;
foreach ($servers as $server) {
$accWeight += $server->getWeight();
if ($accWeight >= $randWeight) {
return $server;
}
}
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
40154 次 |
| 最近记录: |