Weighted random sample without replacement in Python

Question

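I need to draw a sample of size k from a population without replacement, where each member of the population has an associated weight (W).

random.choices (from the standard library's random module, not NumPy) accepts weights but only samples with replacement, and random.sample does not accept weighted input.

This is what I am currently using: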

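import random
import numpy as np

# population, W and Parent_number are defined elsewhere
P = np.zeros((1, Parent_number))
n = 0
while n < Parent_number:
    draw = random.choices(population, weights=W, k=1)
    # reject values already drawn (note: P starts as zeros,
    # so a population value of 0 always looks "already drawn")
    if draw not in P:
        P[0, n] = draw[0]
        n = n + 1
P = np.asarray(sorted(P[0]))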

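While this works, it requires switching back and forth between arrays and lists, so it is less than ideal.

I am looking for the simplest, most readable solution, since this code will be shared with others.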

Answer 1

Mir*_*ber 10

You can use np.random.choice with replace=False, as follows:

np.random.choice(vec,size,replace=False, p=P)

where vec is your population and P is the weight vector (probabilities that must sum to 1).

For example:

import numpy as np
vec=[1,2,3]
P=[0.5,0.2,0.3]
np.random.choice(vec,size=2,replace=False, p=P)
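
If the weights W are raw values rather than probabilities, they need to be normalised first, because the p argument must sum to 1. A minimal sketch, assuming a hypothetical population and weight array (neither comes from the question) and using NumPy's newer Generator interface, which accepts the same replace and p arguments:

```python
import numpy as np

# Hypothetical data for illustration only.
population = np.array([10, 20, 30, 40])
W = np.array([5.0, 1.0, 3.0, 1.0])      # raw, unnormalised weights

p = W / W.sum()                          # probabilities summing to 1

rng = np.random.default_rng(seed=0)      # seed is optional; fixed here for repeatability
picked = rng.choice(population, size=2, replace=False, p=p)

picked = np.sort(picked)                 # sorted, as in the question's last line
print(picked)
```

The result stays a NumPy array throughout, which avoids the array/list round trips mentioned in the question.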

But without replacement, the population should shrink before the next draw. How does that happen here? (2 upvotes)
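
One way to picture the shrinking population raised in this comment: draw one item at a time, remove it, and let the remaining weights be renormalised for the next draw. The sketch below is a conceptual model only, not NumPy's actual implementation; the function name draw_sequentially is hypothetical.

```python
import random

def draw_sequentially(population, weights, k, rng=random):
    """Draw k items one at a time, removing each chosen item."""
    items = list(population)
    remaining = list(weights)
    chosen = []
    for _ in range(k):
        # choices() treats the weights as relative, so the weights that
        # remain are effectively renormalised on every draw.
        i = rng.choices(range(len(items)), weights=remaining, k=1)[0]
        chosen.append(items.pop(i))   # the population shrinks by one
        remaining.pop(i)              # and so does the weight list
    return chosen

print(draw_sequentially([1, 2, 3], [0.5, 0.2, 0.3], k=2))
```

After the first draw removes, say, 1 (weight 0.5), the second draw picks between 2 and 3 with probabilities 0.2/0.5 and 0.3/0.5.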

Answer 2

Ray*_*ger 9

Built-in solution

As suggested by Miriam Farber, you can use numpy's built-in solution:

np.random.choice(vec,size,replace=False, p=P)

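Pure Python equivalent

What follows is close to what numpy does internally. NumPy itself, of course, works on numpy arrays; this version uses plain lists and random.choices() from the standard library: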

from random import choices

def weighted_sample_without_replacement(population, weights, k=1):
    weights = list(weights)
    positions = range(len(population))
    indices = []
    while True:
        needed = k - len(indices)
        if not needed:
            break
        for i in choices(positions, weights, k=needed):
            if weights[i]:
                weights[i] = 0.0
                indices.append(i)
    return [population[i] for i in indices]
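
As an alternative to the rejection loop above, the same kind of sample can be drawn in a single pass with random keys. This is a different technique (the Efraimidis-Spirakis key method), not the one used in this answer: each item with weight w gets the key log(u)/w for a uniform u, and the k items with the largest keys are kept. The function name weighted_sample_by_keys is hypothetical.

```python
import heapq
import math
import random

def weighted_sample_by_keys(population, weights, k=1, rng=random):
    """Weighted sample without replacement via random keys."""
    keyed = []
    for item, w in zip(population, weights):
        if w > 0:                         # zero-weight items can never be chosen
            u = 1.0 - rng.random()        # uniform in (0, 1], so log(u) is defined
            keyed.append((math.log(u) / w, item))
    if k > len(keyed):
        raise ValueError("k exceeds the number of items with positive weight")
    # The k largest keys correspond to the sampled items.
    top = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [item for _, item in top]

print(weighted_sample_by_keys(['a', 'b', 'c', 'd'], [1, 0, 3, 2], k=2))
```

Unlike the loop above, this version fails fast with a ValueError when k is larger than the number of items that can be selected.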

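Related problem: selection when elements can be repeated

This is sometimes called an urn problem. For example, given an urn with 10 red balls, 4 blue balls, and 18 green balls, choose nine balls without replacement.

To do this with numpy, use random.sample() to generate unique selections from the total population count. Then bisect the cumulative counts to get the population indices.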

import numpy as np
from random import sample

population = np.array(['red', 'blue', 'green'])
counts = np.array([10, 4, 18])
k = 9

cum_counts = np.add.accumulate(counts)
total = cum_counts[-1]
selections = sample(range(total), k=k)
indices = np.searchsorted(cum_counts, selections, side='right')
result = population[indices]
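
If only the number of balls of each colour is needed, rather than the drawn sequence, NumPy's Generator also offers multivariate_hypergeometric. A minimal sketch, assuming NumPy 1.18 or later (where this method is available); the variable names are illustrative:

```python
import numpy as np

colors = ['red', 'blue', 'green']
counts = [10, 4, 18]      # balls of each colour in the urn
k = 9                     # balls drawn without replacement

rng = np.random.default_rng()
drawn = rng.multivariate_hypergeometric(counts, k)   # one count per colour

per_color = dict(zip(colors, drawn.tolist()))
print(per_color)
```

The returned counts always sum to k, and no colour can exceed its count in the urn.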

To do this without numpy, the same approach can be implemented using bisect() and accumulate() from the standard library:

from random import sample
from bisect import bisect
from itertools import accumulate

population = ['red', 'blue', 'green']
weights = [10, 4, 18]
k = 9

cum_weights = list(accumulate(weights))
total = cum_weights.pop()
selections = sample(range(total), k=k)
indices = [bisect(cum_weights, s) for s in selections]
result = [population[i] for i in indices]
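
On Python 3.9 and later, random.sample() accepts a counts argument that expresses the same urn directly, so the cumulative sums and bisection are handled for you. A short sketch, assuming that Python version is available:

```python
from random import sample

population = ['red', 'blue', 'green']
counts = [10, 4, 18]      # how many times each element appears in the urn
k = 9

# Equivalent to sampling from a list with 10 'red', 4 'blue' and 18 'green'.
result = sample(population, counts=counts, k=k)
print(result)
```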
