Edm*_*mon 1 python algorithm sampling
I am looking for an efficient function in Python that samples without replacement by actually modifying the original list. That is, an alternative to:
random.sample(population, k)
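that removes the elements from the original list as the sample is selected. The list can hold millions of items, and there may be dozens of subsequent calls to the sampling function.
Ideally, I would like to do something like this: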
sample_size_1 = 5
sample_size_2 = 200
sample_size_3 = 100
population = list(range(10000000))  # a real list, so it can be shrunk in place
sample_1 = select_sample(population, sample_size_1)  # population is shrunk
sample_2 = select_sample(population, sample_size_2)  # population is shrunk again
sample_3 = select_sample(population, sample_size_3)  # and population is shrunk again
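where population is efficiently shrunk between each call to select_sample.
I have some code that I could show here, but I am hoping there is something already available, or something more "pythonic" than my while loop.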
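A simple approach is to shuffle the population once so that its initial order is random (if it is not random already), then take elements from the end and delete them.
You can get the elements by slicing with population[-sample_size:] and delete them with population[-sample_size:] = [].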
import random
population = list(range(100))
# Shuffle population so the ordering is random.
random.shuffle(population)
for sample_size in [1, 5, 10]:
    # Take the last sample_size items, then delete them in place.
    sample = population[-sample_size:]
    population[-sample_size:] = []
    print(sample)
# Example output (varies from run to run):
# [79]
# [66, 89, 81, 0, 38]
# [18, 39, 90, 36, 11, 32, 63, 65, 72, 67]
You can also use population.pop() if you only want to remove one element at a time (for example, if sample_size is 1).
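As a minimal sketch of the one-at-a-time case, assuming the population is a list that has already been shuffled (the names first and second are only illustrative):

```python
import random

population = list(range(20))
random.shuffle(population)  # one-time shuffle, so the tail is a random draw

# list.pop() with no argument removes and returns the last element.
first = population.pop()
second = population.pop()

print(first, second)    # two distinct values; they vary from run to run
print(len(population))  # 18
```

Because pop() takes the element off the end, nothing else in the list has to be shifted.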
A function that does this would simply be (assuming your population has already been shuffled):
def select_sample(pop, size):
    x = pop[-size:]
    pop[-size:] = []
    return x
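One edge case is worth guarding against: with size equal to 0, the slice pop[-0:] is the same as pop[0:], so the function above would return the whole list and empty the population. The sketch below adds that check and a bounds check. The name select_sample_checked and the choice to raise ValueError are assumptions made for illustration, not part of the answer above.

```python
import random


def select_sample_checked(pop, size):
    """Remove and return the last `size` items of `pop` (hypothetical helper).

    Assumes `pop` is a list that has already been shuffled.
    """
    if size < 0 or size > len(pop):
        raise ValueError("size must be between 0 and len(pop)")
    if size == 0:
        # pop[-0:] equals pop[0:], i.e. the whole list, so handle 0 explicitly.
        return []
    sample = pop[-size:]
    del pop[-size:]  # deleting from the tail leaves the other items in place
    return sample


population = list(range(1000))
random.shuffle(population)  # shuffle once, up front

sample_1 = select_sample_checked(population, 5)
sample_2 = select_sample_checked(population, 200)
empty = select_sample_checked(population, 0)

print(len(sample_1), len(sample_2), len(empty), len(population))  # 5 200 0 795
```

The shuffle is paid for once. Each later call copies only the requested tail slice and truncates the list, so its cost depends on the sample size rather than on the size of the population.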