Nic*_*s R (7) · tags: python, random, algorithm, numpy, python-3.x
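I need to create m samples of size n, each drawn without replacement. In addition, every sample must preserve the original order of the population vector. All of this has to be very fast.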
```
Population = [50, 30, 12, 24, 420, 243, 173, 194, 123, 43, 21, 64, 34...]
300 samples of a combination of 3
[[24, 21, 34], [50, 194, 21], [12, 173, 64], [30, 173, 194].... [12, 243, 34]]
```
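The samples must be independent of each other, and in my case I need to preserve the order of the original population array. There are several possible solutions, but none of them is very fast, and each one ends up being the bottleneck of my code. I use NumPy to generate the random numbers.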
Some of the most promising approaches are the following:
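```python
import numpy as np

gen = np.random.default_rng()

def random_combination(population, sample, number=3):
    population = np.asarray(population)
    # Draw `number` indices per sample WITH replacement, then sort each row to keep the original order
    with_replacement_samples = gen.choice(len(population), size=(sample, number))
    pairs = np.sort(with_replacement_samples, axis=1)
    positions = population[pairs]
    for idx, row in zip(pairs, positions):
        # After sorting, a repeated index can only sit in adjacent columns
        if (np.diff(idx) == 0).any():
            continue  # I would need to generate a new sample each time ... the `if` is expensive
        yield row
```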
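A possible vectorized variant of the rejection idea, sketched under the assumption that only the row indices are needed: instead of skipping rows that contain a repeated index (which yields fewer than `sample` rows), redraw just those rows until none is left. The helper name `sorted_samples_rejection` and its `rng` parameter are hypothetical, not from the question. When the population is much larger than `number`, few rows collide, so the loop should usually run only a few times.

```python
import numpy as np

def sorted_samples_rejection(population_size, sample, number=3, rng=None):
    """Hypothetical helper: `sample` rows of `number` distinct indices, each row ascending.

    Rows are drawn with replacement; only rows containing a repeated index are
    redrawn, until every row holds distinct indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    if number > population_size:
        raise ValueError("number must not exceed population_size")
    idx = np.sort(rng.integers(0, population_size, size=(sample, number)), axis=1)
    # After sorting, duplicates can only be in adjacent columns
    bad = np.flatnonzero((np.diff(idx, axis=1) == 0).any(axis=1))
    while bad.size:
        # Redraw only the rejected rows
        idx[bad] = np.sort(rng.integers(0, population_size, size=(bad.size, number)), axis=1)
        still_bad = (np.diff(idx[bad], axis=1) == 0).any(axis=1)
        bad = bad[still_bad]
    return idx
```

Usage, assuming `population` is a NumPy array: `population[sorted_samples_rejection(len(population), 300)]` gives a `(300, 3)` array of ordered samples.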
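```python
def random_combination4(posiciones, sample, number=3):
    posiciones = np.asarray(posiciones)
    # The `number` smallest of len(posiciones) uniform keys per row are `number` distinct random indices
    pair = np.argpartition(gen.random((sample, len(posiciones))), number - 1, axis=-1)[:, :number]
    pair = np.sort(pair, axis=1)  # keep the original population order
    for i in posiciones[pair]:
        yield i
```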
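The `argpartition` approach is a random-keys method: the `number` smallest of `len(posiciones)` i.i.d. uniform keys form a uniformly random subset of distinct indices. Its main cost is materializing a `sample x len(population)` matrix of keys. A minimal sketch, assuming memory is the concern, that processes the rows in chunks; the names `sorted_samples_random_keys` and `chunk_rows` are illustrative only.

```python
import numpy as np

def sorted_samples_random_keys(population_size, sample, number=3, chunk_rows=10_000, rng=None):
    """Hypothetical helper: random-keys sampling processed in row chunks.

    At most chunk_rows x population_size keys are held in memory at once.
    """
    rng = np.random.default_rng() if rng is None else rng
    if not 1 <= number <= population_size:
        raise ValueError("need 1 <= number <= population_size")
    out = np.empty((sample, number), dtype=np.intp)
    for start in range(0, sample, chunk_rows):
        stop = min(start + chunk_rows, sample)
        keys = rng.random((stop - start, population_size))
        # argpartition moves the `number` smallest keys into the first `number` columns (unordered)
        chosen = np.argpartition(keys, number - 1, axis=1)[:, :number]
        # Sorting the indices restores the population order inside each sample
        out[start:stop] = np.sort(chosen, axis=1)
    return out
```

The chunk size only trades memory for the number of Python-level iterations; the distribution of each row is the same as in the unchunked version.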
```python
def random_combination(population, sample, number=3, probabilities=None):
    population = np.asarray(population)
    num_elements = len(population)
    # Replicate the probabilities once per sample (one row per sample)
    if probabilities is None:
        replicated_probabilities = np.tile(np.full(shape=num_elements, fill_value=1 / num_elements), (sample, 1))
    else:
        replicated_probabilities = np.tile(probabilities, (sample, 1))
    # Get random shifting numbers and scale each row so it sums to 1
    random_shifts = gen.random(replicated_probabilities.shape)
    random_shifts /= random_shifts.sum(axis=1)[:, np.newaxis]
    # Shift by the random numbers and find the largest (by finding the smallest of the negative)
    shifted_probabilities = random_shifts - replicated_probabilities
    combinations = np.argpartition(shifted_probabilities, number - 1, axis=1)[:, :number]
    combinations = np.sort(combinations, axis=1)  # keep the original population order
    for i in combinations:
        yield population[i]
```
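For the weighted case, a well-known alternative to the random-shift trick is the Efraimidis-Spirakis key method; this is a swapped-in technique, not a fix of the code above. In its exponential form, each element gets a key `E_i / w_i` with `E_i ~ Exp(1)`, and the `number` smallest keys form a sample distributed like successive weighted draws without replacement. The sketch assumes strictly positive weights, and `sorted_weighted_samples` is a hypothetical name.

```python
import numpy as np

def sorted_weighted_samples(weights, sample, number=3, rng=None):
    """Hypothetical helper: weighted sampling without replacement, one row per sample.

    Exponential-keys form of Efraimidis-Spirakis: key_i = E_i / w_i, E_i ~ Exp(1);
    the `number` smallest keys in each row are the selected indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    if np.any(weights <= 0):
        raise ValueError("this sketch assumes strictly positive weights")
    if not 1 <= number <= weights.size:
        raise ValueError("need 1 <= number <= len(weights)")
    # Broadcasting divides every row of Exp(1) draws by the same weight vector
    keys = rng.exponential(size=(sample, weights.size)) / weights
    chosen = np.argpartition(keys, number - 1, axis=1)[:, :number]
    return np.sort(chosen, axis=1)  # keep the original population order
```

With equal weights the keys are i.i.d., so this reduces to the uniform random-keys method.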
N.B. The last approach uses a `for` loop, but it is very expensive:
```python
def random_combination2(population, sample, number=3):
    for _ in range(sample):
        # One gen.choice call per sample: `number` distinct indices, sorted to keep the original order
        pair = np.sort(gen.choice(len(population), size=number, replace=False))
        yield tuple(population[j] for j in pair)
```
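Since `number` is small, the per-sample `gen.choice(..., replace=False)` loop can be vectorized across samples. At step j, draw r uniformly from `[0, N - j)` for every row and map it to the r-th index not yet chosen in that row, by stepping past each already-chosen index that is `<= r`, in ascending order. That needs only `number` vectorized steps and `O(sample x number)` memory. A sketch; `sorted_samples_sequential` is a hypothetical name.

```python
import numpy as np

def sorted_samples_sequential(population_size, sample, number=3, rng=None):
    """Hypothetical helper: vectorized sequential draws without replacement.

    Each step picks uniformly among the indices not yet chosen in each row,
    so every row is a uniformly random `number`-subset, returned ascending.
    """
    rng = np.random.default_rng() if rng is None else rng
    if not 0 <= number <= population_size:
        raise ValueError("need 0 <= number <= population_size")
    chosen = np.empty((sample, 0), dtype=np.int64)
    for j in range(number):
        # Uniform position among the population_size - j remaining indices
        r = rng.integers(0, population_size - j, size=sample)
        # `chosen` is sorted per row, so one ascending pass maps r to the r-th unused index
        for col in range(j):
            r += chosen[:, col] <= r
        chosen = np.sort(np.column_stack([chosen, r]), axis=1)
    return chosen
```

The inner loop is over at most `number - 1` columns, so for `number = 3` the whole draw is a handful of array operations regardless of `sample`.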
I'm not sure whether this is faster than your version, but maybe you can try it:
```python
from random import shuffle
import numpy as np
#import pandas as pd # activate this line, if you want to use pandas
# Just create a fake-population with randn,
# you can just skip this line and set population
# to your data. Just convert it to a numpy array
# in case it is not stored in one.
# In case it consists of columns with different types,
# you can also use a pandas approach, which only differs in three lines
population= np.random.randn(10000, 1)
#population= pd.DataFrame(population) # activate this line if you want to try out pandas
indexes_orig= list(range(population.shape[0]))
shuffle(indexes_orig)
indexes= indexes_orig
m= 20 # m samples
n= 200 # of size n each
samples= list()
for i in range(m):
    sample_indexes= indexes[:n]
    sample_indexes.sort()
    indexes= indexes[n:]
    samples.append(population[sample_indexes, :])
    #samples.append(population.iloc[sample_indexes, :]) # uncomment this instead of the line above, if you want to use pandas, it assumes you use a 0 based index without gaps
```
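The shuffle-and-slice idea can also be written without the Python-level loop: take one permutation of the row indices, reshape the first `m * n` entries into `m` rows of `n`, and sort each row. As in the loop version, the samples are disjoint (each index is used at most once across all samples), so they are not independent draws and `m * n` must not exceed the population size. A sketch; `disjoint_sorted_samples` is a hypothetical name.

```python
import numpy as np

def disjoint_sorted_samples(population, m, n, rng=None):
    """Hypothetical helper: m disjoint samples of size n, each in population order."""
    rng = np.random.default_rng() if rng is None else rng
    population = np.asarray(population)
    if m * n > len(population):
        raise ValueError("m * n must not exceed the population size")
    # One permutation of the row indices, cut into m consecutive blocks of n
    idx = rng.permutation(len(population))[: m * n].reshape(m, n)
    idx.sort(axis=1)  # restore the original population order within each sample
    return population[idx]  # shape (m, n, *population.shape[1:])
```

For the `(10000, 1)` population above, `disjoint_sorted_samples(population, 20, 200)` returns one array of shape `(20, 200, 1)` instead of a list of `(200, 1)` arrays.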