How do I create a list of random numbers without duplicates?

iCo*_*unk 89 python random

I tried using random.randint(0, 100), but some of the numbers came out the same. Is there a method or module that creates a list of unique random numbers?

def getScores():
    # open files to read and write
    f1 = open("page.txt", "r")
    p1 = open("pgRes.txt", "a")

    gScores = []
    bScores = []
    yScores = []

    # run 50 tests of 40 random queries to implement "bootstrapping" method
    for i in range(50):
        # get 40 random queries from the 50
        lines = random.sample(f1.readlines(), 40)

Gre*_*ill 148

This returns a list of 10 numbers selected from the range 0 to 99, without duplicates.

import random
random.sample(range(100), 10)

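With reference to your specific code example, you probably want to read all the lines from the file once and then select random lines from the list held in memory. For example: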

all_lines = f1.readlines()
for i in range(50):
    lines = random.sample(all_lines, 40)

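This way, you only read from the file once, before the loop. That is much more efficient than seeking back to the start of the file and calling f1.readlines() again on every loop iteration.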
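A minimal sketch of the read-once pattern described above. The temporary file and the `bootstrap_samples` helper are hypothetical, introduced only so the example is self-contained; the file names from the question are not used.

```python
import os
import random
import tempfile


def bootstrap_samples(path, rounds=50, sample_size=40):
    """Read the file once, then draw `rounds` duplicate-free samples from memory."""
    with open(path, "r") as f:
        all_lines = f.readlines()  # the only read of the file
    # Each call to random.sample picks distinct lines (no line twice per sample).
    return [random.sample(all_lines, sample_size) for _ in range(rounds)]


# Build a throwaway input file with 50 distinct lines for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"query {i}\n" for i in range(50))

samples = bootstrap_samples(tmp.name)
os.remove(tmp.name)

print(len(samples), len(samples[0]))
```

Calling f1.readlines() inside the loop, as in the question, would return an empty list from the second iteration on, because the file position is already at the end.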

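  • "numpy" seems to be faster than "random". `import numpy as np; np.random.permutation(100)[:10]` also generates 10 numbers selected from 0 to 99, without duplicates. Benchmarking in IPython gives 103 µs ± 513 ns for `%timeit random.sample(range(1000), 100)` and 17 µs ± 1.24 µs for `%timeit np.random.permutation(1000)[:100]`. (4 upvotes)
  • @wjandrea Yes, I know that a Python 3 `range` is lazy and does not build a list. When I posted that comment, trying `sample = random.sample(range(1000000000000000000), 10)` showed the process memory growing as it tried to materialize the range before extracting the sample. Checking now with Python 3.10, the implementation seems to be different (no memory problem), so my earlier comment is no longer relevant. The LCG solution is still a fun learning exercise, though! (3 upvotes)
  • This technique wastes memory, especially for large samples. I posted code below for a much more memory- and compute-efficient solution that uses a Linear Congruential Generator. (2 upvotes)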

Ric*_*llo 13

You can use the shuffle function from the random module, like this:

import random

my_list = list(range(1, 100)) # list of integers from 1 to 99
                              # adjust these boundaries to fit your needs
random.shuffle(my_list)
print(my_list) # <- List of unique random numbers

Note that shuffle does not return a list, as one might expect; it shuffles the list passed to it in place and returns None.
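A short illustration of the in-place behaviour. Using `random.sample` with the full length to get a shuffled copy is a suggestion added here, not part of the original answer.

```python
import random

my_list = list(range(1, 100))

# random.shuffle reorders the list in place and returns None.
result = random.shuffle(my_list)
print(result)  # None

# random.sample with k == len(population) returns a new shuffled list
# and leaves the original untouched.
original = list(range(1, 100))
shuffled_copy = random.sample(original, len(original))
print(original == list(range(1, 100)))  # True
```

Writing `my_list = random.shuffle(my_list)` is therefore a bug: it replaces the list with None.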


ben*_*ben 9

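You can first create a list of the numbers from a to b, where a and b are the smallest and largest numbers in your list respectively, then shuffle it with the Fisher-Yates algorithm or with Python's random.shuffle method.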
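For reference, a minimal sketch of a Fisher-Yates shuffle over the numbers from a to b. The function name `fisher_yates_shuffle` and the bounds are illustrative assumptions; in practice random.shuffle does the same job.

```python
import random


def fisher_yates_shuffle(items):
    """Shuffle `items` in place with the Fisher-Yates algorithm."""
    # Walk from the last index down to 1, swapping each element with a
    # randomly chosen element at or before its position.
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)  # inclusive on both ends
        items[i], items[j] = items[j], items[i]


a, b = 1, 20  # hypothetical bounds
numbers = list(range(a, b + 1))
fisher_yates_shuffle(numbers)
print(numbers)
```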

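  • Generating a full list of indices wastes memory, especially for large samples. I posted code below for a much more memory- and compute-efficient solution that uses a Linear Congruential Generator. (2 upvotes)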

ins*_*get 8

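The solution presented in this answer works, but it could become a memory problem when the sample size is small and the population is huge (e.g. random.sample(insanelyLargeNumber, 10)).

To fix that, I would go with this: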

import random

answer = set()
sampleSize = 10
answerSize = 0

while answerSize < sampleSize:
    r = random.randint(0, 100)
    if r not in answer:
        answerSize += 1
        answer.add(r)

# answer now contains 10 unique, random integers from 0..100 (inclusive)
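The same rejection-sampling idea can be wrapped in a reusable function. This is a sketch under assumptions: the name `sample_unique` and the guard against oversized requests are additions for illustration, not part of the answer.

```python
import random


def sample_unique(low, high, k):
    """Return k distinct integers from low..high (inclusive) by rejection sampling."""
    population_size = high - low + 1
    if k > population_size:
        # Without this guard the loop below would never terminate.
        raise ValueError("sample size exceeds the number of available values")
    chosen = set()
    while len(chosen) < k:
        chosen.add(random.randint(low, high))  # duplicates are ignored by the set
    return list(chosen)


picks = sample_unique(0, 10**30, 10)
print(picks)
```

Memory grows with k rather than with the size of the population. The trade-off is that the number of redraws rises sharply when k approaches the population size, so this fits best when the sample is small relative to the range.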


Tho*_*Lux 7

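Linear Congruential Pseudo-random Number Generator

O(1) memory

O(k) operations

This problem can be solved with a simple Linear Congruential Generator. It requires constant memory overhead (8 integers) and at most 2*(sequence length) computations.

All other solutions use more memory and more compute! If you only need a few random sequences, this method will be significantly cheaper. For a range of size N, if you want to generate on the order of N unique k-sequences or more, I recommend the accepted solution using the builtin method random.sample(range(N),k), as it has been optimized in Python for speed.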

Code

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for 
# python builtin "range".
#   Memory  -- storage for 8 integers, regardless of parameters.
#   Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    import random, math
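    # Set default values the same way "range" does.
    if stop is None: start, stop = 0, start
    if step is None: step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range (ceiling division, so a
    # step that does not evenly divide the span still counts the last value).
    maximum = max(0, -((start - stop) // step))
    # An empty range yields nothing (and log2(0) below would raise).
    if maximum == 0: return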
    # Seed range with a random integer.
    value = random.randint(0,maximum)
    # 
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    # 
    #   1) "modulus" and "offset" are relatively prime.
    #   2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    #   3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    # 
    offset = random.randint(0,maximum) * 2 + 1      # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1                 # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(2**math.ceil(math.log2(maximum))) # Pick a modulus just big enough to generate all numbers (power of 2).
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus

Usage

This function, "random_range", is used the same way as any generator (like "range"). An example:

# Show off random range.
print()
for v in range(3,6):
    v = 2**v
    l = list(random_range(v))
    print("Need",v,"found",len(set(l)),"(min,max)",(min(l),max(l)))
    print("",l)
    print()

Sample Results

Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
 [1, 0, 7, 6, 5, 4, 3, 2]

Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
 [3, 5, 8, 7, 2, 6, 0, 1, 4]

Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
 [5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]

Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
 [12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]

Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
 [19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]

Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
 [11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]
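The three conditions listed in the code comments are the Hull-Dobell conditions for a full-period generator. The sketch below checks them on a tiny case; the helper `lcg_cycle` and the parameter values are hypothetical, chosen only to make the behaviour easy to verify by hand.

```python
def lcg_cycle(modulus, multiplier, offset, seed=0):
    """Yield `modulus` successive values of x -> (x*multiplier + offset) % modulus."""
    value = seed
    for _ in range(modulus):
        yield value
        value = (value * multiplier + offset) % modulus


# Parameters satisfying the three conditions for modulus 16:
# the offset is odd (coprime to 16) and multiplier - 1 is divisible by 4.
cycle = list(lcg_cycle(modulus=16, multiplier=5, offset=3))
print(cycle)
print(sorted(cycle) == list(range(16)))  # True

# Violating condition 3 (multiplier - 1 = 2 is not divisible by 4)
# shortens the period, so values repeat before all 16 are visited.
bad = list(lcg_cycle(modulus=16, multiplier=3, offset=3))
print(len(set(bad)))
```

When the conditions hold, every value below the modulus appears exactly once per cycle, which is why the generator above only has to skip values that fall outside the requested range.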


Mit*_*eat 5

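If the list of N numbers from 1 to N is randomly generated, then yes, some numbers may be repeated.

If you want a list of the numbers from 1 to N in random order, fill an array with the integers from 1 to N, and then use a Fisher-Yates shuffle or Python's random.shuffle().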
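A small sketch contrasting the two cases; N = 20 is an arbitrary value picked for illustration.

```python
import random

N = 20

# Drawing N values independently can (and usually does) repeat some of them.
independent = [random.randint(1, N) for _ in range(N)]
print(len(set(independent)), "distinct values out of", N)

# Filling a list with 1..N and shuffling it gives every value exactly once.
permutation = list(range(1, N + 1))
random.shuffle(permutation)
print(permutation)
```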


Han*_*man 5

If you need to sample extremely large numbers, you cannot use range

random.sample(range(10000000000000000000000000000000), 10)

because it throws:

OverflowError: Python int too large to convert to C ssize_t

Also, if random.sample cannot produce the number of items you want because the range is too small

random.sample(range(2), 1000)

it throws:

ValueError: Sample larger than population
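Both failure modes can be reproduced in a few lines. This is a sketch assuming CPython; the helper `raised` is hypothetical and exists only to keep the example compact.

```python
import random


def raised(exc_type, func):
    """Return True if calling func() raises exc_type, else False."""
    try:
        func()
    except exc_type:
        return True
    return False


# A range longer than sys.maxsize cannot report its length, so sampling fails.
too_large = raised(OverflowError, lambda: random.sample(range(10**31), 10))

# Asking for more items than the population holds is rejected.
too_small = raised(ValueError, lambda: random.sample(range(2), 1000))

print(too_large, too_small)
```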

This function resolves both problems:

import random

def random_sample(count, start, stop, step=1):
    def gen_random():
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        seen = set()
        seenadd = seen.add
        for i in (i for i in source() if i not in seen and not seenadd(i)):
            yield i
            if len(seen) == n:
                break

    return [i for i in gen_n_unique(gen_random,
                                    min(count, int(abs(stop - start) / abs(step))))]

Usage with extremely large numbers:

print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))

Sample result:

7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943

Usage when the range is smaller than the number of requested items:

print(', '.join(map(str, random_sample(100000, 0, 3))))

Sample result:

2, 0, 1

It also works with negative ranges and steps:

print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))

Sample results:

2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3