How to randomly assign values row-wise in a numpy array

Question

My google-fu has failed me! I have a 10x10 numpy array initialized to 0 as follows:

arr2d = np.zeros((10,10))

For each row in arr2d, I want to set 3 random columns to 1. I can do this with a loop as follows:

for row in arr2d:
    rand_cols = np.random.randint(0,9,3)
    row[rand_cols] = 1

Output:

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

Is there a way to use numpy or array indexing/slicing to achieve the same result in a more Pythonic/elegant way (preferably in 1 or 2 lines of code)?

Answer 1

Div*_*kar 2

Once you have arr2d initialized with arr2d = np.zeros((10,10)), you can use a vectorized approach with a two-liner like so:

# Generate random unique 3 column indices for 10 rows
idx = np.random.rand(10,10).argsort(1)[:,:3]

# Assign them into initialized array
arr2d[np.arange(10)[:,None],idx] = 1

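Why this gives 3 distinct columns per row: sorting a row of independent uniform random numbers produces a random permutation of the column indices 0..9, so the first 3 entries of each row's argsort cannot repeat. The row index np.arange(10)[:,None] has shape (10, 1) and broadcasts against the (10, 3) index array, pairing row i with each of its chosen columns. A minimal sketch of the same idea wrapped in a helper (the name random_row_indices and its parameters are made up for illustration, they are not part of numpy):

```python
import numpy as np

def random_row_indices(n_rows, n_cols, k):
    """Hypothetical helper: pick k distinct column indices for each row."""
    # Each row of uniform random keys argsorts into a random permutation
    # of 0..n_cols-1; the first k positions are therefore k distinct columns.
    keys = np.random.rand(n_rows, n_cols)
    return keys.argsort(axis=1)[:, :k]

arr2d = np.zeros((10, 10))
idx = random_row_indices(10, 10, 3)

# The (10, 1) row index broadcasts against the (10, 3) column index.
arr2d[np.arange(10)[:, None], idx] = 1
```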

Or cram everything into a one-liner, if you prefer it that way:

arr2d[np.arange(10)[:,None],np.random.rand(10,10).argsort(1)[:,:3]] = 1

Sample run:

In [11]: arr2d = np.zeros((10,10))  # Initialize array

In [12]: idx = np.random.rand(10,10).argsort(1)[:,:3]

In [13]: arr2d[np.arange(10)[:,None],idx] = 1

In [14]: arr2d # Verify by manual inspection
Out[14]: 
array([[ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.,  1.],
       [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.]])

In [15]: arr2d.sum(1) # Verify by counting ones in each row
Out[15]: array([ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.])

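For comparison, the loop in the question does not always produce three ones per row: np.random.randint(0,9,3) samples with replacement, so a column can be drawn twice, and its upper bound is exclusive, so column 9 is never chosen. If a loop is still preferred, one possible variant, assuming np.random.choice with replace=False is acceptable here, might look like this:

```python
import numpy as np

arr2d = np.zeros((10, 10))
for row in arr2d:
    # replace=False draws 3 distinct columns from 0..9 (9 included).
    rand_cols = np.random.choice(10, 3, replace=False)
    # Each row is a view into arr2d, so this writes into the original array.
    row[rand_cols] = 1
```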

Note: If you are looking for performance, I would suggest going with an np.argpartition based approach, as listed in this other post.
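
The linked post is not reproduced here, so the following is only a sketch of what an np.argpartition based variant might look like, not necessarily the code from that post. The idea is that a full sort is unnecessary: only the positions of the 3 smallest random keys in each row are needed, and np.argpartition finds them without ordering the rest. The names n_rows, n_cols, k and keys are illustrative.

```python
import numpy as np

n_rows, n_cols, k = 10, 10, 3
arr2d = np.zeros((n_rows, n_cols))

keys = np.random.rand(n_rows, n_cols)
# Partitioning around position k-1 moves the indices of the k smallest keys
# of each row into the first k slots (in arbitrary order), with no full sort.
idx = np.argpartition(keys, k - 1, axis=1)[:, :k]

arr2d[np.arange(n_rows)[:, None], idx] = 1
```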

@zarak The original idea came from this post: http://stackoverflow.com/a/29156976/3293881. A speedup over the loopy approach is listed here: http://stackoverflow.com/a/31958263/3293881
