Python: shuffling an array with very few nonzero values (very sparse)

Bil*_*Boy 4 python arrays numpy

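I have a very large numpy array (length about 150 million) with very few nonzero values (about 99.9% of the array is 0). I want to shuffle it, but shuffling is slow: it takes about 10 seconds, which is unacceptable because I am running Monte Carlo simulations. Is there a way to shuffle it that takes advantage of the fact that the array is mostly zeros?

I was thinking of shuffling the positive values and then inserting them at random positions into an array full of zeros, but I could not find a numpy function for that.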

Div*_*kar 5

Approach #1: Here is one approach -

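import numpy as np

def shuffle_sparse_arr(a):
    out = np.zeros_like(a)
    mask = a != 0
    n = np.count_nonzero(mask)
    # Pick n distinct target positions, returned in random order
    idx = np.random.choice(a.size, n, replace=False)
    # Scatter the nonzero values onto those positions
    out[idx] = a[mask]
    return out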

Approach #2: A hackish way -

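def shuffle_sparse_arr_hackish(a):
    out = np.zeros_like(a)
    mask = a != 0
    n = np.count_nonzero(mask)
    # Oversample 2*n candidate positions and drop duplicates
    idx = np.unique((a.size*np.random.rand(2*n)).astype(int))
    while idx.size < n:  # rare for sparse input: retry if too few distinct positions
        idx = np.unique((a.size*np.random.rand(2*n)).astype(int))
    # np.unique returns sorted values, so shuffle before keeping n of them;
    # slicing first would keep only the n smallest positions
    np.random.shuffle(idx)
    idx = idx[:n]
    out[idx] = a[mask]
    return out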
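One thing to watch with the unique-based trick: `np.unique` returns its values sorted, so slicing its result with `[:n]` before shuffling keeps only the `n` smallest candidate positions and pushes the nonzeros toward the front of the array. The sketch below compares the two orderings. The helper names are hypothetical, and it uses the newer `np.random.default_rng` generator purely so the run can be seeded.

```python
import numpy as np

def draw_truncate_then_shuffle(size, n, rng):
    # np.unique sorts its output, so [:n] keeps the n smallest candidates
    idx = np.unique(rng.integers(0, size, 2 * n))[:n]
    rng.shuffle(idx)
    return idx

def draw_shuffle_then_truncate(size, n, rng):
    # Shuffle all distinct candidates first, then keep any n of them
    idx = np.unique(rng.integers(0, size, 2 * n))
    rng.shuffle(idx)
    return idx[:n]

rng = np.random.default_rng(0)
size, n = 1_000_000, 1_000
biased = draw_truncate_then_shuffle(size, n, rng)
fair = draw_shuffle_then_truncate(size, n, rng)

# Largest chosen position as a fraction of the array length
print(biased.max() / size)  # stays near the middle of the array
print(fair.max() / size)    # reaches close to the end of the array
```

Shuffling before truncating draws a random subset of the distinct candidates, so every position in the array has the same chance of receiving a nonzero value.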

Sample run -

In [269]: # Setup input array
     ...: a = np.zeros((20),dtype=int)
     ...: sidx = np.random.choice(a.size, 6, replace=0)
     ...: a[sidx] = [5,8,4,1,7,3]
     ...: 

In [270]: a
Out[270]: array([4, 0, 0, 8, 0, 0, 5, 0, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 3])

In [271]: shuffle_sparse_arr(a)
Out[271]: array([0, 5, 0, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 7, 3, 8, 0, 0])

In [272]: shuffle_sparse_arr_hackish(a)
Out[272]: array([3, 1, 5, 0, 4, 0, 7, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Runtime test -

In [288]: # Setup input array with 15 million and 99.9% zeros
     ...: a = np.zeros((15000000),dtype=int)
     ...: 
     ...: # Set 100-99.9% as random non-zeros
     ...: n = int(a.size*((100-99.9)/100)) 
     ...: 
     ...: set_idx = np.random.choice(a.size, n , replace=0)
     ...: nums = np.random.choice(a.size, n , replace=0)
     ...: a[set_idx] = nums
     ...: 

In [289]: %timeit shuffle_sparse_arr(a)
1 loops, best of 3: 647 ms per loop

In [290]: %timeit shuffle_sparse_arr_hackish(a)
10 loops, best of 3: 29.1 ms per loop

In [291]: %timeit np.random.shuffle(a)
1 loops, best of 3: 606 ms per loop
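The timings above show the `np.random.choice` version costing about as much as a full `np.random.shuffle`. That is consistent with the legacy `np.random.choice(..., replace=False)` permuting the whole index range internally, though this is an inference from the numbers rather than something the timings prove. On NumPy 1.17 or later, the `Generator` API offers a `choice` method that may sample without replacement more cheaply when `n` is much smaller than the array. Below is a minimal sketch under that assumption. The function name `shuffle_sparse_arr_rng` and its `rng` parameter are hypothetical, and the speedup should be confirmed by timing on your own data.

```python
import numpy as np

def shuffle_sparse_arr_rng(a, rng=None):
    """Scatter the nonzero entries of 1-D array `a` onto random distinct positions."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros_like(a)
    vals = a[a != 0]
    # replace=False gives distinct positions; the default shuffle=True
    # returns them in random order, so values land on random positions
    idx = rng.choice(a.size, size=vals.size, replace=False)
    out[idx] = vals
    return out

# Small usage example: 1000 nonzeros in an array of one million entries
a = np.zeros(1_000_000, dtype=int)
a[:1000] = np.arange(1, 1001)
out = shuffle_sparse_arr_rng(a, np.random.default_rng(42))
print(np.count_nonzero(out))
```

Passing a seeded generator, as in the usage lines, makes a Monte Carlo run reproducible; leaving `rng` as `None` creates a fresh generator on every call.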