Fastest way to convert a list of indices to 2D numpy array of ones

Question

Spc*_*ond 7 python arrays performance numpy

I have a list of indices

a = [
  [1,2,4],
  [0,2,3],
  [1,3,4],
  [0,2]]


What's the fastest way to convert this to a numpy array of ones, where each index shows the position where 1 would occur?

I.e. what I want is:

output = array([
  [0,1,1,0,1],
  [1,0,1,1,0],
  [0,1,0,1,1],
  [1,0,1,0,0]])


I know the max size of the array beforehand. I know I could loop through each list and insert a 1 at each index position, but is there a faster, vectorized way to do this?

My use case could have thousands of rows/cols and I need to do this thousands of times, so the faster the better.

Answer 1

Pau*_*zer 10

How about this:

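import numpy as np

ncol = 5  # number of columns, known in advance
nrow = len(a)
out = np.zeros((nrow, ncol), int)
# pair each row index (repeated once per entry in that row) with its column indices
out[np.arange(nrow).repeat([*map(len,a)]), np.concatenate(a)] = 1
out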
# array([[0, 1, 1, 0, 1],
#        [1, 0, 1, 1, 0],
#        [0, 1, 0, 1, 1],
#        [1, 0, 1, 0, 0]])
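
To make the one-liner above easier to follow, here is a minimal sketch that builds the two index arrays separately before the assignment. The names `lengths`, `rows` and `cols` are only illustrative and do not appear in the answer; `ncol = 5` is assumed to be known in advance, as in the question.

```python
import numpy as np

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
ncol = 5  # assumed known in advance, as stated in the question
nrow = len(a)

# Number of ones in each row: [3, 3, 3, 2]
lengths = [len(row) for row in a]

# Row index of every 1: each row number repeated once per entry in that row
rows = np.arange(nrow).repeat(lengths)

# Column index of every 1: the sublists flattened into one array
cols = np.concatenate(a)

out = np.zeros((nrow, ncol), dtype=int)
out[rows, cols] = 1  # a single fancy-indexed assignment sets all the ones

print(rows)  # [0 0 0 1 1 1 2 2 2 3 3]
print(cols)  # [1 2 4 0 2 3 1 3 4 0 2]
print(out)
```

One caveat, as far as I can tell: if a sublist is empty, `np.concatenate` promotes the result to a float array, which cannot be used as an index. Building `cols` with an explicit integer dtype avoids that, for example with `np.fromiter` as in the `pp` function further down.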


Here are timings for a 1000x1000 binary array. Note that I use an optimized version of the above; see the function pp below:

pp 21.717635259992676 ms
ts 37.10938713003998 ms
u9 37.32933565042913 ms


Code to produce the timings:

import itertools as it
import numpy as np

def make_data(n,m):
    I,J = np.where(np.random.random((n,m))<np.random.random((n,1)))
    return [*map(np.ndarray.tolist, np.split(J, I.searchsorted(np.arange(1,n))))]

def pp():
    sz = np.fromiter(map(len,a),int,nrow)
    out = np.zeros((nrow,ncol),int)
    out[np.arange(nrow).repeat(sz),np.fromiter(it.chain.from_iterable(a),int,sz.sum())] = 1
    return out

def ts():
    out = np.zeros((nrow,ncol),int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out

def u9():
    out = np.zeros((nrow,ncol),int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out

nrow,ncol = 1000,1000
a = make_data(nrow,ncol)

from timeit import timeit
assert (pp()==ts()).all()
assert (pp()==u9()).all()

print("pp", timeit(pp,number=100)*10, "ms")
print("ts", timeit(ts,number=100)*10, "ms")
print("u9", timeit(u9,number=100)*10, "ms")

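
From the looks of it, using several `numpy` functions and `map` would be slower too (can't confirm without trying). (2 upvotes)
@TeshanShanukaJ Does that mean your solution is faster? Do you have timings to back that up? Performance depends on the data, and IMO it scales well (which is also why I upvoted). (2 upvotes)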
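
The comments suggest that which approach wins depends on the data. One way to check this on your own machine is to time both approaches at several fill densities. This is only a rough sketch: `make_rows`, `vectorized`, `looped`, the 200x200 size and the density values are hypothetical choices for illustration, not part of either answer.

```python
import itertools as it
from timeit import timeit

import numpy as np


def make_rows(nrow, ncol, density, seed=0):
    """Return (list of index lists, expected 0/1 array) for a random mask."""
    rng = np.random.default_rng(seed)
    mask = rng.random((nrow, ncol)) < density
    return [np.flatnonzero(r).tolist() for r in mask], mask.astype(int)


def vectorized(a, nrow, ncol):
    # Same idea as pp above: build all row/column indices, then assign once
    sz = np.fromiter(map(len, a), int, nrow)
    cols = np.fromiter(it.chain.from_iterable(a), int, int(sz.sum()))
    out = np.zeros((nrow, ncol), int)
    out[np.arange(nrow).repeat(sz), cols] = 1
    return out


def looped(a, nrow, ncol):
    # Same idea as ts above: one fancy-indexed assignment per row
    out = np.zeros((nrow, ncol), int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out


nrow, ncol = 200, 200  # arbitrary small size so the sketch runs quickly
for density in (0.1, 0.5, 0.9):
    a, expected = make_rows(nrow, ncol, density)
    assert (vectorized(a, nrow, ncol) == expected).all()
    assert (looped(a, nrow, ncol) == expected).all()
    t_vec = timeit(lambda: vectorized(a, nrow, ncol), number=10) * 100
    t_loop = timeit(lambda: looped(a, nrow, ncol), number=10) * 100
    print(f"density={density}: vectorized {t_vec:.3f} ms, looped {t_loop:.3f} ms")
```

The printed numbers will vary with hardware, array shape and NumPy version, so treat them as a local measurement rather than a general result.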

Answer 2

Tes*_*a J 6

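This might not be the fastest way. You will need to compare the execution times of these answers on large arrays to find out which is fastest. Here is my solution: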

import numpy as np

# dtype=int so the result matches the integer output shown below
output = np.zeros((4, 5), dtype=int)
for i, ix in enumerate(a):
    output[i][ix] = 1  # set all listed columns of row i to 1

# output ->
#   array([[0, 1, 1, 0, 1],
#          [1, 0, 1, 1, 0],
#          [0, 1, 0, 1, 1],
#          [1, 0, 1, 0, 0]])

Archived:	6 years, 10 months ago
Views:	536
Last activity:	6 years, 10 months ago