Ken*_*tov 3 python arrays numpy vectorization
I have some binary strings like 001010. I want to convert such a string s into a NumPy array a, where a[i] = np.array([[1], [0]]) if s[i] == '0' and np.array([[0], [1]]) otherwise.
So I wrote code like this:
import numpy as np

s = '001010'  # example input
a = np.empty([len(s), 2, 1])
for i, char in enumerate(s):
    if char == '0':
        a[i] = np.array([[1], [0]])
    elif char == '1':
        a[i] = np.array([[0], [1]])
Is it possible to rewrite this in vectorized form, without the for loop?
For s = '001010', my expected output looks like this:
array([[[1.],
        [0.]],

       [[1.],
        [0.]],

       [[0.],
        [1.]],

       [[1.],
        [0.]],

       [[0.],
        [1.]],

       [[1.],
        [0.]]])
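Approach #1: Here's one with a NumPy char array (on Python 3, np.frombuffer needs a bytes-like object, hence s.encode()) -
sa = np.frombuffer(s.encode(), dtype='S1')
out = np.where(sa[:,None,None]==b'0',[[1],[0]],[[0],[1]])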
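A Python 3 sketch of Approach #1 with the broadcasting shapes spelled out. It assumes s is a str (so it is encoded to bytes first) and casts the result to float to match the expected output above; the function name approach1 is illustrative.

```python
import numpy as np

def approach1(s):
    # View the encoded string as an array of 1-byte strings: shape (n,)
    sa = np.frombuffer(s.encode(), dtype='S1')
    # sa[:, None, None] has shape (n, 1, 1) and both choices have shape (2, 1),
    # so np.where broadcasts the result to shape (n, 2, 1)
    out = np.where(sa[:, None, None] == b'0', [[1], [0]], [[0], [1]])
    # np.where returns integers here; cast to float like the loop's np.empty
    return out.astype(float)

out = approach1('001010')
print(out.shape)  # (6, 2, 1)
```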
Approach #2: Another one, as a one-liner -
((np.frombuffer(s.encode(),dtype=np.uint8)[:,None]==[48,49])[...,None]).astype(float)
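Approach #2 unpacked step by step, as a Python 3 sketch (the intermediate names codes, onehot and out are illustrative): each character is read as its byte value, compared against the codes of '0' and '1' at once, and the boolean result gets a trailing axis.

```python
import numpy as np

s = '001010'
# Byte value of each character: ord('0') == 48, ord('1') == 49
codes = np.frombuffer(s.encode(), dtype=np.uint8)
print(codes)  # [48 48 49 48 49 48]
# codes[:, None] has shape (n, 1); comparing with [48, 49] (shape (2,))
# broadcasts to a boolean (n, 2) array: [True, False] for '0', [False, True] for '1'
onehot = codes[:, None] == [48, 49]
# Add a trailing axis to get column vectors of shape (n, 2, 1), then cast to float
out = onehot[..., None].astype(float)
print(out.shape)  # (6, 2, 1)
```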
Approach #3: Finally, one focused purely on performance -
a = np.zeros([len(s), 2, 1])
idx = np.frombuffer(s.encode(),dtype=np.uint8)-48
a[np.arange(len(idx)),idx] = 1
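Approach #3 writes the ones directly with integer-array indexing: the index pair np.arange(n), idx addresses a[i, idx[i]] for every row i, and since the last axis is left unindexed each target is a length-1 slice. Below is a Python 3 sketch (approach3 is an illustrative name), plus a variant that is not from this answer and has not been timed here: picking rows of a 2x2 identity matrix with np.eye(2)[idx], a common one-hot idiom.

```python
import numpy as np

def approach3(s):
    # Start from all zeros, shape (n, 2, 1)
    a = np.zeros([len(s), 2, 1])
    # Map '0' -> 0 and '1' -> 1 by subtracting the byte value of '0' (48)
    idx = np.frombuffer(s.encode(), dtype=np.uint8) - 48
    # One fancy-indexing assignment sets a[i, idx[i], :] = 1 for every row i
    a[np.arange(len(idx)), idx] = 1
    return a

def eye_variant(s):
    # Hypothetical alternative (not from the answer): rows of the 2x2 identity
    idx = np.frombuffer(s.encode(), dtype=np.uint8) - 48
    # np.eye(2)[idx] has shape (n, 2); add a trailing axis for (n, 2, 1)
    return np.eye(2)[idx][..., None]

a = approach3('001010')
b = eye_variant('001010')
print(np.array_equal(a, b))  # True
```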
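Timings on a string of 100,000 characters (this session passes s straight to np.frombuffer, which requires a byte string such as a Python 2 str; on Python 3, pass s.encode() instead) -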
In [2]: np.random.seed(0)
In [3]: s = ''.join(map(str,np.random.randint(0,2,(100000)).tolist()))
# @yatu's soln
In [4]: %%timeit
...: a = np.array(list(s), dtype=int)
...: np.where(a==0, np.array([[1], [0]]), np.array([[0], [1]])).T[:,:,None]
10 loops, best of 3: 36.3 ms per loop
# App#1 from this post
In [5]: %%timeit
...: sa = np.frombuffer(s,dtype='S1')
...: out = np.where(sa[:,None,None]=='0',[[1],[0]],[[0],[1]])
100 loops, best of 3: 3.56 ms per loop
# App#2 from this post
In [6]: %timeit ((np.frombuffer(s,dtype=np.uint8)[:,None]==[48,49])[...,None]).astype(float)
1000 loops, best of 3: 1.81 ms per loop
# App#3 from this post
In [7]: %%timeit
...: a = np.zeros([len(s), 2, 1])
...: idx = np.frombuffer(s,dtype=np.uint8)-48
...: a[np.arange(len(idx)),idx] = 1
1000 loops, best of 3: 1.81 ms per loop
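To re-run a comparison like this outside IPython, here is a minimal Python 3 sketch using the standard timeit module. It rebuilds the same random string, checks every candidate against the question's loop, then prints best-of-3 per-call times; the function names are illustrative, and the numbers it prints will depend on your hardware and NumPy version rather than reproduce the ones above.

```python
import timeit
import numpy as np

# Approaches 1-3 from this post, adapted for Python 3 by encoding s first
def approach1(s):
    sa = np.frombuffer(s.encode(), dtype='S1')
    return np.where(sa[:, None, None] == b'0', [[1], [0]], [[0], [1]])

def approach2(s):
    codes = np.frombuffer(s.encode(), dtype=np.uint8)
    return ((codes[:, None] == [48, 49])[..., None]).astype(float)

def approach3(s):
    a = np.zeros([len(s), 2, 1])
    idx = np.frombuffer(s.encode(), dtype=np.uint8) - 48
    a[np.arange(len(idx)), idx] = 1
    return a

def yatu_version(s):
    # The snippet timed above as @yatu's solution
    a = np.array(list(s), dtype=int)
    return np.where(a == 0, np.array([[1], [0]]), np.array([[0], [1]])).T[:, :, None]

def loop_version(s):
    # The question's loop, assuming s contains only '0' and '1'
    a = np.empty([len(s), 2, 1])
    for i, char in enumerate(s):
        a[i] = [[1], [0]] if char == '0' else [[0], [1]]
    return a

np.random.seed(0)
s = ''.join(map(str, np.random.randint(0, 2, 100000).tolist()))

# Sanity check: every candidate matches the loop on the same input
reference = loop_version(s)
candidates = (yatu_version, approach1, approach2, approach3)
for f in candidates:
    assert np.array_equal(f(s), reference), f.__name__

for f in candidates:
    # Best of 3 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(lambda: f(s), number=10, repeat=3)) / 10
    print(f"{f.__name__}: {best * 1e3:.2f} ms per call")
```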