NumPy: how to quickly normalize many vectors?

Eri*_*got 20 python numpy vector normalization

How can a list of vectors be elegantly normalized in NumPy?

Here is an example that does not work:

from numpy import *

vectors = array([arange(10), arange(10)])  # All x's, then all y's
norms = apply_along_axis(linalg.norm, 0, vectors)

# Now, what I was expecting would work:
print(vectors.T / norms)  # vectors.T has 10 elements, as does norms, but this does not work

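The last operation yields "shape mismatch: objects cannot be broadcast to a single shape".

How can the 2D vectors in `vectors` be normalized elegantly with NumPy?

Edit: why does the above not work, while adding a dimension to `norms` does (as per the answer below)?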

Geo*_*off 26

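Computing the magnitude

I came across this question and became curious about your method for normalizing. I use a different method to compute the magnitudes. Note: I also typically compute norms across the last index (rows in this case, not columns).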

magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

Typically, however, I just normalize in place like this:

vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
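For comparison, the same last-axis normalization can be sketched with `np.linalg.norm`, which accepts `axis` and `keepdims` arguments. This is an alternative spelling rather than the method used in this answer, and the helper name `normalize_rows` is made up for illustration.

```python
import numpy as np

def normalize_rows(vectors):
    """Return a copy of `vectors` scaled to unit length along the last axis."""
    vectors = np.asarray(vectors, dtype=float)
    # keepdims=True leaves a trailing axis of length 1, so the division broadcasts
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / norms

vectors = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = normalize_rows(vectors)
print(unit)
```

Unlike the in-place `/=`, this returns a new array, so it also accepts integer input. All-zero rows would still produce a division by zero and need separate handling.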

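Time comparison

I ran a test to compare the timings and found that my method is considerably faster than the OP's, but Freddie Witherdon's suggestion is faster still.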

import numpy as np    
vectors = np.random.rand(100, 25)

# OP's
%timeit np.apply_along_axis(np.linalg.norm, 1, vectors)
# Output: 100 loops, best of 3: 2.39 ms per loop

# Mine
%timeit np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
# Output: 10000 loops, best of 3: 13.8 us per loop

# Freddie's (from comment below)
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 10000 loops, best of 3: 6.45 us per loop

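Be careful, though: as this StackOverflow answer notes, some safety checks do not happen with einsum, so you should make sure that the dtype of `vectors` is wide enough to store the squared magnitudes accurately.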
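One way to sidestep the dtype issue, sketched here under the assumption that float64 is wide enough for your data, is to cast before calling einsum. As I understand it, einsum computes in the dtype of its inputs, so a narrow integer type can overflow when the squares are summed. The helper name `magnitudes` is hypothetical.

```python
import numpy as np

def magnitudes(vectors):
    """Magnitudes along the last axis via einsum, computed in float64 (hypothetical helper)."""
    # Cast first so the sum of squares is not accumulated in a narrow dtype
    v = np.asarray(vectors, dtype=np.float64)
    return np.sqrt(np.einsum('...i,...i', v, v))

# int8 can hold 10*10 = 100, but not the sum 100 + 100 = 200
small_ints = np.array([[10, 10], [3, 4]], dtype=np.int8)
mags = magnitudes(small_ints)
print(mags)
```

The cast costs one extra copy for non-float64 input, which is usually cheap next to a wrong result.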

  • I found that `np.sqrt(np.einsum('...i,...i', vectors, vectors))` is ~4 times faster than method 1 given above. (2 upvotes)

Oli*_*ier 14

Well, unless I missed something, this does work:

vectors / norms

The problem with your suggestion is the broadcasting rules.

vectors  # shape 2, 10
norms  # shape 10

The shapes do not have the same length! So the rule is to first extend the shorter shape by one on the left:

norms  # shape 1,10

You can do that manually by calling:

vectors / norms.reshape(1,-1)  # same as vectors/norms

If you want to compute `vectors.T / norms`, you have to do the reshaping manually, as follows:

vectors.T / norms.reshape(-1,1)  # this works
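The three cases above can be checked side by side in a short sketch. It uses `arange(1, 11)` instead of the question's `arange(10)`, an assumption made only to avoid a zero-length vector and the resulting division by zero.

```python
import numpy as np

vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)  # shape (2, 10)
norms = np.sqrt((vectors ** 2).sum(axis=0))                            # shape (10,)

# (2, 10) with (10,): norms is treated as (1, 10), so this broadcasts
by_columns = vectors / norms

# (10, 2) with (10,): norms is again treated as (1, 10); 2 and 10 cannot be matched
try:
    vectors.T / norms
    transposed_failed = False
except ValueError:
    transposed_failed = True

# (10, 2) with (10, 1): the length-1 axis is stretched to length 2
by_rows = vectors.T / norms.reshape(-1, 1)

print(by_columns.shape, transposed_failed, by_rows.shape)
```

Both working forms give the same unit vectors, just laid out as columns in one case and rows in the other.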


Eri*_*got 13

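Alright: NumPy's shape broadcasting adds dimensions to the left of the array shape, not to its right. NumPy can, however, be instructed to add a dimension to the right of the `norms` array: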

print(vectors.T / norms[:, newaxis])

This does work!

  • Just a note: I use `norms[..., np.newaxis]` in case the matrix is not just 2D. It also works for 3D (or higher) tensors. (3 upvotes)
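A minimal sketch of the ellipsis form on a 3D array follows. The array `batch` and its shape are invented for illustration, and the `+ 0.1` offset is only there to keep every vector non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((4, 5, 3)) + 0.1          # a 4 x 5 grid of 3-component vectors

norms = np.sqrt((batch ** 2).sum(axis=-1))   # shape (4, 5)
unit = batch / norms[..., np.newaxis]        # norms viewed as shape (4, 5, 1)

print(unit.shape)
```

With `norms[:, np.newaxis]` the new axis would land in the middle, giving shape (4, 1, 5), which does not broadcast against (4, 5, 3).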

小智 11

There is already a function for this in scikit-learn:

import sklearn.preprocessing as preprocessing
norm = preprocessing.normalize(m, norm='l2')  # m: 2D array with one vector per row

More info:

http://scikit-learn.org/stable/modules/preprocessing.html
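For readers without scikit-learn installed, here is a rough NumPy sketch of a row-wise L2 normalization. It is not scikit-learn's implementation: `l2_normalize_rows` is a hypothetical name, and leaving all-zero rows unchanged is an assumption about the desired behaviour.

```python
import numpy as np

def l2_normalize_rows(m):
    """Scale each row of a 2D array to unit L2 norm; all-zero rows stay zero."""
    m = np.asarray(m, dtype=float)
    norms = np.sqrt((m ** 2).sum(axis=1))
    norms[norms == 0.0] = 1.0          # avoid dividing by zero for all-zero rows
    return m / norms[:, np.newaxis]

m = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]])
out = l2_normalize_rows(m)
print(out)
```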


Fno*_*ord 5

My preferred way to normalize vectors is to use numpy's inner1d to compute their magnitudes. Here is what has been suggested so far, compared with inner1d:

import numpy as np
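# inner1d lives in an internal test module; later NumPy versions warn on this import and it may be unavailable in current releases
from numpy.core.umath_tests import inner1d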
COUNT = 10**6 # 1 million points

points = np.random.random_sample((COUNT,3,))
A      = np.sqrt(np.einsum('...i,...i', points, points))
B      = np.apply_along_axis(np.linalg.norm, 1, points)   
C      = np.sqrt((points ** 2).sum(-1))
D      = np.sqrt((points*points).sum(axis=1))
E      = np.sqrt(inner1d(points,points))

print([np.allclose(E, x) for x in [A, B, C, D]])  # [True, True, True, True]

Testing performance with cProfile:

import cProfile
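# Note: the trailing **0.5 applies a second root on top of np.sqrt; it is kept because the timing was recorded with it
cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))**0.5") # 3 function calls in 0.013 seconds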
cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)')       # 9000018 function calls in 10.977 seconds
cProfile.run('np.sqrt((points ** 2).sum(-1))')                       # 5 function calls in 0.028 seconds
cProfile.run('np.sqrt((points*points).sum(axis=1))')                 # 5 function calls in 0.027 seconds
cProfile.run('np.sqrt(inner1d(points,points))')                      # 2 function calls in 0.009 seconds

inner1d computed the magnitudes slightly faster than einsum. So, using inner1d to normalize:

n = points/np.sqrt(inner1d(points,points))[:,None]
cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]') # 2 function calls in 0.026 seconds

Testing against scikit:

import sklearn.preprocessing as preprocessing
n_ = preprocessing.normalize(points, norm='l2')
cProfile.run("preprocessing.normalize(points, norm='l2')") # 47 function calls in 0.047 seconds
np.allclose(n,n_) # True

Conclusion: in these measurements, using inner1d seems to be the best option.
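Because `numpy.core.umath_tests` is an internal module, the comparison may be worth re-running with public functions only. The sketch below uses the standard-library `timeit` on a smaller array. `np.linalg.norm(points, axis=1)` is swapped in as a public candidate and is not claimed to match inner1d's speed. Timings depend on the machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

points = np.random.random_sample((10**5, 3))

# Candidate ways to compute one magnitude per row, using public NumPy functions only
candidates = {
    "einsum": lambda: np.sqrt(np.einsum('...i,...i', points, points)),
    "square-sum": lambda: np.sqrt((points ** 2).sum(-1)),
    "linalg.norm(axis)": lambda: np.linalg.norm(points, axis=1),
}

reference = candidates["einsum"]()
for name, func in candidates.items():
    assert np.allclose(func(), reference)   # every candidate must agree with the reference
    # best of 3 repeats, each repeat averaging 10 calls
    seconds = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name:>18}: {seconds * 1e3:.3f} ms per call")
```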