shx*_*hx2 9 python numpy nan blas
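I noticed inconsistent behavior in numpy.dot when nans and zeros are involved.
Can anybody make sense of it? Is this a bug? Is it specific to the dot function?
I'm using numpy v1.6.1, 64bit, running on linux (also tested on v1.6.2). I also tested v1.8.0 on windows 32bit (so I can't tell whether the differences are due to the version, the OS or the architecture).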
from numpy import *
0*nan, nan*0
=> (nan, nan) # makes sense
#1
a = array([[0]])
b = array([[nan]])
dot(a, b)
=> array([[ nan]]) # OK
#2 -- adding a value to b. the first value in the result is
# not expected to be affected.
a = array([[0]])
b = array([[nan, 1]])
dot(a, b)
=> array([[ 0., 0.]]) # EXPECTED : array([[ nan, 0.]])
# (also happens in 1.6.2 and 1.8.0)
# Also, as @Bill noted, a*b works as expected, but not dot(a,b)
#3 -- changing a from 0 to 1, the first value in the result is
# not expected to be affected.
a = array([[1]])
b = array([[nan, 1]])
dot(a, b)
=> array([[ nan, 1.]]) # OK
#4 -- changing shape of a, changes nan in result
a = array([[0],[0]])
b = array([[ nan, 1.]])
dot(a, b)
=> array([[ 0., 0.], [ 0., 0.]]) # EXPECTED : array([[ nan, 0.], [ nan, 0.]])
# (works as expected in 1.6.2 and 1.8.0)
Case #4 seems to work correctly in v1.6.2 and v1.8.0, but case #2 does not...
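The EXPECTED values in the cases above follow from IEEE 754 arithmetic, where 0*nan is nan. As a minimal sketch of that reasoning, the pure-Python helper below (naive_dot is a hypothetical name, not part of numpy) evaluates every product with no shortcut for zeros:

```python
import math

def naive_dot(a, b):
    """Multiply two matrices given as nested lists, with no shortcuts for zeros."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # Every product is evaluated, so 0 * nan contributes nan.
                result[i][j] += a[i][k] * b[k][j]
    return result

nan = math.nan
print(naive_dot([[0]], [[nan]]))            # case #1
print(naive_dot([[0]], [[nan, 1]]))         # case #2
print(naive_dot([[1]], [[nan, 1]]))         # case #3
print(naive_dot([[0], [0]], [[nan, 1.0]]))  # case #4
```

In all four cases the nan in the first column of b ends up in the first column of the result, which is what I expected from dot as well.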
EDIT: @seberg pointed out that this is a blas issue, so here is the information about the blas installation, found by running from numpy.distutils.system_info import get_info; get_info('blas_opt'):
1.6.1 linux 64bit
/usr/lib/python2.7/dist-packages/numpy/distutils/system_info.py:1423: UserWarning:
Atlas (http://math-atlas.sourceforge.net/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [atlas]) or by setting
the ATLAS environment variable.
warnings.warn(AtlasNotFoundError.__doc__)
{'libraries': ['blas'], 'library_dirs': ['/usr/lib'], 'language': 'f77', 'define_macros': [('NO_ATLAS_INFO', 1)]}
1.8.0 windows 32bit (anaconda)
c:\Anaconda\Lib\site-packages\numpy\distutils\system_info.py:1534: UserWarning:
Blas (http://www.netlib.org/blas/) sources not found.
Directories to search for the sources can be specified in the
numpy/distutils/site.cfg file (section [blas_src]) or by setting
the BLAS_SRC environment variable.
warnings.warn(BlasSrcNotFoundError.__doc__)
{}
(Personally, I don't know what to make of it.)
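I think, as seberg suggested, that this is an issue with the BLAS library being used. If you look at how numpy.dot is implemented, you'll find a call to cblas_dgemm() for the double-precision matrix-times-matrix case.
This C program reproduces some of your examples. It gives the same output as you see when linked against "plain" BLAS, and the correct answers when linked against ATLAS.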
#include <stdio.h>
#include <math.h>
#include "cblas.h"

void onebyone(double a11, double b11, double expectc11)
{
    enum CBLAS_ORDER order=CblasRowMajor;
    enum CBLAS_TRANSPOSE transA=CblasNoTrans;
    enum CBLAS_TRANSPOSE transB=CblasNoTrans;
    int M=1;
    int N=1;
    int K=1;
    double alpha=1.0;
    double A[1]={a11};
    int lda=1;
    double B[1]={b11};
    int ldb=1;
    double beta=0.0;
    double C[1];
    int ldc=1;

    cblas_dgemm(order, transA, transB,
                M, N, K,
                alpha,A,lda,
                B, ldb,
                beta, C, ldc);

    printf("dot([ %.18g],[%.18g]) -> [%.18g]; expected [%.18g]\n",a11,b11,C[0],expectc11);
}

void onebytwo(double a11, double b11, double b12,
              double expectc11, double expectc12)
{
    enum CBLAS_ORDER order=CblasRowMajor;
    enum CBLAS_TRANSPOSE transA=CblasNoTrans;
    enum CBLAS_TRANSPOSE transB=CblasNoTrans;
    int M=1;
    int N=2;
    int K=1;
    double alpha=1.0;
    double A[]={a11};
    int lda=1;
    double B[2]={b11,b12};
    int ldb=2;
    double beta=0.0;
    double C[2];
    int ldc=2;

    cblas_dgemm(order, transA, transB,
                M, N, K,
                alpha,A,lda,
                B, ldb,
                beta, C, ldc);

    printf("dot([ %.18g],[%.18g, %.18g]) -> [%.18g, %.18g]; expected [%.18g, %.18g]\n",
           a11,b11,b12,C[0],C[1],expectc11,expectc12);
}

int
main()
{
    onebyone(0, 0, 0);
    onebyone(2, 3, 6);
    onebyone(NAN, 0, NAN);
    onebyone(0, NAN, NAN);
    onebytwo(0, 0,0, 0,0);
    onebytwo(2, 3,5, 6,10);
    onebytwo(0, NAN,0, NAN,0);
    onebytwo(NAN, 0,0, NAN,NAN);
    return 0;
}
Output with BLAS:
dot([ 0],[0]) -> [0]; expected [0]
dot([ 2],[3]) -> [6]; expected [6]
dot([ nan],[0]) -> [nan]; expected [nan]
dot([ 0],[nan]) -> [0]; expected [nan]
dot([ 0],[0, 0]) -> [0, 0]; expected [0, 0]
dot([ 2],[3, 5]) -> [6, 10]; expected [6, 10]
dot([ 0],[nan, 0]) -> [0, 0]; expected [nan, 0]
dot([ nan],[0, 0]) -> [nan, nan]; expected [nan, nan]
Output with ATLAS:
dot([ 0],[0]) -> [0]; expected [0]
dot([ 2],[3]) -> [6]; expected [6]
dot([ nan],[0]) -> [nan]; expected [nan]
dot([ 0],[nan]) -> [nan]; expected [nan]
dot([ 0],[0, 0]) -> [0, 0]; expected [0, 0]
dot([ 2],[3, 5]) -> [6, 10]; expected [6, 10]
dot([ 0],[nan, 0]) -> [nan, 0]; expected [nan, 0]
dot([ nan],[0, 0]) -> [nan, nan]; expected [nan, nan]
BLAS seems to behave as expected when the first operand contains a NaN, and wrongly when the first operand is zero and the second contains a NaN.
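One plausible explanation, which is an assumption on my part and not something the test program proves, is that the library skips the inner update whenever a multiplier from the first operand is exactly zero. With that shortcut, 0*nan is never evaluated. The hypothetical zero_skipping_dot below is a pure-Python sketch of such a shortcut, not code taken from any BLAS source:

```python
import math

def zero_skipping_dot(a, b):
    """Matrix product that skips the update whenever an element of `a` is zero.

    Hypothetical sketch of a zero-skip optimization; not taken from any BLAS source.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            if a[i][k] == 0:
                # Shortcut: treats 0 * x as 0, which is wrong when x is nan or inf.
                continue
            for j in range(cols):
                result[i][j] += a[i][k] * b[k][j]
    return result

nan = math.nan
print(zero_skipping_dot([[0]], [[nan, 0]]))  # [[0.0, 0.0]]: the nan is lost
print(zero_skipping_dot([[nan]], [[0, 0]]))  # [[nan, nan]]: nan == 0 is False, so no skip
```

The sketch shows the same asymmetry as the BLAS output: a NaN in the first operand propagates, while a NaN in the second operand disappears when it meets a zero in the first.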
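Either way, I don't think the bug is in the Numpy layer; it is in BLAS. It appears that you can work around it by using ATLAS instead.
The output above was generated on Ubuntu 14.04, using the gcc, BLAS and ATLAS packages provided by Ubuntu.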
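To find out whether a particular installation is affected, a small probe along these lines may help. It is only a sketch: propagates_nan, same_value and PROBES are hypothetical names, and the probe inputs and expected results are simply cases #2 and #4 from the question.

```python
import math

# Probe inputs and expected results, taken from cases #2 and #4 of the question.
PROBES = [
    ([[0]], [[math.nan, 1.0]], [[math.nan, 0.0]]),
    ([[0], [0]], [[math.nan, 1.0]], [[math.nan, 0.0], [math.nan, 0.0]]),
]

def same_value(x, y):
    """Equality that treats nan as equal to nan."""
    return (math.isnan(x) and math.isnan(y)) or x == y

def propagates_nan(dot):
    """Return True if dot(a, b) gives the expected result for every probe."""
    for a, b, expected in PROBES:
        result = dot(a, b)
        for got_row, want_row in zip(result, expected):
            if not all(same_value(g, w) for g, w in zip(got_row, want_row)):
                return False
    return True
```

With numpy installed, the check would be called as propagates_nan(lambda a, b: numpy.dot(a, b).tolist()). A result of False means that at least one NaN was dropped.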