In memory (row-major order), the array
[[A(0,0), A(0,1)]
[A(1,0), A(1,1)]]
has this memory layout:
[A(0,0), A(0,1), A(1,0), A(1,1)]
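As a small illustration of the row-major layout above, the flat position of A(i,j) can be computed from the index and the shape. This is only a sketch in plain Python; the helper name `flat_index` is hypothetical and not part of numpy:

```python
def flat_index(index, shape):
    """Row-major (C order) offset, counted in elements, of a multi-dimensional index."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

shape = (2, 2)
# Sort all (i, j) pairs by their flat offset to recover the memory order.
layout = sorted(
    ((i, j) for i in range(shape[0]) for j in range(shape[1])),
    key=lambda ij: flat_index(ij, shape),
)
print(layout)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The last index varies fastest, which is why walking the last dimension in the inner loop reads memory sequentially.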
I guess the algorithm works like this in the following cases.
Broadcasting dimension is the last dimension:
[[0, 1, 2, 3]         [[1]
                  x
 [4, 5, 6, 7]]         [10]]
   A (2 by 4)        B (2 by 1)
Iterate 0th dimensions of A and B simultaneously {
    Iterate last dimension of A {
        multiply;
    }
}
Broadcasting dimension is the 0th dimension:
[[0, 1, 2, 3]
                  x    [[1,10,100,1000]]
 [4, 5, 6, 7]]
   A (2 by 4)           B (1 by 4)
Iterate 0th dimension of A {
    Iterate 1st dimensions of A and B simultaneously {
        multiply;
    }
}
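Questions:
How does numpy know which order of multiplication is best? (Reading memory sequentially is better than reading it all over the place. But how does numpy figure that out?)
What would numpy do if the arrays have more than two dimensions?
A second guess about what is going on: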
#include <iostream>

int main(void){
    const int nA = 12;
    const int nB = 3;
    int A[nA];
    int B[nB];
    for(int i = 0; i != nA; ++i) A[i] = i+1;
    for(int i = 0; i != nB; ++i) B[i] = i+1;
    // dimensions: A is (2,3,2), B is (1,3,1)
    int dA[] = {2,3,2};
    int dB[] = {1,3,1};
    int* pA = A;
    int* pB = B;
    int* pA_end = A + nA;
    int* pB_end = B + nB;
    // is it possible to make the compiler
    // generate the iA and sA?
    int iB = 0;
    int iB_max = 2;     // length of A's last axis, along which B is broadcast
    int sB[] = {0,1};   // B stays put inside the last axis, then advances by one
    while(pA != pA_end){
        std::cout << "*pA, *pB: " << *pA << ", " << *pB << std::endl;
        std::cout << "iB: " << iB << std::endl;
        *(pA) *= *(pB);
        ++pA;
        pB += sB[iB];
        ++iB;
        if (iB == iB_max) iB = 0;
        // wrap around: B is also broadcast along axis 0
        if (pB == pB_end) pB = B;
    }
    for(pA = A; pA != pA_end; ++pA){
        std::cout << *(pA) << ", ";
    }
    std::cout << std::endl;
}
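To really understand the details of broadcasting you need to understand array shape and strides. But a lot of the hard work is now implemented in c code as part of nditer. You can read about it at http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html. np.nditer gives you access to the tool at the Python level, but its real value comes when it is used with cython or your own c code.
np.lib.stride_tricks has functions that let you play with strides. One of its functions helps visualize how arrays are broadcasted together. In practice the work is done with nditer, but this function may help understand the action:
In [629]: np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3),
                                                np.array([[1],[2]]))
Out[629]:
[array([[0, 1, 2],
        [3, 4, 5]]),
 array([[1, 1, 1],
        [2, 2, 2]])]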
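Note that, effectively, the 2nd array has been replicated to match the shape of the 1st. But the replication is done with stride tricks, not with actual copies.
In [631]: A,B=np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3),
                                                    np.array([[1],[2]]))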
In [632]: A.shape
Out[632]: (2, 3)
In [633]: A.strides
Out[633]: (12, 4)
In [634]: B.shape
Out[634]: (2, 3)
In [635]: B.strides
Out[635]: (4, 0)
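It is the (4,0) strides that do the replication without a copy: a stride of 0 along the last axis means every step along that axis lands on the same element.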
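The zero stride can be checked directly. The following is a minimal sketch using `np.broadcast_to`, which returns a read-only broadcast view; the `int32` dtype is an assumption made here so that the byte strides match the 4-byte integers of the session above:

```python
import numpy as np

# dtype fixed to int32 (assumption) so strides are counted in 4-byte steps
A = np.arange(6, dtype=np.int32).reshape(2, 3)
B = np.array([[1], [2]], dtype=np.int32)

B_view = np.broadcast_to(B, A.shape)   # read-only view, no data is copied
print(B_view.shape, B_view.strides)    # stride along the broadcast axis is 0
print(np.shares_memory(B, B_view))     # the view still points at B's 2 elements
print(A * B)                           # same result as multiplying by the view
```

Because the view shares memory with `B`, the 2 by 3 "replicated" array costs no more storage than the original 2 by 1 array.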
=================
Using the Python level nditer, here is what it does during broadcasting.
In [1]: A=np.arange(6).reshape(2,3)
In [2]: B=np.array([[1],[2]])
Plain nditer feeds elements one set at a time http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#using-an-external-loop
In [5]: it =np.nditer((A,B))
In [6]: for a,b in it:
   ...:     print(a,b)
0 1
1 1
2 1
3 2
4 2
5 2
But when I turn external_loop on, it iterates in chunks, here the respective rows of the broadcasted arrays:
In [7]: it =np.nditer((A,B), flags=['external_loop'])
In [8]: for a,b in it:
   ...:     print(a,b)
[0 1 2] [1 1 1]
[3 4 5] [2 2 2]
With a more complex broadcast, external_loop still produces 1d arrays that allow simple c style iteration:
In [13]: A1=np.arange(24).reshape(3,2,4)
In [18]: it =np.nditer((A1,np.arange(3)[:,None,None]), flags=['external_loop'])
In [19]: while not it.finished:
    ...:     print(it[:])
    ...:     it.iternext()
    ...:
(array([0, 1, 2, 3, 4, 5, 6, 7]), array([0, 0, 0, 0, 0, 0, 0, 0]))
(array([ 8, 9, 10, 11, 12, 13, 14, 15]), array([1, 1, 1, 1, 1, 1, 1, 1]))
(array([16, 17, 18, 19, 20, 21, 22, 23]), array([2, 2, 2, 2, 2, 2, 2, 2]))
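Note that while A1 is (3,2,4), the nditer loop yields 3 steps (1st axis), each with 2*4 elements.
I found in another cython/nditer SO question that the first approach did not produce much of a speed improvement, but the 2nd helped a lot. In c or cython the external_loop case would do simple low level iteration.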
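To connect this back to the multiplication in the question, here is a sketch of a broadcast product computed chunk by chunk with an external loop. It follows the pattern in the nditer documentation of passing `None` as an operand so that the iterator allocates the output; the names `B1`, `chunks` and `result` are just for this example:

```python
import numpy as np

A1 = np.arange(24).reshape(3, 2, 4)
B1 = np.arange(3)[:, None, None]

# The third operand is None, so nditer allocates the output array itself.
it = np.nditer(
    [A1, B1, None],
    flags=['external_loop'],
    op_flags=[['readonly'], ['readonly'], ['writeonly', 'allocate']],
)
chunks = 0
with it:
    for a, b, out in it:
        out[...] = a * b      # one vectorised multiply per 1d chunk
        chunks += 1
    result = it.operands[2]

print(chunks)
print(np.array_equal(result, A1 * B1))
```

In c or cython the body of the loop would be a plain for loop over each 1d chunk instead of the vectorised `a * b`.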
===============
If I broadcast on the 1st and 3rd axes, the iterator takes 2*3 steps (effectively flattening the first 2 axes, and feeding the 3rd):
In [20]: it =np.nditer((A1,np.arange(2)[None,:,None]), flags=['external_loop'])
In [21]: while not it.finished:
    ...:     print(it[:])
    ...:     it.iternext()
    ...:
(array([0, 1, 2, 3]), array([0, 0, 0, 0]))
(array([4, 5, 6, 7]), array([1, 1, 1, 1]))
(array([ 8, 9, 10, 11]), array([0, 0, 0, 0]))
(array([12, 13, 14, 15]), array([1, 1, 1, 1]))
(array([16, 17, 18, 19]), array([0, 0, 0, 0]))
(array([20, 21, 22, 23]), array([1, 1, 1, 1]))
But with buffered, it iterates once, feeding me 2 1d arrays:
In [22]: it =np.nditer((A1,np.arange(2)[None,:,None]), flags=['external_loop','buffered'])
In [23]: while not it.finished:
    ...:     print(it[:])
    ...:     it.iternext()
    ...:
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]),
 array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]))
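The SO question "Does Cython offer any reasonably easy and efficient way to iterate Numpy arrays as if they were flat?" has some speed tests, showing that the buffered external loop is fastest.
cython translates this into fast simple c iteration:
for xarr in it:
    x = xarr
    size = x.shape[0]
    for i in range(size):
        x[i] = x[i]+1.0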
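The same loop can be tried in plain Python to see its structure, although without cython the inner loop is slow. This is only a sketch: the snippet above does not show how `it` was built, so the array `x_full` and the flags used here (`external_loop`, `buffered`, `readwrite`) are assumptions:

```python
import numpy as np

x_full = np.arange(24, dtype=float).reshape(3, 2, 4)
expected = x_full + 1.0

# readwrite is needed because the loop modifies the array in place
it = np.nditer(x_full, flags=['external_loop', 'buffered'],
               op_flags=['readwrite'])
with it:                        # the context manager flushes any buffered writes
    for xarr in it:
        x = xarr                # a 1d chunk of the array
        size = x.shape[0]
        for i in range(size):   # in cython this becomes a plain c loop
            x[i] = x[i] + 1.0

print(np.array_equal(x_full, expected))
```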