I was given a minimal example of an Echo State Network (ESN), which I worked through while trying to understand Echo State Networks. Unfortunately, I have some trouble understanding why this actually works. It all boils down to the questions below.
First, here is the code segment showing the important part of the initialization:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Generate the ESN reservoir
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed', 42);
trainLen = 2000;
testLen = 2000;
initLen = 100;
data = load('MackeyGlass_t17.txt');
% Input neurons
inSize = 1;
% Output neurons
outSize = 1;
% Reservoir size
resSize = 1000;
% Leaking rate
a = 0.3;
% Input weights
Win = ( rand(resSize, (inSize+1) ) - 0.5) .* 1;
% Reservoir weights
W = rand(resSize, resSize) - 0.5;
As far as I understand it, every data point of the input data set is propagated from the input neurons into the reservoir neurons. After a warm-up of length initLen, the states are accepted and stored in the matrix X. When this is done, every column of X represents a "vector of reservoir neuron activations". Here I am not sure whether I got this right:
The comments already call X the "collected states" or the "design matrix". Am I right that all this does is store the state of the whole network per column of the matrix X?
If t is just a time parameter, then X(:,t) represents the network state at time t, doesn't it? In my example this would mean that there are 1,900 time slices, each representing the whole network state for its corresponding time frame (X is therefore a 1002x1900 matrix). Another question I have here is
why a 1 (I guess it is a bias) and the input value u are appended to this vector: X(:,t-initLen) = [1;u;x]; So:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the reservoir with the data and collect X.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Allocated memory for the design (collected states) matrix
X = zeros((1+inSize) + resSize, trainLen - initLen);
% Vector of reservoir neuron activations (used for calculation)
x = zeros(resSize, 1);
% Update of the reservoir neuron activations
xUpd = zeros(resSize, 1);
for t = 1:trainLen
    u = data(t);
    xUpd = tanh( Win * [1;u] + W * x );
    x = (1-a) * x + a * xUpd;
    if ( t > initLen )
        X(:,t-initLen) = [1;u;x];
    end
end
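As a sanity check on the dimensions, the collection loop above can be sketched in NumPy with toy sizes (random data instead of the Mackey-Glass series, so the numbers are illustrative only):

```python
import numpy as np

# Toy-sized mirror of the state-collection loop; only the scale differs
# from the MATLAB code (resSize=10 instead of 1000, random input data).
rng = np.random.default_rng(42)
trainLen, initLen, inSize, resSize, a = 200, 20, 1, 10, 0.3

data = rng.standard_normal(trainLen + 1)
Win = rng.random((resSize, 1 + inSize)) - 0.5
W = rng.random((resSize, resSize)) - 0.5

X = np.zeros((1 + inSize + resSize, trainLen - initLen))
x = np.zeros((resSize, 1))
for t in range(trainLen):
    u = data[t]
    xUpd = np.tanh(Win @ np.vstack((1, u)) + W @ x)
    x = (1 - a) * x + a * xUpd            # leaky integration
    if t >= initLen:
        # each column of X is [bias; input; reservoir state] at time t
        X[:, t - initLen] = np.vstack((1, u, x))[:, 0]

print(X.shape)  # (12, 180): one column per post-warm-up time step
```

With inSize=1 and resSize=10 each column has 1+1+10 = 12 entries, matching the 1+1+1000 = 1002 rows in the question's setup.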
The training part is also a bit magical to me. I am familiar with how linear regression works, so that is not the problem.
What I see is that this part just takes the whole state matrix X and performs a single linear regression step against the target data to produce the output weight vector Wout, and that's it.
So everything done so far — if I am not mistaken — is initializing the output weights based on the state matrix X, which itself was generated from the input data and the randomly generated (input and reservoir) weights.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Train the output
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Set the corresponding target matrix directly
Yt = data(initLen+2:trainLen+1)';
% Regularization coefficient
reg = 1e-8;
% Get X transposed - needed twice therefore it is a little faster
X_T = X';
% Yt * pseudo_inverse(X); (linear regression task)
Wout = Yt * X_T * (X * X_T + reg * eye(1+inSize+resSize))^(-1);
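That closed form is just ridge regression. A small NumPy sketch with made-up toy shapes, showing that Yt * X' * inv(X*X' + reg*I) agrees with solving the regularized normal equations directly:

```python
import numpy as np

# Toy shapes only: 5 "features" (bias+input+reservoir) and 50 samples.
rng = np.random.default_rng(0)
nFeat, nSamples, reg = 5, 50, 1e-8

X = rng.standard_normal((nFeat, nSamples))   # collected states
Yt = rng.standard_normal((1, nSamples))      # training targets

# Closed form used in the question's code
Wout = Yt @ X.T @ np.linalg.inv(X @ X.T + reg * np.eye(nFeat))

# Same solution via np.linalg.solve on the normal equations
# (numerically preferable to forming the explicit inverse)
Wout2 = np.linalg.solve(X @ X.T + reg * np.eye(nFeat), X @ Yt.T).T

print(np.allclose(Wout, Wout2))  # True
```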
I can run the network in two modes: generative or predictive. But this is the part where all I can say is: "Well, ... it works," without any real idea why it does.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the trained ESN in a generative mode. no need to initialize here,
% because x is initialized with training data and we continue from there.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Y = zeros(outSize,testLen);
u = data(trainLen+1);
for t = 1:testLen
    xUpd = tanh( Win*[1;u] + W*x );
    x = (1-a)*x + a*xUpd;
    % Generative mode:
    u = Wout*[1;u;x];
    % This would be a predictive mode:
    %u = data(trainLen+t+1);
    Y(:,t) = u;
end
It works pretty well, as you can see (generative mode):

I know this is quite a huge "question", if it can even be considered one. I feel like I understand the individual parts, but what I am missing is the big picture of this magical black box called an Echo State Network.
The Echo State Network (ESN) is basically a clever way of training a Recurrent Neural Network. An ESN has a "reservoir" of hidden units which are coupled. The inputs are connected to the reservoir through input-to-hidden connections (plus a bias). These connections are not trained; they are randomly initialized, and this is the code snippet that performs this initialization (I am using Python):
Win = (random.rand(resSize,1+inSize)-0.5) * 1
The units in the reservoir are coupled, which basically means that there are hidden-to-hidden connections. Again, the weights in the reservoir are not trained but initialized. However, initializing the reservoir weights is tricky. Those weights (denoted by W in the code) are first randomly initialized and then multiplied by a factor which takes into account the spectral radius of the random matrix. Careful initialization of these connections is very important because it affects the dynamics of the ESN (do not forget that it is a recurrent network). I guess if you want to know more details about this you will have to be able to understand linear system theory. Now, after the two weight matrices are properly initialized, you start presenting inputs to the reservoir. For every input presented to the reservoir, the activations are computed, and these activations are the state of the ESN. Look at the figure below.
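The spectral-radius scaling could look like the following sketch (the target value 0.9 is my illustrative choice; the question's code snippet above skips this step entirely):

```python
import numpy as np

# Rescale a random reservoir matrix so its spectral radius (largest
# absolute eigenvalue) is below 1 -- a common heuristic for obtaining
# the echo state property.
rng = np.random.default_rng(42)
resSize = 100

W = rng.random((resSize, resSize)) - 0.5
rho = np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius of raw W
W *= 0.9 / rho                              # rescale to radius 0.9

print(round(np.max(np.abs(np.linalg.eigvals(W))), 6))  # ~0.9
```

Eigenvalues scale linearly with the matrix, so dividing by rho and multiplying by the target radius hits it exactly (up to floating-point error).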
The figure shows a plot of 200 activations for 20 inputs. So, after presenting all inputs to the ESN, the states are collected into a matrix X. This is the code snippet that does this in Python:
x = zeros((resSize,1))
for t in range(trainLen):
    u = data[t]
    x = (1-a)*x + a*tanh( dot( Win, vstack((1,u)) ) + dot( W, x ) )
    if t >= initLen:
        X[:,t-initLen] = vstack((1,u,x))[:,0]
So, the state of the ESN is a function of a finite history of the inputs presented to the network. Now, in order to predict the output from the states of the ESN, the only thing that has to be learned is how to couple the output to the reservoir, i.e. the hidden-to-output connections:
# Train the output
reg = 1e-8  # regularization coefficient
X_T = X.T
Wout = dot( dot(Yt, X_T), linalg.inv( dot(X, X_T) +
            reg*eye(1+inSize+resSize) ) )
Then, after the network is trained, its predictive capability is tested on a test sample of the data. Generative mode means that you start with a particular value of the time series, use it to predict the next value, then use the predicted value to predict the one after that, and so on. In effect, you are generating the time series — hence generative mode. It lets you predict multiple steps into the future, as opposed to predictive mode, where you take one true value from the time series and predict only the next one.
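The only difference between the two modes is which value is fed back as the next input. A minimal sketch with a tiny untrained ESN (random Wout and random data, purely to show the data flow):

```python
import numpy as np

# Tiny ESN with random, untrained weights -- outputs are meaningless,
# but the feedback wiring of the two modes is the same as above.
rng = np.random.default_rng(1)
resSize, a, testLen = 10, 0.3, 5
Win = rng.random((resSize, 2)) - 0.5           # input + bias weights
W = rng.random((resSize, resSize)) - 0.5       # reservoir weights
Wout = rng.random((1, 2 + resSize)) - 0.5      # "trained" readout (random here)
data = rng.standard_normal(testLen + 1)

def run(mode):
    x = np.zeros((resSize, 1))
    u, Y = data[0], []
    for t in range(testLen):
        x = (1 - a) * x + a * np.tanh(Win @ np.vstack((1, u)) + W @ x)
        y = (Wout @ np.vstack((1, u, x))).item()
        # generative: feed the prediction back; predictive: feed the truth
        u = y if mode == "generative" else data[t + 1]
        Y.append(y)
    return Y

print(len(run("generative")), len(run("predictive")))  # 5 5
```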
And that is why the ESN seems to do so well: the target signal is pretty complex, yet in generative mode it performs very well.
Finally, as far as "minimal implementation" goes, I guess it refers to the size of the reservoir (1000), which apparently is quite small.