I am trying to implement full (batch) gradient descent in Keras. This means that for each epoch I train on the entire dataset, which is why the batch size is defined as the length of the training set.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD,Adam
from keras import regularizers
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import random
from numpy.random import seed
def xrange(start_point, end_point, N, base):
    temp = np.logspace(0.1, 1, N, base=base, endpoint=False)
    temp = temp - temp.min()
    temp = (0.0 + temp) / (0.0 + temp.max())  # this is between 0 and 1
    return (end_point - start_point) * temp + start_point  # this is the range
def train_model(x_train, y_train, x_test):
    #seed(1)
    model = Sequential()
    num_units = 100
    act = 'relu'
    model.add(Dense(num_units, input_shape=(1,), activation=act))
    model.add(Dense(num_units, activation=act))
    model.add(Dense(num_units, activation=act))
    model.add(Dense(num_units, activation=act))
    model.add(Dense(1, activation='tanh'))  # output layer: 1 unit ; activation='tanh'
    model.compile(Adam(), 'mean_squared_error', metrics=['mse'])
    history = model.fit(x_train, y_train, batch_size=len(x_train), epochs=500,
                        verbose=0, validation_split=0.2)  # train on the noise (not moshe)
    fit = model.predict(x_test)
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    return fit
N = 1024
start_point=-5.25
end_point=5.25
base = 500  # the base of the log used to space the training points
train_step=0.0007
x_test=np.arange(start_point,end_point,train_step+0.05)
x_train=xrange(start_point,end_point,N,base)
#random.shuffle(x_train)
function_y=np.sin(3*x_train)/2
noise=np.random.uniform(-0.2,0.2,len(function_y))
y_train=function_y+noise
fit=train_model(x_train,y_train,x_test)
plt.scatter(x_train,y_train, facecolors='none', edgecolors='g') #plt.plot(x_value,sample,'bo')
plt.scatter(x_test, fit, facecolors='none', edgecolors='b') #plt.plot(x_value,sample,'bo')
However, when I uncomment #random.shuffle(x_train) in order to shuffle the training data, I get a different plot.
I don't understand why (the green circles are the training data, the blue points are what the model learned). In both cases the batch is the whole dataset, so shuffling shouldn't change anything.
Thanks,
Ariel
This happens for two reasons:
From the documentation of model.fit:
- validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.
This means that your validation set consists of the last 20% of the training samples. Because you are using a log scale for your independent variable (x_train), your train/validation split ends up being:
split_point = int(0.2 * N)
# validation_split takes the last 20% of the (unshuffled) data
x_val = x_train[-split_point:]
y_val = y_train[-split_point:]
x_train_ = x_train[:-split_point]
y_train_ = y_train[:-split_point]
plt.scatter(x_train_, y_train_, c='g')  # effective training data (green)
plt.scatter(x_val, y_val, c='r')        # validation data (red)
plt.show()
In the plot above, the training and validation data are represented by green and red points, respectively. Note that your training dataset is not representative of the whole population.
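For illustration, here is a minimal sketch (not from the original answer) of why shuffling changes the result: permuting x_train and y_train with the same permutation before fitting means that the last 20% taken by validation_split is a random, representative subset of the whole range rather than its right end.
# Hypothetical sketch: shuffle x and y together so that the last 20%
# (used by validation_split) covers the whole x range.
perm = np.random.permutation(len(x_train))
fit_shuffled = train_model(x_train[perm], y_train[perm], x_test)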
Besides the improper train/validation split, full gradient descent may require more training epochs to converge (the gradients are less noisy, but only a single gradient update is performed per epoch). If instead you train your model for about 1500 epochs (or use mini-batch gradient descent with a batch size of, say, 32), you end up with a much better fit.
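For reference, a rough sketch of these two alternatives, written as changes to the model.fit call inside train_model (the epoch count and batch size are only the example values mentioned above, not tuned):
# Alternative 1: keep full-batch gradient descent, but train much longer.
history = model.fit(x_train, y_train, batch_size=len(x_train),
                    epochs=1500, verbose=0, validation_split=0.2)

# Alternative 2: mini-batch gradient descent, many gradient updates per epoch.
history = model.fit(x_train, y_train, batch_size=32,
                    epochs=500, verbose=0, validation_split=0.2)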