2-layer MLP (ReLU) + Softmax
After about 20 iterations, TensorFlow simply gives up and stops updating any weights or biases.
I initially thought my ReLUs were dying, so I plotted histograms to make sure none of them were stuck at 0. And none of them are!
They just stop changing after a few iterations, while the cross-entropy stays high. ReLU, sigmoid and tanh all give the same result. Tuning the GradientDescentOptimizer learning rate from 0.01 up to 0.5 doesn't change much either.
There has to be a bug somewhere, an actual mistake in my code. I can't even overfit a small sample set!
Here are my histograms and here is my code; if anyone could look it over, it would be a huge help.
We have 3000 samples, each with 6 values between 0 and 255, to classify into two classes: [1,0] or [0,1] (I made sure to shuffle the order).
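If you want to reproduce the setup, a synthetic batch with the same shape could be built along these lines (purely an illustrative stand-in, not my actual data):

import numpy as np

# 3000 rows of 6 values in [0, 255] plus shuffled one-hot labels [1,0] / [0,1]
rng = np.random.RandomState(0)
features = rng.randint(0, 256, size=(3000, 6)).astype(np.float32)
labels = np.eye(2, dtype=np.float32)[rng.randint(0, 2, size=3000)]
perm = rng.permutation(3000)
trainData = (features[perm], labels[perm])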
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
    with tf.name_scope(layer_name):
        weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
        tf.summary.histogram('weights', weights)
        biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
        tf.summary.histogram('biases', biases)
        preactivate = tf.matmul(input_tensor, weights) + biases
        tf.summary.histogram('pre_activations', preactivate)
        #act=tf.nn.relu
        activations = act(preactivate, name='activation')
        tf.summary.histogram('activations', activations)
        return activations

#We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
x = tf.placeholder(tf.float32, [None, 6])
y = tf.placeholder(tf.float32, [None, 2])

#After normalisation, input is between 0 and 1
normalised = tf.scalar_mul(1/255, x)

#Two layers
hidden1 = nn_layer(normalised, 6, 4, "hidden1")
hidden2 = nn_layer(hidden1, 4, 2, "hidden2")

#Finish by a softmax
softmax = tf.nn.softmax(hidden2)

#Defining loss, accuracy etc..
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=softmax))
tf.summary.scalar('cross_entropy', cross_entropy)
correct_prediction = tf.equal(tf.argmax(softmax, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)

#Init session and writers and misc
session = tf.Session()
train_writer = tf.summary.FileWriter('log', session.graph)
train_writer.add_graph(session.graph)
init = tf.global_variables_initializer()
session.run(init)
merged = tf.summary.merge_all()

#Train
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
batch_x, batch_y = self.trainData
for _ in range(1000):
    session.run(train_step, {x: batch_x, y: batch_y})
    #Every 10 steps, add to the summary
    if _ % 10 == 0:
        s = session.run(merged, {x: batch_x, y: batch_y})
        train_writer.add_summary(s, _)

#Evaluate
evaluate_x, evaluate_y = self.evaluateData
print(session.run(accuracy, {x: batch_x, y: batch_y}))
print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
Hidden layer 1: the outputs are not zero, so this is not a dying-ReLU problem. And yet the weights are constant! TF isn't even trying to modify them.
Same for hidden layer 2: TF tries to tweak them a bit, then quickly gives up.
The cross-entropy does decrease, but it stays astonishingly high.
EDIT: There were quite a few bugs in my code. The first one is that in Python, 1/255 = 0 ... changing it to 1.0/255.0 got my code moving.
So basically, my input was being multiplied by 0 and the network was completely blind. It was trying to get the best result it could while blind, and then it gave up. That completely explains its behaviour.
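A minimal illustration of that first bug (this is Python 2 behaviour, where / between two ints is integer division; Python 3 would return a float here):

# Python 2: / between ints truncates, so the scale factor silently becomes 0
scale_wrong = 1 / 255       # 0          -> every normalised input is 0
scale_right = 1.0 / 255.0   # 0.00392... -> the intended scaling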
The second mistake was that I was applying softmax twice; fixing that helped as well. By trying different learning rates and different numbers of epochs, I finally found something that works.
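In other words, tf.nn.softmax_cross_entropy_with_logits already applies a softmax internally, so it has to be fed the raw layer output (the logits), not an already-softmaxed tensor. Roughly, using the names from my snippet:

# Wrong: softmax applied twice, once here and once inside the loss
probabilities = tf.nn.softmax(output)
loss_wrong = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=probabilities))

# Right: pass the raw logits; the loss applies the softmax itself
loss_right = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))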
Here is the final working code:
def runModel(self):

    def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
        with tf.name_scope(layer_name):
            #This is a standard weight initialisation for neural networks with ReLU.
            #I divide by math.sqrt(float(6)) because my input has 6 values
            weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
            tf.summary.histogram('weights', weights)
            #I chose this bias value myself. It works. Not sure why.
            biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
            tf.summary.histogram('biases', biases)
            preactivate = tf.matmul(input_tensor, weights) + biases
            tf.summary.histogram('pre_activations', preactivate)
            #Some layers get ReLU as their activation function,
            #some get no activation function at all
            if act == "None":
                activations = preactivate
            else:
                activations = act(preactivate, name='activation')
            tf.summary.histogram('activations', activations)
            return activations

    #We have 3000 samples with 6 values between 0 and 255 to classify into two classes
    x = tf.placeholder(tf.float32, [None, 6])
    y = tf.placeholder(tf.float32, [None, 2])

    #After normalisation, the input is between 0 and 1.
    #Normalising the input really helps; nothing works without it.
    #But my ERROR was to write 1/255, because with Python's integer division
    #1/255 = 0
    #while 1.0/255.0 = 0.003921568... (float division)
    normalised = tf.scalar_mul(1.0/255.0, x)

    #Three layers total. The first one is just a matrix multiplication
    input = nn_layer(normalised, 6, 4, "input", act="None")
    #The second one has a ReLU after a matrix multiplication
    hidden1 = nn_layer(input, 4, 4, "hidden", act=tf.nn.relu)
    #The last one is also just a matrix multiplication.
    #WARNING! No softmax here, because the loss function called later
    #implicitly applies a softmax,
    #and it's bad practice to apply two softmaxes one after the other
    output = nn_layer(hidden1, 4, 2, "output", act="None")

    #Tried different learning rates.
    #A higher learning rate finds a result faster,
    #but may end up in a local minimum.
    #A lower learning rate needs many more epochs.
    learning_rate = 0.03

    with tf.name_scope('learning_rate_' + str(learning_rate)):
        #Defining loss, accuracy, etc.
        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
        tf.summary.scalar('cross_entropy', cross_entropy)
        correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar('accuracy', accuracy)

    #Init session, writers and misc
    session = tf.Session()
    train_writer = tf.summary.FileWriter('log', session.graph)
    train_writer.add_graph(session.graph)
    init = tf.global_variables_initializer()
    session.run(init)
    merged = tf.summary.merge_all()

    #Train
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
    batch_x, batch_y = self.trainData
    for _ in range(1000):
        session.run(train_step, {x: batch_x, y: batch_y})
        #Every 10 steps, add to the summary
        if _ % 10 == 0:
            s = session.run(merged, {x: batch_x, y: batch_y})
            train_writer.add_summary(s, _)

    #Evaluate
    evaluate_x, evaluate_y = self.evaluateData
    print(session.run(accuracy, {x: batch_x, y: batch_y}))
    print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
I'm afraid you'll have to lower the learning rate; it's too high. A high learning rate usually lands you in a local minimum rather than the global one.
Try 0.001, 0.0001 or even 0.00001. Or make your learning rate adaptive.
I haven't checked the code, so start by tuning the LR.
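For example, a decaying schedule can be used instead of a fixed value, along these lines (a sketch using TF1's tf.train.exponential_decay; the numbers are arbitrary):

global_step = tf.Variable(0, trainable=False)
# start at 0.03 and multiply by 0.96 every 100 training steps
learning_rate = tf.train.exponential_decay(
    0.03, global_step, decay_steps=100, decay_rate=0.96, staircase=True)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    cross_entropy, global_step=global_step)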