TensorFlow: Linear regression with non-negative constraints

Nip*_*tra 4 python constraints linear-regression tensorflow

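I want to implement a linear regression model in TensorFlow with an additional constraint, coming from the domain: the W and b terms must be non-negative.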

I believe there are a couple of ways to do this.

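  1. We can modify the cost function to penalize negative weights [Lagrangian approach] [see: TensorFlow - Best way to implement weight constraints]
  2. We can compute the gradient step ourselves and project the updated weights onto [0, infinity) [projected gradient approach]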
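Written out as formulas, the two approaches look roughly like this. Here J(W, b) is the mean squared error cost used in the code below, λ_W and λ_b are the penalty coefficients (the 100.0 factors in the code), and η is the learning rate; these symbol names are introduced only for this sketch.

$$
\begin{aligned}
J(W,b) &= \frac{1}{2n}\sum_{i=1}^{n}\left(W x_i + b - y_i\right)^2 \\
\text{(1) penalty:}\quad &\min_{W,b}\; J(W,b) + \lambda_W\left(|W| - W\right) + \lambda_b\left(|b| - b\right),
\qquad |t| - t = 2\max(0,-t) \\
\text{(2) projected gradient:}\quad &W \leftarrow \max\!\left(0,\; W - \eta\,\frac{\partial J}{\partial W}\right),
\qquad b \leftarrow \max\!\left(0,\; b - \eta\,\frac{\partial J}{\partial b}\right)
\end{aligned}
$$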

Approach 1: Lagrangian

When I tried the first approach, I often ended up with a negative b.

I modified the cost function from:

cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)

to:

cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
nn_w = tf.reduce_sum(tf.abs(W) - W)
nn_b = tf.reduce_sum(tf.abs(b) - b)
constraint = 100.0*nn_w + 100*nn_b
cost_with_constraint = cost + constraint

Keeping the coefficients on nn_b and nn_w very high leads to instability and a very high cost.
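To see why very large coefficients destabilize training, it may help to look at the penalty term on its own. The plain-Python sketch below uses hypothetical helper names (`sign`, `penalty`, `penalty_grad`) and assumes sign(0) = 0, the convention `tf.sign` uses. The term is 0 for non-negative values, but for negative values its gradient is a constant -2λ no matter how close the value already is to 0, so every gradient step taken while b < 0 moves b by learning_rate * 2λ.

```python
def sign(t):
    # Same convention as tf.sign: sign(0) == 0
    return (t > 0) - (t < 0)


def penalty(t, lam):
    """Penalty lam * (|t| - t): 0 for t >= 0, 2 * lam * |t| for t < 0."""
    return lam * (abs(t) - t)


def penalty_grad(t, lam):
    """Gradient of lam * (|t| - t), using d|t|/dt = sign(t)."""
    return lam * (sign(t) - 1)


for t in (-2.0, -0.01, 0.0, 0.01, 2.0):
    print(f"t={t:+.2f}  penalty={penalty(t, 100.0):7.2f}  grad={penalty_grad(t, 100.0):7.1f}")
```

With the question's learning_rate = 0.0001 and a coefficient of 100, each step taken while b is negative adds 0.0001 * 200 = 0.02 to b, however close to 0 b already is; larger coefficients make that jump proportionally larger.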

Here is the complete code.

import numpy as np
import tensorflow as tf

n_samples = 50
train_X = np.linspace(1, 50, n_samples)
train_Y = 10*train_X + 6 +40*np.random.randn(50)

X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(np.random.randn(), name="weight")
b = tf.Variable(np.random.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Gradient descent
learning_rate=0.0001

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
nn_w = tf.reduce_sum(tf.abs(W) - W)
nn_b = tf.reduce_sum(tf.abs(b) - b)
constraint = 1.0*nn_w + 100*nn_b
cost_with_constraint = cost + constraint
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_with_constraint)

# Initialize the variables (created after the optimizer, so any optimizer variables are included)
init = tf.global_variables_initializer()

training_epochs=200
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    cost_array = np.zeros(training_epochs)
    W_array = np.zeros(training_epochs)
    b_array = np.zeros(training_epochs)

    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        # Record W, b and the full-batch cost once per epoch
        W_array[epoch] = sess.run(W)
        b_array[epoch] = sess.run(b)
        cost_array[epoch] = sess.run(cost, feed_dict={X: train_X, Y: train_Y})

Here are the averages of b over 10 different runs.

0   -1.101268
1    0.169225
2    0.158363
3    0.706270
4   -0.371205
5    0.244424
6    1.312516
7   -0.069609
8   -1.032187
9   -1.711668

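Clearly, the first approach does not reliably enforce the constraint. Moreover, choosing the penalty coefficient is more art than science.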
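How large the coefficient has to be can be worked out exactly on a toy problem. The sketch below is plain Python rather than TensorFlow, uses hypothetical noise-free data (x = 1..10, y = 2x - 5, so the unconstrained fit wants b = -5), and for simplicity penalizes only b. The helper `penalized_minimizer` is hypothetical; it computes the minimizer of J(w, b) + λ(|b| - b) in closed form. Below a threshold the penalized optimum still has b < 0; above it, the optimum sits exactly at b = 0. That threshold is half the Lagrange multiplier of the constraint, λ* = (1/2) ∂J/∂b at the constrained solution, which is 15/28 (about 0.536) on this data.

```python
def penalized_minimizer(xs, ys, lam):
    """Exact minimizer of J(w, b) + lam * (|b| - b),
    with J(w, b) = (1/2n) * sum((w*x + b - y)**2).

    Hypothetical helper for illustration. Assumes the unconstrained
    least-squares intercept is negative (true for the toy data below)
    and, for simplicity, penalizes only b.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    mxx = sum(x * x for x in xs) / n
    mxy = sum(x * y for x, y in zip(xs, ys)) / n
    # For b < 0 the objective is J(w, b) - 2*lam*b; its stationary point solves
    #   mxx*w + mx*b = mxy   and   mx*w + b = my + 2*lam
    w = (mxy - mx * (my + 2 * lam)) / (mxx - mx * mx)
    b = (my + 2 * lam) - mx * w
    if b < 0:
        return w, b  # penalty too weak: the optimum still has b < 0
    # Otherwise the optimum sits on the kink at b = 0: minimize J(w, 0) over w
    return mxy / mxx, 0.0


xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x - 5.0 for x in xs]  # toy data (assumed); unconstrained fit: w = 2, b = -5
for lam in (0.1, 0.5, 1.0, 100.0):
    print(lam, penalized_minimizer(xs, ys, lam))
```

On this data λ = 0.1 and λ = 0.5 still give a negative intercept (about -4.07 and -0.33), while λ = 1.0 and λ = 100.0 both give exactly b = 0 with w = 9/7. Past the threshold a larger λ does not move the optimum; it only enlarges the 2λ * learning_rate jump that gradient descent takes whenever b dips below 0.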

Approach 2: Projected gradient

I then wanted to use the second approach, which should be more effective.

gr = tf.gradients(cost, [W, b])
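For this cost, with n = n_samples, the two tensors returned by tf.gradients should correspond to the following derivatives (a hand derivation, not taken from the post):

$$
\frac{\partial J}{\partial W} = \frac{1}{n}\sum_{i}\left(W x_i + b - y_i\right)x_i,
\qquad
\frac{\partial J}{\partial b} = \frac{1}{n}\sum_{i}\left(W x_i + b - y_i\right)
$$

When a single sample (x, y) is fed at a time, the sum has one term but the cost still divides by 2*n_samples, so each per-sample gradient is that single summand scaled by 1/n_samples (1/50 here).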

We compute the gradients manually and update W and b.

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            W_del, b_del = sess.run(gr, feed_dict={X: x, Y: y})
            W = max(0, (W - W_del)*learning_rate) # Project the gradient on [0, infinity]
            b = max(0, (b - b_del)*learning_rate) # Project the gradient on [0, infinity]

This approach seems to be very slow.

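I would like to know whether there is a better way to run the second approach, or a way to guarantee the result with the first approach. Can we somehow let the optimizer ensure that the learned weights are non-negative?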

Edit: How to do this in Autograd

https://github.com/HIPS/autograd/issues/207

Blu*_*Sun 6

If you modify the linear model to:

pred = tf.add(tf.multiply(X, tf.abs(W)), tf.abs(b))

it has the same effect as allowing only non-negative W and b values.
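What this reparameterization does during training can be sketched in plain Python (not TensorFlow). The helper `reparam_gd`, the toy data x = 1..10, y = 2x - 5, and the step size 0.02 are all assumptions for illustration. Gradient descent runs on unconstrained u and v, while the model only ever sees w = |u| and b = |v|; by the chain rule the gradients are sign(u) * ∂J/∂w and sign(v) * ∂J/∂b. Assuming sign(0) = 0, as with `tf.sign`, a parameter that lands exactly on 0 stops receiving updates, so the sketch starts u and v away from 0.

```python
def sign(t):
    # Same convention as tf.sign: sign(0) == 0
    return (t > 0) - (t < 0)


def reparam_gd(xs, ys, lr=0.02, steps=5000):
    """Gradient descent on unconstrained u, v for the model y ~ |u|*x + |v|.

    Hypothetical helper for illustration; J(w, b) = (1/2n) * sum((w*x + b - y)**2).
    """
    n = len(xs)
    u, v = 1.0, 1.0  # start away from 0, where the gradient through abs would be 0
    for _ in range(steps):
        w, b = abs(u), abs(v)  # effective parameters are non-negative by construction
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        # Chain rule through abs: dJ/du = sign(u) * dJ/dw, dJ/dv = sign(v) * dJ/db
        u -= lr * sign(u) * grad_w
        v -= lr * sign(v) * grad_b
    return abs(u), abs(v)


xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x - 5.0 for x in xs]  # toy data (assumed); unconstrained fit: w = 2, b = -5
w, b = reparam_gd(xs, ys)
print(w, b)
```

On this data the effective intercept |v| does not settle at exactly 0: v keeps flipping sign with steps of roughly lr * ∂J/∂b, so b stays at a small positive value (on the order of 0.01 here) while w ends up close to 9/7.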

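The second approach is slow because you are clipping the W and b values outside the TensorFlow graph. (It also won't converge, because (W - W_del)*learning_rate has to be changed to W - W_del*learning_rate.)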
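For comparison, the corrected projected-gradient update (take the ordinary step w - lr * ∂J/∂w first, then project the result onto [0, infinity) with max(0, ...)) can be sketched in plain Python rather than TensorFlow. The helper `projected_gd`, the toy data x = 1..10, y = 2x - 5, and the hyperparameters (lr = 0.02, 5000 full-batch steps) are illustrative assumptions. On this data the constrained optimum is b = 0 and w = Σxy / Σx² = 9/7.

```python
def projected_gd(xs, ys, lr=0.02, steps=5000):
    """Full-batch projected gradient descent for y ~ w*x + b with w, b >= 0.

    Hypothetical helper for illustration; J(w, b) = (1/2n) * sum((w*x + b - y)**2).
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current model on every sample
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of J with respect to w and b
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        # Ordinary gradient step first, then project onto [0, inf)
        w = max(0.0, w - lr * grad_w)
        b = max(0.0, b - lr * grad_b)
    return w, b


xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x - 5.0 for x in xs]  # toy data (assumed); unconstrained fit: w = 2, b = -5
w, b = projected_gd(xs, ys)
print(w, b)  # w is close to 9/7 and b is exactly 0.0
```

Because the projection is applied to the parameters after each step, b lands exactly on 0 once the constraint becomes active, rather than hovering near it.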

Edit:

You can implement the clipping inside the TensorFlow graph like this:

train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.control_dependencies([train_step]):
    clip_W = W.assign(tf.maximum(0., W))
    clip_b = b.assign(tf.maximum(0., b))
    train_step_with_clip = tf.group(clip_W, clip_b)

In this case, W and b will be clipped to 0 instead of to small positive numbers.

Here is a small MNIST example with clipping:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

x = tf.placeholder(tf.uint8, [None, 28, 28])
x_vec = tf.cast(tf.reshape(x, [-1, 784]), tf.float32) / 255.

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x_vec, W) + b

y_target = tf.placeholder(tf.uint8, [None])
y_target_one_hot = tf.one_hot(y_target, 10)

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_target_one_hot, logits=y))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.control_dependencies([train_step]):
    clip_W = W.assign(tf.maximum(0., W))
    clip_b = b.assign(tf.maximum(0., b))
    train_step_with_clip = tf.group(clip_W, clip_b)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_target_one_hot, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  tf.global_variables_initializer().run()

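  for i in range(1000):
    # Consecutive mini-batches of 100 examples, wrapping around the training set
    # (len(x_train) is a multiple of 100, so start+100 never runs past the end)
    start = (i*100) % len(x_train)
    sess.run(train_step_with_clip, feed_dict={
        x: x_train[start:start+100],
        y_target: y_train[start:start+100]})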

    if not i%100:
      print("Min_W:", sess.run(tf.reduce_min(W)))
      print("Min_b:", sess.run(tf.reduce_min(b)))

  print("Accuracy:", sess.run(accuracy, feed_dict={
      x: x_test,
      y_target: y_test}))