Asked by Rob*_*rto · score 1 · tags: python, neural-network, keras, tensorflow, scipy-optimize
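With the introduction of TensorFlow 2.0, the scipy interface (`tf.contrib.opt.ScipyOptimizerInterface`) was removed. However, I would still like to use the scipy optimizer `scipy.optimize.minimize(method='L-BFGS-B')` to train a neural network (a Keras Sequential model). For the optimizer to work, it needs a function `fun(x0)` as input, where `x0` is an array of shape (n,). The first step is therefore to "flatten" the weight matrices into a vector of the required shape. To do this, I modified the code provided at https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/, which provides a function factory that creates such a function `fun(x0)`. However, the code does not seem to work: the loss does not decrease. I would be grateful if someone could help me solve this.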
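For illustration, the flatten/unflatten step can be sketched in plain NumPy. This mirrors what `function_factory` in the question's code does with `tf.dynamic_stitch` and `tf.dynamic_partition`, but it is only a sketch: the helper names `flatten`/`unflatten` and the `weights` list are hypothetical stand-ins for `model.trainable_variables`.

```python
import numpy as np

def flatten(arrays):
    # Concatenate every weight array into one 1-D float64 vector of shape (n,).
    return np.concatenate([np.ravel(a) for a in arrays]).astype(np.float64)

def unflatten(vector, shapes):
    # Split the vector into pieces of the original sizes and reshape each one.
    sizes = [int(np.prod(s)) for s in shapes]
    pieces = np.split(vector, np.cumsum(sizes)[:-1])
    return [p.reshape(s) for p, s in zip(pieces, shapes)]

# Hypothetical weights of a tiny dense layer: a 3x2 kernel and a bias of length 2.
weights = [np.arange(6, dtype=np.float32).reshape(3, 2), np.zeros(2, dtype=np.float32)]
shapes = [w.shape for w in weights]

x0 = flatten(weights)            # vector that scipy.optimize.minimize can work on
restored = unflatten(x0, shapes) # back to the per-layer shapes
```

The vector `x0` has shape (8,), and `unflatten` recovers arrays with the original shapes and values.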
Here is the code I am using:
```python
func = function_factory(model, loss_function, x_u_train, u_train)

# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)

# train the model with L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')


def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)


def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss [in]: a function with signature loss_value = loss(pred_y, true_y).
        train_x [in]: the input part of training data.
        train_y [in]: the output part of training data.

    Returns:
        A function that has a signature of:
            loss_value, gradients = f(model_parameters).
    """

    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare required information first
    count = 0
    idx = []  # stitch indices
    part = []  # partition indices

    for i, shape in enumerate(shapes):
        n = np.prod(shape)
        idx.append(tf.reshape(tf.range(count, count+n, dtype=tf.int32), shape))
        part.extend([i]*n)
        count += n

    part = tf.constant(part)

    def assign_new_model_parameters(params_1d):
        """A function updating the model's parameters with a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """
        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):
            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create a function that will be returned by this factory
    def f(params_1d):
        """
        This function is created by function_factory.

        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss.
        """
        # update the parameters in the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)

        return loss_value

    # store these information as members so we can use them outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f
```
Here `model` is a `tf.keras.Sequential` object.
Thanks in advance for any help!
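When switching from TF1 to TF2 I ran into the same problem, and after some experimenting I found the solution below. It shows how to interface a function decorated with `tf.function` with the scipy optimizer. Compared with the question, the important changes are: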
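- The function passed to `scipy.optimize.minimize` returns both the loss value and its gradient, and `jac=True` is set so that scipy uses this gradient.
- scipy's L-BFGS-B works on `np.float64` arrays, while the TensorFlow code computes in `tf.float32`, so the input and the outputs have to be cast between the two types.

Below is an example of how to do this for a toy problem.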
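```python
import tensorflow as tf
import numpy as np
import scipy.optimize as sopt


def model(x):
    # toy objective with its minimum at x = 2
    return tf.reduce_sum(tf.square(x - tf.constant(2, dtype=tf.float32)))


@tf.function
def val_and_grad(x):
    # compute the loss and its gradient with respect to x in one pass
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = model(x)
    grad = tape.gradient(loss, x)
    return loss, grad


def func(x):
    # scipy passes float64 arrays; TF computes in float32, so cast on the way in
    # and cast both the value and the gradient back to float64 on the way out
    return [vv.numpy().astype(np.float64) for vv in val_and_grad(tf.constant(x, dtype=tf.float32))]


resdd = sopt.minimize(fun=func, x0=np.ones(5),
                      jac=True, method='L-BFGS-B')
print("info:\n", resdd)
```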
Output:
```
info:
       fun: 7.105427357601002e-14
  hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
       jac: array([-2.38418579e-07, -2.38418579e-07, -2.38418579e-07, -2.38418579e-07,
       -2.38418579e-07])
   message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
      nfev: 3
       nit: 2
    status: 0
   success: True
         x: array([1.99999988, 1.99999988, 1.99999988, 1.99999988, 1.99999988])
```
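A likely reason the question's loss does not decrease: without `jac`, scipy's L-BFGS-B estimates the gradient by finite differences with a very small absolute step (the `eps` option, 1e-8 by default), and a loss evaluated in float32 usually cannot resolve such a small change, so the estimated gradient comes out as zero or as rounding noise. The NumPy-only sketch below imitates float32 evaluation with NumPy casts instead of TensorFlow (an assumption made to keep it self-contained) and contrasts it with the `jac=True`/float64 interface from the toy example.

```python
import numpy as np
import scipy.optimize as sopt

def loss_f32(x):
    # Toy loss sum((x - 2)^2) evaluated entirely in float32, as a TF model would.
    x32 = np.asarray(x, dtype=np.float32)
    return float(np.sum(np.square(x32 - np.float32(2.0))))

def loss_and_grad_f64(x):
    # Same loss, computed in float64 together with its analytic gradient.
    diff = np.asarray(x, dtype=np.float64) - 2.0
    return float(np.sum(diff ** 2)), 2.0 * diff

x0 = np.ones(5)

# No jac: the finite-difference step (about 1e-8) is lost when x is rounded
# to float32, so every perturbed loss equals the unperturbed one and the
# estimated gradient is exactly zero; L-BFGS-B stops at the start point.
res_fd = sopt.minimize(loss_f32, x0, method='L-BFGS-B')

# jac=True: the function returns (value, gradient) as float64.
res_jac = sopt.minimize(loss_and_grad_f64, x0, jac=True, method='L-BFGS-B')

print(res_fd.x)   # unchanged starting point
print(res_jac.x)  # close to 2
```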
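To compare speed, I used the L-BFGS optimizer on a style transfer problem (see here for the network). Note that in this problem the network parameters are fixed and the input signal is adapted. Since the optimized parameters (the input signal) are one-dimensional, no function factory is needed.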
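In the question's setting the trainable parameters are a list of tensors, so a factory is still needed, but the function it returns should deliver both the loss and its gradient as float64 so that it works with `jac=True`. A minimal NumPy-only sketch, assuming a hypothetical linear model `y = x @ W + b` with hand-derived MSE gradients standing in for the Keras model and `tf.GradientTape`; `make_value_and_grad` is an illustrative name, not part of any library.

```python
import numpy as np
import scipy.optimize as sopt

def make_value_and_grad(shapes, x_train, y_train):
    # Hypothetical factory for a linear model y = x @ W + b with parameters [W, b].
    sizes = [int(np.prod(s)) for s in shapes]
    offsets = np.cumsum([0] + sizes)

    def unflatten(params_1d):
        # Slice the flat vector back into W and b.
        return [params_1d[offsets[i]:offsets[i + 1]].reshape(s)
                for i, s in enumerate(shapes)]

    def value_and_grad(params_1d):
        W, b = unflatten(np.asarray(params_1d, dtype=np.float64))
        residual = x_train @ W + b - y_train      # shape (n, 1)
        n = residual.shape[0]
        loss = np.mean(residual ** 2)
        # Analytic gradients of the mean squared error.
        grad_W = 2.0 / n * x_train.T @ residual
        grad_b = 2.0 / n * residual.sum(axis=0)
        grad_1d = np.concatenate([grad_W.ravel(), grad_b.ravel()])
        return float(loss), grad_1d.astype(np.float64)

    return value_and_grad, unflatten

# Noise-free synthetic data, so the exact parameters should be recoverable.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(50, 3))
true_W = np.array([[1.0], [-2.0], [0.5]])
true_b = np.array([0.3])
y_train = x_train @ true_W + true_b

shapes = [(3, 1), (1,)]
value_and_grad, unflatten = make_value_and_grad(shapes, x_train, y_train)
x0 = np.zeros(sum(int(np.prod(s)) for s in shapes))
res = sopt.minimize(value_and_grad, x0, jac=True, method='L-BFGS-B')
W_fit, b_fit = unflatten(res.x)
```

With a real Keras model, the analytic gradient would be replaced by `tape.gradient` over `model.trainable_variables`, flattened in the same order as the parameters.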
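I compared four implementations: TF1.12 (the baseline), TF2.0 in eager mode (TF2.0 (E)), TF2.0 with `tf.function` (TF2.0 (G)), and TF2.0 with the L-BFGS optimizer from tensorflow_probability (TF2.0/TFP).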
For this comparison, the optimization was stopped after 300 iterations (the problem usually needs about 3000 iterations to converge).
| Method | Runtime (300 iterations) | Final loss |
|---|---|---|
| TF1.12 (baseline) | 240 s | 0.045 |
| TF2.0 (E) | 299 s | 0.045 |
| TF2.0 (G) | 233 s | 0.045 |
| TF2.0/TFP | 226 s | 0.053 |
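TF2.0 in eager mode (TF2.0 (E)) works correctly but is roughly 25% slower than the TF1.12 baseline (299 s vs. 240 s). TF2.0 (G) with `tf.function` also works correctly and is slightly faster than TF1.12, which is good news.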
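The optimizer from tensorflow_probability (TF2.0/TFP) is slightly faster than TF2.0 (G), which uses scipy's L-BFGS, but it does not reach the same loss reduction. In fact, the loss does not decrease monotonically over time, which seems a bad sign. Comparing the two L-BFGS implementations (scipy and tensorflow_probability), it is clear that scipy's Fortran code is considerably more elaborate. So either the simplifications of the algorithm in TFP hurt here, or the fact that TFP performs all computations in float32 may also be a problem.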
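To reproduce the 300-iteration cap with scipy and check the monotonicity point, one option is `options={'maxiter': 300}` plus a `callback` that records the loss after each iteration. A sketch under stated assumptions: scipy's built-in Rosenbrock function (`scipy.optimize.rosen`, `rosen_der`) stands in for the style-transfer loss, which is not given here, and `record`/`history` are illustrative names.

```python
import numpy as np
import scipy.optimize as sopt

history = []

def record(xk):
    # scipy calls this after every L-BFGS-B iteration with the current iterate.
    history.append(sopt.rosen(xk))

x0 = np.array([-1.2, 1.0, -1.2, 1.0])
res = sopt.minimize(sopt.rosen, x0, jac=sopt.rosen_der, method='L-BFGS-B',
                    callback=record, options={'maxiter': 300})

# The line search only accepts steps that satisfy a sufficient-decrease
# condition, so the recorded losses should be non-increasing.
is_monotonic = all(b <= a for a, b in zip(history, history[1:]))
print(res.nit, is_monotonic)
```

The same callback pattern can be used on the real problem to log the loss curve of the scipy-based runs.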