Tags: machine-learning, python-3.x, deep-learning, keras, tensorflow
I am working on a deep learning model where I am trying to combine the outputs of two different models.
The overall structure is like this:
So the first model takes one matrix, for example [10 x 30]:
#input 1
input_text = layers.Input(shape=(1,), dtype="string")
embedding = ElmoEmbeddingLayer()(input_text)
model_a = Model(inputs = [input_text] , outputs=embedding)
# shape : [10,50]
Now the second model takes two input matrices:
X_in = layers.Input(tensor=K.variable(np.random.uniform(0,9,[10,32])))
M_in = layers.Input(tensor=K.variable(np.random.uniform(1,-1,[10,10])))
md_1 = New_model()([X_in, M_in]) #new_model defined somewhere
model_s = Model(inputs=[X_in, M_in], outputs=md_1)
# shape : [10,50]
I want to make these two matrices trainable, just like in TensorFlow, where I was able to do it with:
matrix_a = tf.get_variable(name='matrix_a',
                           shape=[10,10],
                           dtype=tf.float32,
                           initializer=tf.constant_initializer(np.array(matrix_a)),
                           trainable=True)
I can't figure out how to make these matrix_a and matrix_b trainable, or how to merge the outputs of the two networks and then feed the input.
I went through this question, but couldn't find an answer because their problem statement is different from mine.
What I have tried so far is:
#input 1
input_text = layers.Input(shape=(1,), dtype="string")
embedding = ElmoEmbeddingLayer()(input_text)
model_a = Model(inputs = [input_text] , outputs=embedding)
# shape : [10,50]
X_in = layers.Input(tensor=K.variable(np.random.uniform(0,9,[10,10])))
M_in = layers.Input(tensor=K.variable(np.random.uniform(1,-1,[10,100])))
md_1 = New_model()([X_in, M_in]) #new_model defined somewhere
model_s = Model(inputs=[X_in, M_in], outputs=md_1)
# [10,50]
#transpose second model output
transpose = Lambda(lambda x: K.transpose(x))
agglayer = transpose(md_1)
# dot first and second model outputs together
dott = Lambda(lambda x: K.dot(x[0], x[1]))
kmean_layer = dott([embedding, agglayer])
# input
final_model = Model(inputs=[input_text, X_in, M_in], outputs=kmean_layer,name='Final_output')
final_model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
final_model.summary()
Model summary:
Update:
Model b
X = np.random.uniform(0,9,[10,32])
M = np.random.uniform(1,-1,[10,10])
X_in = layers.Input(tensor=K.variable(X))
M_in = layers.Input(tensor=K.variable(M))
layer_one = Model_b()([M_in, X_in])
dropout2 = Dropout(dropout_rate)(layer_one)
layer_two = Model_b()([layer_one, X_in])
model_b_ = Model([X_in, M_in], layer_two, name='model_b')
Model a
length = 150
dic_size = 100
embed_size = 12
input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)
embedding = LSTM(5)(embedding)
embedding = Dense(10)(embedding)
model_a = Model(input_text, embedding, name = 'model_a')
This is how I merged them:
mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, model_b_.output])
final_model = Model(inputs=[model_b_.input[0],model_b_.input[1],model_a.input], outputs=mult)
Is matmul-ing two Keras models the right way to do this?
I don't know whether I am merging the outputs correctly, or whether the model is correct.
I would greatly appreciate any advice on how to make that matrix trainable and how to merge the models' outputs correctly and then feed the input.
Thanks in advance!
Okay. Since you are going to have custom trainable weights, the way to do this in Keras is to create a custom layer.
Now, since your custom layer has no inputs, we will need a trick, which will be explained later.
So, this is the layer definition for the custom weights:
from keras.layers import *
from keras.models import Model
from keras.initializers import get as get_init, serialize as serial_init
import keras.backend as K
import tensorflow as tf


class TrainableWeights(Layer):

    #you can pass keras initializers when creating this layer
    #kwargs will take base layer arguments, such as name and others if you want
    def __init__(self, shape, initializer='uniform', **kwargs):
        super(TrainableWeights, self).__init__(**kwargs)
        self.shape = shape
        self.initializer = get_init(initializer)

    #build is where you define the weights of the layer
    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel',
                                      shape=self.shape,
                                      initializer=self.initializer,
                                      trainable=True)
        self.built = True

    #call is the layer operation - due to keras limitation, we need an input
    #warning, I'm supposing the input is a tensor with value 1 and no shape or shape (1,)
    def call(self, x):
        return x * self.kernel

    #for keras to build the summary properly
    def compute_output_shape(self, input_shape):
        return self.shape

    #only needed for saving/loading this layer in model.save()
    def get_config(self):
        config = {'shape': self.shape, 'initializer': serial_init(self.initializer)}
        base_config = super(TrainableWeights, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
Now, this layer should be used like this:
dummyInputs = Input(tensor=K.constant([1]))
trainableWeights = TrainableWeights(shape)(dummyInputs)
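To see why the constant-1 input trick works, here is a minimal probe (a sketch; the (10, 10) shape and the probe model are just for illustration): since call() computes x * self.kernel and x is the constant 1, the layer's output is exactly its kernel.
#sanity probe (sketch): the output of the layer equals its kernel,
#because multiplying by the constant-1 dummy input changes nothing
dummyInputs = Input(tensor=K.constant([1]))
weights = TrainableWeights((10, 10), initializer='uniform')(dummyInputs)
probe = Model(dummyInputs, weights)
probe.summary()                      #expect one trainable weight of shape (10, 10)
print(K.eval(probe.output).shape)    #(10, 10): the kernel values themselves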
With the layer defined, we can start modeling.
First, let's look at the model_a side:
#general vars
length = 150
dic_size = 100
embed_size = 12
#for the model_a segment
input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)
#the following two lines are just a resource to reach the desired shape
embedding = LSTM(5)(embedding)
embedding = Dense(50)(embedding)
#creating model_a here is optional, only if you want to use model_a independently later
model_a = Model(input_text, embedding, name = 'model_a')
For this, we are going to use our TrainableWeights layer.
But first, let's simulate the New_model() mentioned earlier.
#simulates New_model() #notice the explicit batch_shape for the matrices
newIn1 = Input(batch_shape = (10,10))
newIn2 = Input(batch_shape = (10,30))
newOut1 = Dense(50)(newIn1)
newOut2 = Dense(50)(newIn2)
newOut = Add()([newOut1, newOut2])
new_model = Model([newIn1, newIn2], newOut, name='new_model')
Now the entire branch:
#the matrices
dummyInput = Input(tensor = K.constant([1]))
X_in = TrainableWeights((10,10), initializer='uniform')(dummyInput)
M_in = TrainableWeights((10,30), initializer='uniform')(dummyInput)
#the output of the branch
md_1 = new_model([X_in, M_in])
#optional, only if you want to use model_s independently later
model_s = Model(dummyInput, md_1, name='model_s')
Finally, we can join the branches together in a whole model.
Notice that I didn't have to use model_a or model_s here. You can if you want, but those submodels are not needed, unless you later want to grab them individually for other usages. (Even if you created them, you don't need to change the code below to use them; they are already part of the same graph.)
#I prefer tf.matmul because it's clear and understandable while K.dot has weird behaviors
mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, md_1])
#final model
model = Model([input_text, dummyInput], mult, name='full_model')
Now train it:
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(np.random.randint(0, dic_size, size=(128, length)),
          np.ones((128, 10)))
Since the output is 2D now, there is no problem with 'categorical_crossentropy'; my earlier comment was because of doubts about the output shape.
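As a final sanity check, you can confirm that the custom matrices really do receive gradient updates. A minimal sketch, assuming the model, dic_size and length defined above (the order of entries in model.trainable_weights may vary in your graph):
#sanity check (sketch): all trainable weights, including the two custom
#kernels, should change after one training step
import numpy as np
before = [K.get_value(w) for w in model.trainable_weights]
model.fit(np.random.randint(0, dic_size, size=(128, length)),
          np.ones((128, 10)), epochs=1, verbose=0)
after = [K.get_value(w) for w in model.trainable_weights]
print([not np.allclose(b, a) for b, a in zip(before, after)])  #expect all True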