hsi*_*iou 5 nlp machine-learning neural-network deep-learning keras
我正在编写一个程序来将文本分类为几个类.现在,程序加载列车并测试单词索引的样本,应用嵌入层和卷积层,并将它们分类到类中.我正在尝试为实验添加手工制作的功能,如下面的代码所示.这features是两个元素的列表,其中第一个元素由训练数据的特征组成,第二个元素由测试数据的特征组成.每个训练/测试样本将具有相应的特征向量(即,特征不是单词特征).
model = Sequential()
model.add(Embedding(params.nb_words,
params.embedding_dims,
weights=[embedding_matrix],
input_length=params.maxlen,
trainable=params.trainable))
model.add(Convolution1D(nb_filter=params.nb_filter,
filter_length=params.filter_length,
border_mode='valid',
activation='relu'))
model.add(Dropout(params.dropout_rate))
model.add(GlobalMaxPooling1D())
# Adding hand-picked features
model_features = Sequential()
nb_features = len(features[0][0])
model_features.add(Dense(1,
input_shape=(nb_features,),
init='uniform',
activation='relu'))
model_final = Sequential()
model_final.add(Merge([model, model_features], mode='concat'))
model_final.add(Dense(len(citfunc.funcs), activation='softmax'))
model_final.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
print model_final.summary()
model_final.fit([x_train, features[0]], y_train,
nb_epoch=params.nb_epoch,
batch_size=params.batch_size,
class_weight=data.get_class_weights(x_train, y_train))
y_pred = model_final.predict([x_test, features[1]])
Run Code Online (Sandbox Code Playgroud)
我的问题是,这段代码是否正确?是否有任何传统方法为每个文本序列添加功能?
尝试:
input = Input(shape=(params.maxlen,))
embedding = Embedding(params.nb_words,
params.embedding_dims,
weights=[embedding_matrix],
input_length=params.maxlen,
trainable=params.trainable)(input)
conv = Convolution1D(nb_filter=params.nb_filter,
filter_length=params.filter_length,
border_mode='valid',
activation='relu')(embedding)
drop = Dropout(params.dropout_rate)(conv)
seq_features = GlobalMaxPooling1D()(drop)
# Adding hand-picked features
nb_features = len(features[0][0])
other_features = Input(shape=(nb_features,))
model_final = merge([seq_features , other_features], mode='concat'))
model_final = Dense(len(citfunc.funcs), activation='softmax'))(model_final)
model_final = Model([input, other_features], model_final)
model_final.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
Run Code Online (Sandbox Code Playgroud)
在这种情况下 - 您可以直接将序列分析中的特征与自定义特征合并 - 而无需将所有自定义特征压缩到1使用的特征Dense.