Keras - Looking up embeddings

Question

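What I am trying to do:

I am trying to look up the word embedding for each word in a sequence. The sequence is a series of numbers generated from text.

Background:

My sequence (shape (200,)) looks like this: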

50, 2092, 3974,  398,   10, 9404,    5, 1001, 3975,   15,  512... <snip>

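Each of these numbers represents a word in the vocabulary (10,000 words). I created some embedding weights using the negative sampling method found here.

The extracted embedding weights have shape (10000, 106), and I can load them into a new embedding layer.

Using the loaded weights, I want to look up each number of the sequence in this new embedding layer and have it return 200 vectors of size 106, one for each position in the sequence.

This is what I have done so far: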

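import numpy as np
from keras.layers import Input, Embedding

vocabulary_size = 10000  # size of the vocabulary described above

# Load the pre-trained weights, expected shape (10000, 106)
embedding_weights = np.genfromtxt('embedding_weights.csv', delimiter=',')

input_layer = Input(shape=(200,), name='text_input')
embedding = Embedding(input_length=200, input_dim=vocabulary_size, output_dim=106,
                      name='embedding_layer', trainable=False, weights=[embedding_weights])
embedded_text = embedding(input_layer)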

Is this the correct way to look up the embeddings?

Answer 1

sdc*_*cbr 5

Yes, this looks correct. To actually extract the embeddings, you can wrap the layers you defined in a Model:

import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

# Generate some random weights
embedding_weights = np.random.rand(10000, 106)
vocabulary_size = 10000

input_layer = Input(shape=(200,), name='text_input')
embedding = Embedding(input_length=200, input_dim=vocabulary_size, output_dim=106, 
                       name='embedding_layer', trainable=False, weights=[embedding_weights])
embedded_text = embedding(input_layer)

embedding_model = Model(inputs=input_layer, outputs=embedded_text)

# Random input sequence of length 200
input_sequence = np.random.randint(0,10000,size=(1,200))
# Extract the embeddings by calling the .predict() method
sequence_embeddings = embedding_model.predict(input_sequence)
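
As a sanity check, a frozen embedding lookup amounts to selecting rows of the weight matrix, so the same result can be sketched with plain NumPy indexing. The snippet below is only an illustration under the shapes given in the question; the random weights and the seeded generator `rng` are stand-ins, not the weights loaded from the CSV file. The output of `.predict()` above should match it up to float32 rounding, since Keras typically stores the weights as float32.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, assumed here for reproducibility
vocabulary_size, embedding_dim, sequence_length = 10000, 106, 200

# Stand-in for the weights loaded from embedding_weights.csv
embedding_weights = rng.random((vocabulary_size, embedding_dim))

# One sequence of word indices, shaped (batch, sequence_length)
input_sequence = rng.integers(0, vocabulary_size, size=(1, sequence_length))

# Integer-array indexing picks one row of the weight matrix per word index
sequence_embeddings = embedding_weights[input_sequence]

print(sequence_embeddings.shape)  # (1, 200, 106)
```

Dropping the batch dimension with `sequence_embeddings[0]` gives the 200 vectors of size 106 asked for in the question.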

