How do I convert a predicted sequence back to text in Keras?

Eka*_*Eka 14 python keras keras-layer sequence-to-sequence

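I have a sequence-to-sequence learning model that works fine and is able to predict some outputs. The problem is that I don't know how to convert the output back into a text sequence.

Here is my code.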

from keras.preprocessing.text import Tokenizer,base_filter
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense

txt1="""What makes this problem difficult is that the sequences can vary in length,
be comprised of a very large vocabulary of input symbols and may require the model 
to learn the long term context or dependencies between symbols in the input sequence."""

#txt1 is used for fitting; the Tokenizer expects a list of texts, not a bare string
tk = Tokenizer(nb_words=2000, filters=base_filter(), lower=True, split=" ")
tk.fit_on_texts([txt1])

#convert text to sequence
t= tk.texts_to_sequences([txt1])

#padding to feed the sequence to keras model
t=pad_sequences(t, maxlen=10)

model = Sequential()
model.add(Dense(10,input_dim=10))
model.add(Dense(10,activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])

#predicting new sequence
pred=model.predict(t)

#Convert predicted sequence to text
pred=??

Ben*_*man 12

Here is the solution I found:

reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))
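As a rough illustration of how such a reverse map might be used, the sketch below decodes a padded integer sequence back into a string. The `word_index` dictionary is a hypothetical stand-in for `tokenizer.word_index`, and `decode_sequence` and `pad_index` are names made up for this example. It assumes that index 0 appears only as padding, which matches the `pad_sequences` default, since the Keras Tokenizer reserves 0 and never assigns it to a word.

```python
# Hypothetical stand-in for tokenizer.word_index (word -> integer index).
word_index = {"these": 1, "are": 2, "two": 3, "crazy": 4, "sentences": 5}

# Invert the mapping: integer index -> word.
reverse_word_map = {index: word for word, index in word_index.items()}


def decode_sequence(indices, pad_index=0):
    """Turn one integer sequence into a string, skipping padding and unknown indices."""
    words = [
        reverse_word_map[i]
        for i in indices
        if i != pad_index and i in reverse_word_map
    ]
    return " ".join(words)


# A sequence left-padded with zeros, as pad_sequences produces by default.
padded = [0, 0, 0, 1, 2, 3, 4, 5]
decoded = decode_sequence(padded)
print(decoded)  # these are two crazy sentences
```

Skipping unknown indices silently is a design choice for this sketch; raising an error instead may be preferable when an unexpected index would indicate a bug.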


Esb*_*rdt 11

I had to solve the same problem, so here is how I ended up doing it (inspired by @Ben Useman's reverse dictionary).

# Importing library
from keras.preprocessing.text import Tokenizer

# My texts
texts = ['These are two crazy sentences', 'that I want to convert back and forth']

# Creating a tokenizer
tokenizer = Tokenizer(lower=True)

# Building word indices
tokenizer.fit_on_texts(texts)

# Tokenizing sentences
sentences = tokenizer.texts_to_sequences(texts)

# sentences
# [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11, 12, 13]]

# Creating a reverse dictionary
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))

# Function takes a tokenized sentence (a list of integer indices) and returns the words
def sequence_to_text(list_of_indices):
    # Looking up each index in the reverse dictionary (unknown indices map to None)
    words = [reverse_word_map.get(index) for index in list_of_indices]
    return words

# Creating texts 
my_texts = list(map(sequence_to_text, sentences))

# my_texts
# [['these', 'are', 'two', 'crazy', 'sentences'], ['that', 'i', 'want', 'to', 'convert', 'back', 'and', 'forth']]

  • Just an alternative way to invert the word_index mapping: `reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])` (2 upvotes)

Jai*_*ves 6

You can use the inverse function, tokenizer.sequences_to_texts, directly.

text = tokenizer.sequences_to_texts(<list of the integer equivalent encodings>)

I have tested the above and it works as expected.

PS: Take extra care that the argument is a list of integer encodings, not one-hot encodings.
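If the data at hand is one-hot encoded (or is a softmax output with one row of scores per time step), one possible way to get integer encodings is to take the position of the largest value in each row. The sketch below does this in plain Python. The `argmax` helper and the sample rows are invented for illustration, and it assumes that each position in a row corresponds to a word index from the tokenizer, which depends on how the model was built and trained.

```python
def argmax(values):
    """Return the position of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical output for one sequence: one row of class scores per time step.
one_hot_rows = [
    [0.0, 1.0, 0.0, 0.0],
    [0.1, 0.1, 0.1, 0.7],
    [0.0, 0.2, 0.8, 0.0],
]

# Collapse each row to the index of its highest score.
integer_sequence = [argmax(row) for row in one_hot_rows]
print(integer_sequence)  # [1, 3, 2]
```

With NumPy arrays, `np.argmax(rows, axis=-1)` performs the same reduction. The resulting list can then be wrapped in an outer list and passed to tokenizer.sequences_to_texts.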

  • This seems to be the most straightforward answer. If you want to see what it does, try this line: `print(tokenizer.sequences_to_texts([[1]]))` (3 upvotes)