Tag: ctc

Using TensorFlow's Connectionist Temporal Classification (CTC) implementation

I'm trying to use TensorFlow's CTC implementation under the contrib package (tf.contrib.ctc.ctc_loss), without success.

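First of all, does anyone know where I can read a good step-by-step tutorial? TensorFlow's documentation is very poor on this topic.
Do I have to feed ctc_loss labels with the blank label interleaved?
I cannot get my network to overfit, even on the training set after more than 200 epochs. :(
How do I calculate the label error rate using tf.edit_distance?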
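
On the last point, the label error rate is usually taken to be the edit distance between the decoded and the reference label sequences, divided by the reference length and averaged over the batch. tf.edit_distance computes that on SparseTensor inputs, and with normalize=True it divides by the length of the truth sequence. The dependency-free sketch below shows the same quantity on plain Python lists; the function names edit_distance and label_error_rate are illustrative, and it assumes every reference sequence is non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(references, hypotheses):
    """Mean edit distance normalized by reference length (assumes non-empty references)."""
    rates = [edit_distance(r, h) / len(r) for r, h in zip(references, hypotheses)]
    return sum(rates) / len(rates)


# One substitution in a reference of length 4 gives a rate of 0.25.
print(label_error_rate([[1, 2, 3, 4]], [[1, 2, 4, 4]]))
```

In a TensorFlow graph, the hypotheses would come from a CTC decoder and the references from the sparse target tensor, with the per-example results averaged in the same way.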

Here is my code:

with graph.as_default():

  max_length = X_train.shape[1]
  frame_size = X_train.shape[2]
  max_target_length = y_train.shape[1]

  # Batch size x time steps x data width
  data = tf.placeholder(tf.float32, [None, max_length, frame_size])
  data_length = tf.placeholder(tf.int32, [None])

  #  Batch size x max_target_length
  target_dense = tf.placeholder(tf.int32, [None, max_target_length])
  target_length = tf.placeholder(tf.int32, [None])

  #  Generating sparse tensor representation of target
  target = ctc_label_dense_to_sparse(target_dense, target_length)

  # Applying LSTM, returning output for each timestep (y_rnn1, 
  # [batch_size, max_time, cell.output_size]) and the final state of shape …


speech-recognition end-to-end tensorflow ctc

Igo*_*lha

2018 12-09

12 votes, 2 answers, 10k views

Understanding CTC loss for speech recognition in Keras

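I'm trying to understand how CTC loss works for speech recognition and how it can be implemented in Keras.

What I think I understood (please correct me if I'm wrong!)

Broadly, the CTC loss is added on top of a classical network so that sequential information is decoded element by element (letter by letter for text or speech) rather than as whole blocks of elements (words, for example).

Say we are feeding utterances of some sentences as MFCCs.

The goal of using the CTC loss is to learn how to match each letter to the MFCCs at each time step. The Dense+softmax output layer therefore has as many neurons as there are elements needed to compose the sentences:

the alphabet (a, b, ..., z)
a blank token (-)
a space (_) and an end character (>)

The softmax layer then has 29 neurons (26 for the alphabet + 3 special characters).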
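
To make the count concrete, here is a minimal sketch of such a label set. The ordering (letters first, then space, then end marker, with the blank last) is an assumption for illustration; it follows TensorFlow's documented ctc_loss convention of reserving the largest index for the blank. The names char_to_index and encode are hypothetical.

```python
import string

# Assumed symbols, taken from the list above: "_" for space, ">" for end, "-" for blank.
SPACE, END, BLANK = "_", ">", "-"
alphabet = list(string.ascii_lowercase) + [SPACE, END, BLANK]

char_to_index = {ch: i for i, ch in enumerate(alphabet)}
num_classes = len(alphabet)          # size of the softmax layer
blank_index = char_to_index[BLANK]   # last index, i.e. num_classes - 1


def encode(text):
    """Map a transcript to target indices; the blank never appears in a target."""
    indices = [char_to_index[SPACE if ch == " " else ch] for ch in text.lower()]
    return indices + [char_to_index[END]]


print(num_classes, blank_index)
print(encode("ab c"))
```

Note that the blank is part of the network's output alphabet but is never written into the target sequence; the loss inserts it internally when aligning targets to time steps.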

To implement it, I found that I can do something like this:

# CTC implementation from Keras example found at
# https://github.com/keras-team/keras/blob/master/examples/image_ocr.py

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    # print "y_pred_shape: ", y_pred.shape
    y_pred …


python deep-learning keras tensorflow ctc

Bap*_*ier

2019 08-08

6 votes, 1 answer, 5667 views

Connectionist Temporal Classification (CTC) blank label

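I'm trying to use the CTC loss function in my network, but I don't quite understand when to feed the "blank" label as a label.

I'm using it for gesture recognition as described by Molchanov, but what confuses me is that there is also a "no gesture" class.

The TensorFlow documentation states:

The inputs Tensor's innermost dimension size, num_classes, represents num_labels + 1 classes, where num_labels is the number of true labels, and the largest value (num_classes - 1) is reserved for the blank label.

If I now use the "blank" label to represent no gesture, my training stops with the error

Saw a non-null label (index >= num_classes - 1) following a null label

I assume that the null label is the same as the blank label.

The problem is that I get this error when I feed data that goes from no gesture (mapped to the null label) to a gesture. I can avoid it by adding two more labels alongside the existing ones, one for "no gesture" and one for the "blank/null" label. I then only ever feed the "no gesture" label and never the "blank" label, but that doesn't seem quite right.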
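
One reading of the documentation quoted above is that the blank is reserved for the loss itself and is not meant to appear in the targets, which is what the two-label workaround amounts to. The sketch below lays out the indices under that reading; the gesture names and the helper validate_targets are hypothetical and only illustrate the constraint.

```python
# Hypothetical gesture vocabulary; "no_gesture" is an ordinary class the network can emit.
GESTURES = ["no_gesture", "swipe_left", "swipe_right", "tap"]
num_labels = len(GESTURES)
num_classes = num_labels + 1   # one extra output unit for the CTC blank
blank_index = num_classes - 1  # reserved by the loss, not a target label

label_to_index = {name: i for i, name in enumerate(GESTURES)}


def validate_targets(indices, num_classes):
    """Raise if a target index is negative or collides with the reserved blank index."""
    blank = num_classes - 1
    bad = [i for i in indices if not 0 <= i < blank]
    if bad:
        raise ValueError(f"target indices {bad} are outside [0, {blank})")
    return indices


targets = [label_to_index[n] for n in ["no_gesture", "tap", "no_gesture"]]
print(validate_targets(targets, num_classes))
```

Under this layout the softmax has num_labels + 1 outputs, the targets only ever use indices 0 to num_labels - 1, and "no gesture" is learned like any other class.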

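So my question is: what should I use the "blank/null" label for?

I can imagine that in language processing you would typically use the end-of-sentence period as the "null" label? But there is no ending gesture here, since it is a continuous stream.

Thanks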

machine-learning tensorflow ctc

Kil*_*sen

2018 12-09

5 votes, 1 answer, 3377 views

CTC: What is the difference between space and blank?

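In their 2006 article on Connectionist Temporal Classification, Alex Graves et al. introduced a speech decoding model with 27 labels: 26 for the letters of the alphabet and one for blank, meaning no label (which I understand as silence).

However, I see many CTC implementations that use 28 labels, one being the blank and another being the space. So far I have not been able to find an explanation of why both labels are needed; to me they represent the same thing.

Could you explain the difference between blank and space in the context of CTC, and why both labels are needed?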
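
A small sketch of the CTC collapsing rule may help frame the question. It assumes "-" stands for the blank and "_" for the space, as in common write-ups; the function name ctc_collapse is illustrative. The blank is removed when a frame-level path is collapsed, while the space is an ordinary label that stays in the output.

```python
from itertools import groupby

BLANK = "-"  # CTC blank: emitted per frame, removed during collapsing
SPACE = "_"  # word separator: an ordinary label that survives collapsing


def ctc_collapse(path):
    """Merge consecutive repeated symbols, then drop blanks."""
    merged = [ch for ch, _ in groupby(path)]
    return "".join(ch for ch in merged if ch != BLANK)


# The blank between the two "l" runs keeps the double letter; the space is kept as output.
print(ctc_collapse("hhe-ll-lo__-wwoor-ld"))  # hello_world
# Without a blank, repeated frames of "l" merge into a single letter.
print(ctc_collapse("hhelllo"))               # helo
```

In this picture the blank is an alignment device (it separates repeated labels and absorbs frames with no output), whereas the space is part of the transcript itself.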

speech-recognition speech speech-to-text labeling ctc

Nic*_* D.

2019 03-22

5 votes, 1 answer, 2203 views

How to add the decode_batch_predictions() method to the Keras Captcha OCR model?

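The current Keras Captcha OCR model returns CTC-encoded output, which has to be decoded after inference.

To decode it, a decoding utility function has to be run as a separate step after inference.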

preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)


The decoding utility function uses keras.backend.ctc_decode, which in turn uses either a greedy decoder or a beam search decoder.

# A utility function to decode the output of the network
def decode_batch_predictions(pred):
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    # Use greedy search. For complex tasks, you can use beam search
    results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
        :, :max_length
    ]
    # Iterate over the results and get back the text
    output_text = []
    for res in results:
        res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
        output_text.append(res)
    return output_text …
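
For reference, greedy (best-path) CTC decoding can be sketched without any framework: take the most likely class at each time step, merge consecutive repeats, and drop the blank. The function below is a hypothetical illustration on nested Python lists, not the Keras implementation; it assumes the caller passes the blank index explicitly and that each sequence is a list of per-time-step probability lists.

```python
from itertools import groupby


def greedy_ctc_decode(batch_probs, blank_index):
    """Best-path decoding: argmax per time step, merge repeats, drop blanks."""
    decoded = []
    for probs in batch_probs:  # probs has shape [time][num_classes]
        best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]
        merged = [idx for idx, _ in groupby(best_path)]
        decoded.append([idx for idx in merged if idx != blank_index])
    return decoded


# Three classes: 0 and 1 are characters, 2 is the blank (assumed to be last).
batch = [[
    [0.90, 0.05, 0.05],  # argmax 0
    [0.80, 0.10, 0.10],  # argmax 0, merged with the previous step
    [0.10, 0.10, 0.80],  # argmax 2, the blank
    [0.60, 0.30, 0.10],  # argmax 0, a new occurrence after the blank
    [0.20, 0.70, 0.10],  # argmax 1
]]
print(greedy_ctc_decode(batch, blank_index=2))  # [[0, 0, 1]]
```

Folding this step into the model itself would mean running the equivalent operations on tensors inside a layer, so that predict() returns label indices rather than per-step probabilities.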


ocr decoding keras ctc

lee*_*emm


5 votes, 1 answer, 546 views