deeplearning4j - Using an RNN/LSTM for audio signal processing

Question


eri*_*d71 13 java machine-learning audio-processing deeplearning4j

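I'm trying to train an RNN for digital (audio) signal processing using deeplearning4j. The idea is to have two .wav files: one is an audio recording, and the second is the same recording after processing (for example, with a low-pass filter). The RNN's input is the first (unprocessed) recording, and its output is the second (processed) recording.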
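
For illustration only, a processed target of this kind could be produced with a simple one-pole low-pass filter. This is a minimal sketch under assumptions: it is not the filter used in the question, the samples are taken to be 8-bit values (0-255) held in an `int[]`, and the names `LowPassSketch`, `lowPass`, and `alpha` are hypothetical.

```java
public class LowPassSketch {
    /**
     * One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
     * alpha is a smoothing factor in (0, 1]; alpha = 1 leaves the signal unchanged.
     * Samples are assumed to be unsigned 8-bit values (0-255) stored as ints.
     */
    public static int[] lowPass(int[] samples, double alpha) {
        int[] out = new int[samples.length];
        if (samples.length == 0) {
            return out;
        }
        // Start the filter state at the first sample to avoid a start-up transient.
        double y = samples[0];
        for (int n = 0; n < samples.length; n++) {
            y += alpha * (samples[n] - y);
            // Round and clamp back into the 8-bit range.
            out[n] = (int) Math.max(0, Math.min(255, Math.round(y)));
        }
        return out;
    }
}
```

Smaller values of `alpha` smooth more heavily; the filtered array would then play the role of the second (processed) recording.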

I started from the GravesLSTMCharModellingExample in the dl4j examples and mostly adapted its CharacterIterator class to accept audio data instead of text.

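My first project working with audio in dl4j did basically the same thing as GravesLSTMCharModellingExample, but generated audio instead of text, using 11025 Hz 8-bit mono audio. That worked (with some quite amusing results), so the basics of using audio in this context seem to be sound.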

Step 2, then, is to adapt this for audio processing instead of audio generation.

Unfortunately, I'm not having much success. The best it seems able to do is output a very noisy version of the input.

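As a sanity check, I tested using the same audio file for both input and output, which I expected to converge quickly to a model that simply copies the input. It doesn't. Again, after a long time training, all it seems able to do is produce a noisier version of the input.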
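
One way to put a number on "noisier" in a check like this is the signal-to-noise ratio between the target samples and the network's output. The following is a minimal sketch, assuming both are arrays of 8-bit sample values (0-255) centered on 128; `SnrSketch` and `snrDb` are hypothetical names and not part of dl4j.

```java
public class SnrSketch {
    /**
     * Signal-to-noise ratio in decibels between a target signal and an output signal.
     * Signal power is measured around the 8-bit midpoint (128);
     * noise is the sample-wise difference between target and output.
     */
    public static double snrDb(int[] target, int[] output) {
        if (target.length != output.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double signal = 0.0;
        double noise = 0.0;
        for (int n = 0; n < target.length; n++) {
            double s = target[n] - 128;
            double e = target[n] - output[n];
            signal += s * s;
            noise += e * e;
        }
        if (noise == 0.0) {
            // A perfect copy has no error at all.
            return Double.POSITIVE_INFINITY;
        }
        return 10.0 * Math.log10(signal / noise);
    }
}
```

For the same-file test, a model that really copies its input would drive this value upward over training, while a noisy copy would plateau at a low figure.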

I guess the most relevant code is the DataSetIterator.next() method (adapted from the example's CharacterIterator class), which now looks like this:

public DataSet next(int num) {
    if (exampleStartOffsets.size() == 0)
        throw new NoSuchElementException();

    int currMinibatchSize = Math.min(num, exampleStartOffsets.size());
    // Allocate space:
    // Note the order here:
    // dimension 0 = number of examples in minibatch
    // dimension 1 = size of each vector (i.e., number of characters)
    // dimension 2 = length of each time series/example
    // Why 'f' order here? See http://deeplearning4j.org/usingrnns.html#data
    // section "Alternative: Implementing a custom DataSetIterator"
    INDArray input = Nd4j.create(new int[] { currMinibatchSize, columns, exampleLength }, 'f');
    INDArray labels = Nd4j.create(new int[] { currMinibatchSize, columns, exampleLength }, 'f');

    for (int i = 0; i < currMinibatchSize; i++) {
        int startIdx = exampleStartOffsets.removeFirst();
        int endIdx = startIdx + exampleLength;

        for (int j = startIdx, c = 0; j < endIdx; j++, c++) {
            // inputIndices/idealIndices are audio samples converted to indices.
            // With 8-bit audio, this translates to values between 0-255.
            input.putScalar(new int[] { i, inputIndices[j], c }, 1.0);
            labels.putScalar(new int[] { i, idealIndices[j], c }, 1.0);
        }
    }

    return new DataSet(input, labels);
}

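
The encoding done by the inner loop of next() can be sketched without ND4J, which makes it easier to test in isolation. This is a minimal sketch under assumptions: plain `float[columns][length]` arrays stand in for one example of the INDArray, and `OneHotSketch`, `toIndex`, `oneHot`, and `decode` are hypothetical names. It relies on 8-bit PCM WAV samples being unsigned, so a Java `byte` has to be masked to get an index in 0-255.

```java
public class OneHotSketch {
    /** Maps a raw 8-bit PCM byte (unsigned in WAV, signed in Java) to an index in 0..255. */
    public static int toIndex(byte sample) {
        return sample & 0xFF;
    }

    /**
     * One-hot encodes `length` samples starting at `start`.
     * Result layout is [columns][length]: out[indices[start + c]][c] == 1 for each timestep c.
     */
    public static float[][] oneHot(int[] indices, int start, int length, int columns) {
        float[][] out = new float[columns][length];
        for (int c = 0; c < length; c++) {
            out[indices[start + c]][c] = 1.0f;
        }
        return out;
    }

    /** Inverse of oneHot: picks the index with the largest activation at each timestep. */
    public static int[] decode(float[][] activations) {
        int length = activations[0].length;
        int[] indices = new int[length];
        for (int c = 0; c < length; c++) {
            int best = 0;
            for (int k = 1; k < activations.length; k++) {
                if (activations[k][c] > activations[best][c]) {
                    best = k;
                }
            }
            indices[c] = best;
        }
        return indices;
    }
}
```

A round trip through `oneHot` and `decode` should return the original indices, which is a quick way to rule out the encoding itself as the source of the noise.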

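So perhaps I have a fundamental misunderstanding of what LSTMs are supposed to do. Is there anything obviously wrong in the posted code that I'm missing? Is there an obvious reason why training on the same file doesn't necessarily converge quickly to a model that just copies the input? (Let alone training it on signal processing that actually does something?)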

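I've seen "Using RNN to recover sine wave from noisy signal", which seems to be about a similar problem (but with a different ML framework), but it didn't get an answer.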

Any feedback is appreciated!

Answer 1

And*_*oud -1

Hello, I think you should try using the long type instead of int in the dataset logic:

public DataSet next(int num)


Replace it with:

public DataSet next(long num)


Archived: 8 years, 8 months ago
Viewed: 913 times
Last active: 8 years, 7 months ago