From audio to tensor, and back to audio, in TensorFlow

Question

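Is there a way to load an audio file (WAV) directly into a tensor in TensorFlow, and then convert the tensor back into an audio file? I have seen people convert audio into spectrograms, but I can't find anyone converting a spectrogram back into audio.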

Answer 1

TensorFlow 1.x:

The tf.contrib.ffmpeg.decode_audio() op can load audio data (including WAV format) into a tensor, and tf.contrib.ffmpeg.encode_audio() can convert it back into audio data.

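import tensorflow as tf  # TensorFlow 1.x

# The file paths are scalar string placeholders, fed at run time.
input_filename = tf.placeholder(tf.string, shape=[])
output_filename = tf.placeholder(tf.string, shape=[])

# Decode the raw file contents into a 2-D float tensor, [samples x channels].
input_signal = tf.contrib.ffmpeg.decode_audio(
    tf.read_file(input_filename), file_format="wav",
    samples_per_second=44100, channel_count=2)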

# ...

output_signal = ...  # A 2-D tensor, [samples x channels]
encoded_audio_data = tf.contrib.ffmpeg.encode_audio(
    output_signal, file_format="wav", samples_per_second=44100)

write_file_op = tf.write_file(output_filename, encoded_audio_data)

with tf.Session() as sess:
  sess.run(write_file_op, {input_filename: "input.wav",
                           output_filename: "output.wav"})

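TensorFlow 2.x:

The tf.contrib module was deprecated and is no longer available in TensorFlow 2.x, but you can still load and save audio files in 16-bit PCM WAV format using eager execution and tf.audio: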

import tensorflow as tf  # TensorFlow 2.x, eager execution

input_filename = "input.wav"
output_filename = "output.wav"

# decode_wav returns float32 samples scaled to [-1.0, 1.0], shaped
# [samples x channels], together with the sample rate stored in the file.
input_signal, sample_rate = tf.audio.decode_wav(
    tf.io.read_file(input_filename), desired_channels=2)

# ...

output_signal = ...  # A 2-D float32 tensor, [samples x channels]
encoded_audio_data = tf.audio.encode_wav(output_signal, sample_rate=sample_rate)

tf.io.write_file(output_filename, encoded_audio_data)
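To make the "16-bit PCM WAV" part more concrete, here is a rough sketch of the same round trip using only the Python standard library, with no TensorFlow involved. It is an illustration, not how tf.audio is implemented: the helper names write_wav and read_wav are hypothetical, the signal is mono, and the scaling constants (32767 when writing, 32768 when reading) are an assumption chosen for the example.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (hypothetical helper)."""
    # Clip to the valid range, then quantize to signed 16-bit integers.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16 bits
        f.setframerate(sample_rate)
        # WAV stores PCM samples in little-endian byte order.
        f.writeframes(struct.pack("<%dh" % len(ints), *ints))


def read_wav(path):
    """Read a mono 16-bit PCM file back into floats (hypothetical helper)."""
    with wave.open(path, "rb") as f:
        sample_rate = f.getframerate()
        n = f.getnframes()
        data = f.readframes(n)
    ints = struct.unpack("<%dh" % n, data)
    return [v / 32768.0 for v in ints], sample_rate


# A 0.1 second, 440 Hz sine tone at half amplitude.
sample_rate = 44100
signal = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
          for t in range(sample_rate // 10)]

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_wav(path, signal, sample_rate)
restored, restored_rate = read_wav(path)

# The round trip is lossy only by the 16-bit quantization step.
max_error = max(abs(a - b) for a, b in zip(signal, restored))
```

The point of the sketch is that the audio survives the trip as a float array with only a small quantization error, so a tensor holding samples in [-1.0, 1.0] can be written out and read back without meaningful loss.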

Asked: 7 years, 9 months ago
Viewed: 2,095 times
Last active: 5 years, 11 months ago