From audio to a tensor, and back to audio, in TensorFlow

Ric*_*mao 0 python tensorflow

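Is there a way to load an audio file (WAV) directly into a tensor in TensorFlow, and then convert the tensor back into an audio file? I have seen people convert audio into spectrograms, but I cannot find anyone who converts a spectrogram back into audio.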

mrr*_*rry 6

TensorFlow 1.x:

The tf.contrib.ffmpeg.decode_audio() op can load audio data (including the WAV format) into a tensor, and tf.contrib.ffmpeg.encode_audio() can convert it back into audio data.

import tensorflow as tf  # TensorFlow 1.x

# The filenames are fed in when the graph is run.
input_filename = tf.placeholder(tf.string, shape=[])
output_filename = tf.placeholder(tf.string, shape=[])

# Decode the file contents into a 2-D float tensor, [samples x channels].
input_signal = tf.contrib.ffmpeg.decode_audio(
    tf.read_file(input_filename), file_format="wav",
    samples_per_second=44100, channel_count=2)

# ...

output_signal = ...  # A 2-D tensor, [samples x channels]
encoded_audio_data = tf.contrib.ffmpeg.encode_audio(
    output_signal, file_format="wav", samples_per_second=44100)

# Write the encoded bytes back to disk.
write_file_op = tf.write_file(output_filename, encoded_audio_data)

with tf.Session() as sess:
  sess.run(write_file_op, {input_filename: "input.wav",
                           output_filename: "output.wav"})

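TensorFlow 2.x:

The tf.contrib module was deprecated and is not available in TensorFlow 2.x, but you can still load and save audio files in the 16-bit PCM WAV format using tf.audio.decode_wav() and tf.audio.encode_wav() with eager execution: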

import tensorflow as tf  # TensorFlow 2.x, eager execution

input_filename = "input.wav"
output_filename = "output.wav"

# decode_wav returns float32 samples scaled to [-1.0, 1.0], shaped
# [samples x channels], together with the sample rate from the file.
input_signal, sample_rate = tf.audio.decode_wav(
    tf.io.read_file(input_filename))

# ...

output_signal = ...  # A 2-D float32 tensor, [samples x channels]
encoded_audio_data = tf.audio.encode_wav(output_signal, sample_rate)

tf.io.write_file(output_filename, encoded_audio_data)
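
To make the "16-bit PCM WAV" constraint concrete, here is a minimal sketch of the same round trip that uses only Python's standard library. It is not TensorFlow's implementation: the helper names decode_wav_16bit and encode_wav_16bit are hypothetical, and dividing by 32768 is an assumed scaling that may differ from TensorFlow's in the last bit. It only illustrates the layout, with integer samples mapped to floats in roughly [-1.0, 1.0] and arranged as [samples x channels].

```python
import array
import io
import sys
import wave


def decode_wav_16bit(data):
    """Decode 16-bit PCM WAV bytes into (rows, sample_rate).

    rows is a list of per-sample lists, i.e. [samples x channels],
    holding floats in [-1.0, 1.0).
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    ints = array.array("h", raw)
    if sys.byteorder == "big":
        ints.byteswap()  # WAV sample data is little-endian
    floats = [v / 32768.0 for v in ints]  # assumed scaling factor
    rows = [floats[i:i + channels] for i in range(0, len(floats), channels)]
    return rows, sample_rate


def encode_wav_16bit(rows, sample_rate):
    """Encode [samples x channels] floats as 16-bit PCM WAV bytes."""
    channels = len(rows[0])
    ints = array.array("h")
    for row in rows:
        for v in row:
            scaled = int(round(v * 32768.0))
            # Clip to the representable signed 16-bit range.
            ints.append(max(-32768, min(32767, scaled)))
    if sys.byteorder == "big":
        ints.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(ints.tobytes())
    return buf.getvalue()


# Round trip: three stereo samples at 44.1 kHz.
signal = [[0.0, 0.5], [-0.5, 0.25], [1.0, -1.0]]
wav_bytes = encode_wav_16bit(signal, 44100)
decoded, rate = decode_wav_16bit(wav_bytes)
```

Note that +1.0 cannot be represented exactly in signed 16-bit audio, so it is clipped to 32767 on encoding and decodes to slightly less than 1.0.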