Posts by Jac*_*iti

Getting soundfile.LibsndfileError: Error opening 'speech.wav': Format not recognised when giving a 2D numpy array to soundfile

I was trying to generate audio from a tensor produced by an NVIDIA NeMo TTS model when I ran into this error.

Here is the code:

import soundfile as sf

from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel

spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")

text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
audio = audio.to('cpu').detach().numpy()

sf.write("speech.wav", audio, 22050)


I expected to get an audio file, speech.wav.
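One likely cause, going by the "2D numpy array" in the title: soundfile reads a 2D array as (frames, channels), while a batched vocoder output is usually shaped (batch, samples). A (1, N) array would then look like a single frame with N channels, which the WAV writer cannot represent. Below is a minimal sketch of reshaping before writing. The helper names `to_soundfile_layout` and `write_speech` are made up for illustration, and the "more samples than channels" rule is a heuristic assumption, not something soundfile or NeMo provides.

```python
import numpy as np


def to_soundfile_layout(audio):
    """Return audio as (frames,) for mono or (frames, channels) otherwise.

    Heuristic assumption: real audio has more samples than channels,
    so a 2D array whose first axis is shorter is taken to be
    (channels_or_batch, samples) and is transposed.
    """
    audio = np.asarray(audio)
    if audio.ndim == 2 and audio.shape[0] < audio.shape[1]:
        audio = audio.T  # (1, N) -> (N, 1)
    if audio.ndim == 2 and audio.shape[1] == 1:
        audio = audio[:, 0]  # single channel: flatten to 1D mono
    return audio


def write_speech(path, audio, samplerate=22050):
    """Hypothetical wrapper: reshape the array, then hand it to soundfile."""
    import soundfile as sf  # imported lazily so the helper above needs only numpy

    sf.write(path, to_soundfile_layout(audio), samplerate)


# Stand-in for the vocoder output in the question: one batch item of samples.
fake_audio = np.zeros((1, 22050), dtype=np.float32)
print(fake_audio.shape, "->", to_soundfile_layout(fake_audio).shape)
```

With the array from the question, the last line of the script would become `write_speech("speech.wav", audio)`. If the batch is known to hold exactly one utterance, passing `audio[0]` or `audio.T` straight to `sf.write` should have the same effect.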

libsndfile python-3.x soundfile

Jac*_*iti
