Jac*_*iti 3 libsndfile python-3.x soundfile
I am trying to generate audio from a tensor produced by an NVIDIA NeMo TTS model, but I run into an error.
Here is the code:
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel
spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")
text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
audio = audio.to('cpu').detach().numpy()
sf.write("speech.wav", audio, 22050)
I expect to get the audio file speech.wav.
小智 5
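Looking at your example, I see that your audio has shape (1, 173056). soundfile interprets a 2D array as (frames, channels), so this shape is read as 1 frame with 173056 channels rather than as mono audio.
Based on https://github.com/bastibe/python-soundfile/issues/309, I converted the audio to a 1D array of size 173056, and it worked fine.
The code I used: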
>>> import numpy as np
>>> sample_rate = 22050  # same rate as in the question
>>> sf.write("speech.wav", np.ravel(audio), sample_rate)
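To see the shape fix in isolation, here is a minimal sketch that does not need NeMo. It uses a zero-filled NumPy array as a stand-in for the vocoder output, and `to_mono_1d` is a hypothetical helper name, not part of soundfile or NeMo. It assumes the vocoder returns a batch of one waveform with shape (1, N), as in the question.

```python
import numpy as np

def to_mono_1d(audio):
    """Hypothetical helper: drop a leading batch axis of size 1.

    soundfile treats a 2D array as (frames, channels), so a (1, N)
    array would be read as 1 frame with N channels.
    """
    audio = np.asarray(audio)
    if audio.ndim == 2 and audio.shape[0] == 1:
        return audio[0]  # (1, N) -> (N,)
    return audio  # already 1D, or genuinely (frames, channels)

# Stand-in for the vocoder output from the question (assumed shape).
fake_audio = np.zeros((1, 173056), dtype=np.float32)
mono = to_mono_1d(fake_audio)
print(fake_audio.shape, "->", mono.shape)

# With soundfile installed, the write call would then be:
# sf.write("speech.wav", mono, 22050)
```

For a single utterance this gives the same result as `np.ravel(audio)`. Indexing the first row is a bit more explicit about intent, and unlike `np.ravel` it would not silently concatenate several waveforms if the batch ever held more than one.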
Regards,