I am doing:
import librosa
import scipy.signal
import soundfile as sf

# samples, sample_rate, nperseg and overlap are assumed to be defined earlier (not shown)
D = librosa.stft(samples, n_fft=nperseg,
                 hop_length=overlap, win_length=nperseg,
                 window=scipy.signal.windows.hamming)
spect, _ = librosa.magphase(D)  # keep the magnitude, discard the phase
audio_signal = librosa.griffinlim(spect, n_iter=1024,
                                  win_length=nperseg, hop_length=overlap,
                                  window=scipy.signal.windows.hamming)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)
This introduces noticeable distortion in the reconstructed audio signal. What can I do to improve it?
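One thing that may be worth checking, assuming `overlap` in the snippet holds the number of samples shared between adjacent frames (SciPy's `noverlap` convention): librosa's `hop_length` is the step between successive frame starts, so passing the overlap directly would give a much larger step than intended, and Griffin-Lim tends to reconstruct poorly from sparsely overlapping frames. The sketch below only shows the conversion; the helper names and the example sizes are hypothetical and not taken from the question.

```python
def hop_from_overlap(win_length: int, noverlap: int) -> int:
    """Convert a SciPy-style overlap (shared samples) to a librosa-style hop (step)."""
    if not 0 <= noverlap < win_length:
        raise ValueError("noverlap must satisfy 0 <= noverlap < win_length")
    return win_length - noverlap


def overlap_ratio(win_length: int, hop_length: int) -> float:
    """Fraction of each frame that is shared with the next frame."""
    return 1 - hop_length / win_length


nperseg = 512    # hypothetical window length in samples
noverlap = 384   # hypothetical overlap in samples

hop = hop_from_overlap(nperseg, noverlap)
print(hop, overlap_ratio(nperseg, hop))        # intended step and overlap
print(overlap_ratio(nperseg, noverlap))        # overlap if noverlap is passed as hop_length
```

If `overlap` already is a step size, this conversion does not apply and the distortion has another cause.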
When I try to use anything related to the librosa module, I get an error:
Traceback (most recent call last):
  File "C:\Users\User1\Documents\test3.py", line 36, in <module>
    x, Fs = librosa.load(fn_mp3, sr=None)
  File "C:\Program Files\Python38\lib\site-packages\librosa\core\audio.py", line 129, in load
    with sf.SoundFile(path) as sf_desc:
  File "C:\Program Files\Python38\lib\site-packages\soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\Program Files\Python38\lib\site-packages\soundfile.py", line 1172, in _open
    openfunction = _snd.sf_wchar_open
AttributeError: cffi library 'C:\Program Files\Python38\lib\site-packages\_soundfile_data\libsndfile64bit.dll' has no function, constant or global variable named 'sf_wchar_open'
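Before this error appeared, I created a folder named _soundfile_data in site-packages, downloaded libsndfile64bit.dll from here, and placed it in that folder; after that, the error shown above appeared. I searched SO for an answer but found no related questions, and I cannot edit libsndfile64bit.dll, so I am stuck. I am using Windows …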
I ran into an error while trying to generate audio from a tensor produced by an NVIDIA NeMo TTS model.
Here is the code:
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel
spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")
text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
audio = audio.to('cpu').detach().numpy()
sf.write("speech.wav", audio, 22050)
I expected to get the audio file speech.wav.
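The error message is not shown, so the following is only a guess at one possible cause. soundfile reads a 2-D array as (frames, channels), so if the vocoder output keeps a leading batch axis, for example shape `(1, N)`, it would be treated as a single frame with N channels. The sketch below illustrates that interpretation with plain shape tuples; `soundfile_layout` and the example shape are hypothetical.

```python
def soundfile_layout(shape):
    """Return (frames, channels) the way soundfile reads an array of this shape."""
    if len(shape) == 1:
        return shape[0], 1          # 1-D: mono, one sample per frame
    if len(shape) == 2:
        return shape[0], shape[1]   # 2-D: rows are frames, columns are channels
    raise ValueError("soundfile expects a 1-D or 2-D array")


batched = (1, 66150)   # hypothetical vocoder output shape: (batch, samples)
mono = batched[1:]     # shape after dropping the batch axis, e.g. audio[0]

print(soundfile_layout(batched))   # one frame, very many channels
print(soundfile_layout(mono))      # many frames, one channel
```

Printing `audio.shape` before `sf.write` would confirm whether this applies; if it does, `audio[0]` or `audio.squeeze(0)` drops the batch axis.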