Sha*_*oon 13 python pytorch torchaudio
I have a MelSpectrogram generated from:
eval_seq_specgram = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_fft=256)(eval_audio_data).transpose(1, 2)
So eval_seq_specgram now has a size of torch.Size([1, 128, 499]), where 499 is the number of timesteps and 128 is n_mels.
I'm trying to invert it back to audio with GriffinLim, but before that I think I need to invert the mel scale, so I have:
inverse_mel_pred = torchaudio.transforms.InverseMelScale(sample_rate=sample_rate, n_stft=256)(eval_seq_specgram)
inverse_mel_pred has a size of torch.Size([1, 256, 499]).
Then I try to use GriffinLim:
pred_audio = torchaudio.transforms.GriffinLim(n_fft=256)(inverse_mel_pred)
But I get an error:
Traceback (most recent call last):
  File "evaluate_spect.py", line 63, in <module>
    main()
  File "evaluate_spect.py", line 51, in main
    pred_audio = torchaudio.transforms.GriffinLim(n_fft=256)(inverse_mel_pred)
  File "/home/shamoon/.local/share/virtualenvs/speech-reconstruction-7HMT9fTW/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shamoon/.local/share/virtualenvs/speech-reconstruction-7HMT9fTW/lib/python3.8/site-packages/torchaudio/transforms.py", line 169, in forward
    return F.griffinlim(specgram, self.window, self.n_fft, self.hop_length, self.win_length, self.power,
  File "/home/shamoon/.local/share/virtualenvs/speech-reconstruction-7HMT9fTW/lib/python3.8/site-packages/torchaudio/functional.py", line 179, in griffinlim
    inverse = torch.istft(specgram * angles,
RuntimeError: The size of tensor a (256) must match the size of tensor b (129) at non-singleton dimension 1
I'm not sure what I'm doing wrong or how to fix this.
小智 3
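Just for the record, here is the full working code. The key change is that n_stft must be n_fft // 2 + 1 (129 for n_fft=256) rather than n_fft, since that is the number of frequency bins GriffinLim expects: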
import torch
import torchaudio
import IPython

# Load the audio; waveform has shape (channels, samples)
waveform, sample_rate = torchaudio.load("wavs/LJ030-0196.wav", normalize=True)

n_fft = 256
# A one-sided STFT has n_fft // 2 + 1 frequency bins (129 here)
n_stft = n_fft // 2 + 1

transform = torchaudio.transforms.MelSpectrogram(sample_rate, n_fft=n_fft)
inverse_transform = torchaudio.transforms.InverseMelScale(sample_rate=sample_rate, n_stft=n_stft)
grifflim_transform = torchaudio.transforms.GriffinLim(n_fft=n_fft)

mel_specgram = transform(waveform)                     # (channels, n_mels, frames)
linear_specgram = inverse_transform(mel_specgram)      # (channels, n_stft, frames)
pseudo_waveform = grifflim_transform(linear_specgram)  # (channels, samples)
and
IPython.display.Audio(waveform.numpy(), rate=sample_rate)
IPython.display.Audio(pseudo_waveform.numpy(), rate=sample_rate)
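In case it helps to see where the 129 in the error message comes from, here is a minimal standard-library sketch (no torch involved). The `dft` and `onesided_bins` helpers and the 8-sample `frame` are hypothetical names and values made up for illustration, not part of torchaudio. The sketch checks that the spectrum of a real-valued frame is conjugate-symmetric, which is why a one-sided STFT only needs to keep n_fft // 2 + 1 bins.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), fine for a demo)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def onesided_bins(n_fft):
    """Number of frequency bins kept by a one-sided STFT of a real signal."""
    return n_fft // 2 + 1


# Hypothetical real-valued frame of length n_fft = 8 (illustrative values only).
frame = [0.5, -1.0, 2.0, 0.25, 0.0, 1.5, -0.75, 3.0]
n_fft = len(frame)
spectrum = dft(frame)

# For real input, bin k is the complex conjugate of bin n_fft - k,
# so the bins above n_fft // 2 carry no new information.
mirrored = all(
    abs(spectrum[k] - spectrum[n_fft - k].conjugate()) < 1e-9
    for k in range(1, n_fft)
)

print(mirrored)
print(onesided_bins(n_fft))
print(onesided_bins(256))  # 129, matching "tensor b (129)" in the traceback
```

So with n_fft=256, passing n_stft=256 to InverseMelScale produces 256 frequency rows, while GriffinLim builds its phase tensor with 129 rows, and the two cannot be multiplied together.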