I am new to Python and I am trying to train an audio speech recognition model. I want to read a .wav file and get its contents into a NumPy array. How can I do this?
小智 6
Following @Marco's comment, you can look at the SciPy library, specifically scipy.io.
from scipy.io import wavfile
To read your file ('filename.wav'), simply do the following:
output = wavfile.read('filename.wav')
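This returns a tuple (which I named 'output'):
output[0] is the sampling rate, and output[1] is the array of samples you want to analyze.

This can be done in a few lines with wave (built in) and numpy (obviously). You don't need librosa, scipy, or soundfile. The latest versions gave me problems reading wav files, which is the whole reason I am writing here now.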
import numpy as np
import wave

# Open the file with wave
with wave.open('filename.wav') as f:
    # Read the whole file into a buffer. If you are dealing with a large file
    # then you should read it in blocks and process them separately.
    buffer = f.readframes(f.getnframes())
    # Convert the buffer to a numpy array by checking the size of the sample
    # width in bytes. The output will be a 1D array with interleaved channels.
    # Note: this dtype string assumes 16- or 32-bit signed PCM samples.
    interleaved = np.frombuffer(buffer, dtype=f'int{f.getsampwidth()*8}')
    # Reshape it into a 2D array separating the channels in columns.
    data = np.reshape(interleaved, (-1, f.getnchannels()))
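The comment in the snippet above suggests reading large files in blocks. A minimal sketch of that idea follows. It assumes 16-bit PCM input, and the function name `iter_wav_blocks` and the default block size of 4096 frames are illustrative choices rather than part of the original answer.

```python
import wave
import numpy as np

def iter_wav_blocks(path, block_frames=4096):
    """Yield successive (frames, channels) int16 arrays from a 16-bit PCM WAV file."""
    with wave.open(str(path), 'rb') as f:
        # Assumption of this sketch: only 16-bit samples are handled.
        if f.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = f.getnchannels()
        while True:
            # readframes returns an empty bytes object at end of file.
            buffer = f.readframes(block_frames)
            if not buffer:
                break
            # WAV samples are little-endian, hence '<i2'.
            yield np.frombuffer(buffer, dtype='<i2').reshape(-1, channels)
```

Each yielded block can be processed and discarded before the next one is read, so memory use stays bounded by the block size instead of the file size.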
I like to wrap this in a function that returns the sampling frequency and works with pathlib.Path objects. That way the result can be played with sounddevice:
# play_wav.py
import sounddevice as sd
import numpy as np
import wave
from typing import Tuple
from pathlib import Path

# Utility function that reads the whole `wav` file content into a numpy array
def wave_read(filename: Path) -> Tuple[np.ndarray, int]:
    with wave.open(str(filename), 'rb') as f:
        buffer = f.readframes(f.getnframes())
        inter = np.frombuffer(buffer, dtype=f'int{f.getsampwidth()*8}')
        return np.reshape(inter, (-1, f.getnchannels())), f.getframerate()

if __name__ == '__main__':
    # Play all files in the current directory
    for wav_file in Path().glob('*.wav'):
        print(f"Playing {wav_file}")
        data, fs = wave_read(wav_file)
        sd.play(data, samplerate=fs, blocking=True)
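One caveat about the `f'int{f.getsampwidth()*8}'` trick used above: it works for 16- and 32-bit files, but 8-bit WAV samples are unsigned, and NumPy has no `int24` type for 24-bit files. A hedged sketch of a stricter reader is shown below. The name `wave_read_checked` and the `_DTYPES` table are hypothetical, introduced here only for illustration.

```python
import wave
import numpy as np

# Sample width in bytes -> NumPy dtype. WAV data is little-endian;
# 8-bit PCM is unsigned, 16- and 32-bit PCM are signed.
_DTYPES = {1: 'u1', 2: '<i2', 4: '<i4'}

def wave_read_checked(filename):
    """Return (data, sample_rate); data has shape (frames, channels)."""
    with wave.open(str(filename), 'rb') as f:
        width = f.getsampwidth()
        # Fail loudly on widths this sketch does not cover (e.g. 24-bit).
        if width not in _DTYPES:
            raise ValueError(f"unsupported sample width: {width} bytes")
        buffer = f.readframes(f.getnframes())
        data = np.frombuffer(buffer, dtype=_DTYPES[width])
        return data.reshape(-1, f.getnchannels()), f.getframerate()
```

Raising an error for unsupported widths is a design choice of this sketch. Unpacking 24-bit samples by hand would also be possible but is left out here.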