Python Librosa: What is the default frame size used to compute MFCC features?

Question

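Using the Librosa library, I generated MFCC features for a 1319-second audio file and got a 20 x 56829 matrix. The 20 here is the number of MFCC features (which I can adjust manually). But I don't know how it divided the audio length into 56829 frames. What frame size does it use to process the audio?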

import numpy as np
import matplotlib.pyplot as plt
import librosa

def getPathToGroundtruth(episode):
    """Return path to groundtruth file for episode"""
    pathToGroundtruth = "../../../season01/Audio/" \
                        + "Season01.Episode%02d.en.wav" % episode
    return pathToGroundtruth

def getduration(episode):
    """Return the duration of the episode's audio in seconds"""
    pathToAudioFile = getPathToGroundtruth(episode)
    y, sr = librosa.load(pathToAudioFile)
    duration = librosa.get_duration(y=y, sr=sr)
    return duration

def getMFCC(episode):
    """Return the MFCC matrix (n_mfcc x n_frames) for episode"""
    filename = getPathToGroundtruth(episode)
    y, sr = librosa.load(filename)  # y: audio time series, sr: sampling rate
    data = librosa.feature.mfcc(y=y, sr=sr)
    return data


data = getMFCC(1)

Answer 1

Rya*_*n M 10

Short answer

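You can change the frame length by changing the parameters used in the STFT computation. The following code will double the size of the output (20 x 113658):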

data = librosa.feature.mfcc(y=y, sr=sr, n_fft=1012, hop_length=256, n_mfcc=20)

Long answer

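Librosa's librosa.feature.mfcc() function really just acts as a wrapper around librosa's librosa.feature.melspectrogram() function (which is itself a wrapper around the librosa.core.stft and librosa.filters.mel functions).

All of the parameters pertaining to segmentation of the audio signal, namely the frame and overlap values, are specified in the Mel-scaled power spectrogram function (with other tunable parameters specified for the nested core functions). You pass these parameters as keyword arguments to librosa.feature.mfcc().

All extra **kwargs parameters are fed to librosa.feature.melspectrogram() and subsequently to librosa.filters.mel().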
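The forwarding pattern described above can be illustrated with a toy model. This is only a sketch of how keyword arguments travel down a chain of wrappers; the `toy_*` functions are hypothetical stand-ins and are not librosa's actual implementations.

```python
# Toy model of the wrapper chain; these are NOT librosa's real functions.

def toy_mel_filters(sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    """Stand-in for a mel filter bank builder: just reports its settings."""
    if fmax is None:
        fmax = sr / 2.0
    return {"n_mels": n_mels, "fmin": fmin, "fmax": fmax}


def toy_melspectrogram(sr=22050, n_fft=2048, hop_length=512, **kwargs):
    """Stand-in for the mel spectrogram step: keeps the framing arguments
    and forwards every other keyword to the filter bank builder."""
    return {
        "n_fft": n_fft,
        "hop_length": hop_length,
        "filters": toy_mel_filters(sr, n_fft, **kwargs),
    }


def toy_mfcc(sr=22050, n_mfcc=20, **kwargs):
    """Stand-in for the MFCC step: consumes n_mfcc, forwards the rest."""
    settings = toy_melspectrogram(sr=sr, **kwargs)
    settings["n_mfcc"] = n_mfcc
    return settings


# hop_length is picked up by the spectrogram step; n_mels and fmax fall
# through to the filter bank step; n_fft keeps its default.
settings = toy_mfcc(sr=22050, hop_length=256, n_mels=40, fmax=8000)
print(settings)
```

The point of the sketch is that the caller only ever touches the outermost function, yet can still reach parameters that belong to the inner steps.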

By default, the Mel-scaled power spectrogram window and hop length are the following:

n_fft=2048

hop_length=512
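Since the question asks about "frame size", it may help to express these defaults in time units. The short sketch below does the conversion, assuming the audio was loaded at 22050 Hz (the default of librosa.load); the variable names are only for illustration.

```python
sr = 22050        # sampling rate, assuming librosa.load's default
n_fft = 2048      # default analysis window length, in samples
hop_length = 512  # default step between successive frames, in samples

frame_ms = 1000.0 * n_fft / sr       # duration of one frame
hop_ms = 1000.0 * hop_length / sr    # time between frame starts
overlap = 1.0 - hop_length / n_fft   # fraction shared by adjacent frames

print(f"frame: {frame_ms:.1f} ms, hop: {hop_ms:.1f} ms, overlap: {overlap:.0%}")
# frame: 92.9 ms, hop: 23.2 ms, overlap: 75%
```

At a different sampling rate the same sample counts correspond to different durations, so these millisecond figures hold only for sr=22050.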

So, assuming you used the default sample rate (sr=22050), the output of your mfcc function makes sense:

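output length = (seconds) * (sample rate) / (hop_length)

(1319) * (22050) / (512) ≈ 56804 frames

This is within 25 frames of the observed 56829, which is consistent with 1319 being a rounded duration for the file.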
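The estimate can be made a little more exact. The sketch below assumes librosa's default centered framing, under which the number of frames is 1 + floor(n_samples / hop_length); `expected_frames` is a hypothetical helper written for this illustration, not a librosa function.

```python
def expected_frames(n_samples, hop_length=512):
    """Frame count under centered framing: 1 + floor(n_samples / hop_length)."""
    return 1 + n_samples // hop_length


sr = 22050
hop_length = 512

# Using the duration as quoted in the question (whole seconds).
n_samples = 1319 * sr
print(expected_frames(n_samples, hop_length))  # 56805

# Working backwards: which sample counts would give the observed 56829 frames?
observed = 56829
lo = (observed - 1) * hop_length      # smallest matching sample count
hi = observed * hop_length - 1        # largest matching sample count
print(lo / sr, hi / sr)               # matching durations, in seconds
```

Working backwards this way suggests the file is roughly 1319.5 seconds long rather than exactly 1319, which would account for the extra frames.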

The parameters you can adjust are as follows:

Melspectrogram Parameters
-------------------------
y : np.ndarray [shape=(n,)] or None
    audio time-series

sr : number > 0 [scalar]
    sampling rate of `y`

S : np.ndarray [shape=(d, t)]
    power spectrogram

n_fft : int > 0 [scalar]
    length of the FFT window

hop_length : int > 0 [scalar]
    number of samples between successive frames.
    See `librosa.core.stft`

kwargs : additional keyword arguments
    Mel filter bank parameters.
    See `librosa.filters.mel` for details.

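If you want to further specify the characteristics of the mel filter bank used to define the Mel-scaled power spectrogram, you can adjust the following: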

Mel Frequency Parameters
------------------------
sr        : number > 0 [scalar]
    sampling rate of the incoming signal

n_fft     : int > 0 [scalar]
    number of FFT components

n_mels    : int > 0 [scalar]
    number of Mel bands to generate

fmin      : float >= 0 [scalar]
    lowest frequency (in Hz)

fmax      : float >= 0 [scalar]
    highest frequency (in Hz).
    If `None`, use `fmax = sr / 2.0`

htk       : bool [scalar]
    use HTK formula instead of Slaney

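To give a feel for what the `htk` switch changes, here is a minimal sketch of the two Hz-to-mel conversions as they are commonly defined. The function names are hypothetical, and the Slaney constants (a linear region below 1000 Hz followed by a logarithmic region) are stated as an assumption about the convention rather than copied from librosa's source.

```python
import math


def hz_to_mel_htk(f):
    """HTK-style mel scale: logarithmic over the whole range."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_mel_slaney(f):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above.
    Constants are assumed from the usual Auditory Toolbox convention."""
    f_sp = 200.0 / 3.0                 # Hz per mel in the linear region
    min_log_hz = 1000.0                # where the log region begins
    min_log_mel = min_log_hz / f_sp    # mel value at the boundary (15)
    logstep = math.log(6.4) / 27.0     # log-region step size
    if f < min_log_hz:
        return f / f_sp
    return min_log_mel + math.log(f / min_log_hz) / logstep


for f in (500.0, 1000.0, 4000.0):
    print(f, hz_to_mel_htk(f), hz_to_mel_slaney(f))
```

The two scales use different units and spacing, so the same `n_mels`, `fmin`, and `fmax` place the filter centers at different frequencies depending on `htk`.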

Documentation for Librosa:

librosa.feature.melspectrogram

librosa.filters.mel

librosa.core.stft
