Pitch detection in Python

Question

And*_*vus 5 python speech-recognition signal-processing speech speech-to-text

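The program I am working on is a Python module that detects certain frequencies (the human speech range, 80-300 Hz) and shows the intonation of a sentence by checking it against a database. I use SciPy to plot the frequencies of a sound file, but I cannot pick out any specific frequency to analyze the pitch. How can I do this?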

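More information: I would like to be able to define patterns in speech (for example rising or falling), and have the program detect whether a sound file follows a given pattern.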
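The pattern-matching half of the question could be sketched as follows, assuming you already have an F0 contour (one value in Hz per frame, NaN for unvoiced frames) from any pitch tracker. The "database" here is just a dict of hypothetical template shapes; the template names, the 20-point resolution and the 1-semitone "flat" threshold are all assumptions. The contour is converted to semitones relative to its median so that low and high voices give comparable shapes, resampled to a fixed length, and matched to the template with the highest Pearson correlation.

```python
import numpy as np

N_POINTS = 20  # resolution of each stored shape (assumed)

# Hypothetical pattern "database": each entry is a shape on an arbitrary scale.
TEMPLATES = {
    "rising": np.linspace(-1.0, 1.0, N_POINTS),
    "falling": np.linspace(1.0, -1.0, N_POINTS),
    "rise-fall": 1.0 - 2.0 * np.abs(np.linspace(-1.0, 1.0, N_POINTS)),
}

def contour_to_semitones(f0_hz, n_points=N_POINTS):
    """Drop unvoiced frames (NaN or <= 0), convert Hz to semitones relative
    to the median, and resample to n_points by linear interpolation."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[np.isfinite(f0) & (f0 > 0)]
    if voiced.size < 2:
        raise ValueError("need at least two voiced frames")
    semitones = 12.0 * np.log2(voiced / np.median(voiced))
    x_old = np.linspace(0.0, 1.0, voiced.size)
    x_new = np.linspace(0.0, 1.0, n_points)
    return np.interp(x_new, x_old, semitones)

def classify_contour(f0_hz, flat_range=1.0):
    """Return the name of the template most correlated with the contour,
    or "flat" if the pitch moves less than flat_range semitones."""
    shape = contour_to_semitones(f0_hz)
    if shape.max() - shape.min() < flat_range:
        return "flat"
    scores = {name: np.corrcoef(shape, tmpl)[0, 1] for name, tmpl in TEMPLATES.items()}
    return max(scores, key=scores.get)
```

New patterns can be added to `TEMPLATES` without touching the matching code, since every contour is reduced to the same fixed-length, normalized shape before comparison.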

Answer 1

Nik*_*rev 14

Update 2019: there are now very accurate pitch trackers based on neural networks, and they work in Python out of the box. Check out

https://pypi.org/project/crepe/
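CREPE's README shows usage along the lines of `time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)`. CREPE reports a frequency for every frame, including silence, so you will probably want to discard low-confidence frames before analyzing intonation. The helper below is a hypothetical post-processing sketch; the 0.5 confidence threshold and the 80-300 Hz range (taken from the question) are assumptions, not CREPE defaults.

```python
import numpy as np

def voiced_f0(time, frequency, confidence, threshold=0.5, fmin=80.0, fmax=300.0):
    """Mask frames with confidence below threshold, or a frequency outside
    [fmin, fmax], as NaN so that gaps in the contour stay visible."""
    time = np.asarray(time, dtype=float)
    frequency = np.asarray(frequency, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    keep = (confidence >= threshold) & (frequency >= fmin) & (frequency <= fmax)
    return time, np.where(keep, frequency, np.nan)

# Hypothetical usage (requires the crepe package; "speech.wav" is a placeholder):
# from scipy.io import wavfile
# import crepe
# sr, audio = wavfile.read("speech.wav")
# time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
# t, f0 = voiced_f0(time, frequency, confidence)
```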

ANSWER FROM 2015. Pitch detection is a complex problem. A recent package from Google provides a highly sophisticated solution to this non-trivial task:

https://github.com/google/REAPER

If you want to use it from Python, you will need to write a Python wrapper around it.
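One way to do this without writing C++ bindings is to call the REAPER command-line binary with `subprocess` and parse its output file; this is a swapped-in, process-level wrapper rather than a true binding. The sketch assumes the binary is on your `PATH` as `reaper`, that it accepts `-i` (input WAV), `-f` (F0 output file) and `-a` (ASCII output) as in the REAPER README, and that the ASCII file uses the EST track layout (header lines ending in `EST_Header_End`, then one `time voiced f0` row per frame). Verify these against your own build before relying on it.

```python
import os
import subprocess
import tempfile

import numpy as np

def parse_reaper_ascii_f0(text):
    """Parse ASCII F0 output assumed to be in EST track format:
    header lines, a line 'EST_Header_End', then 'time voiced f0' rows.
    Unvoiced frames (voiced == 0) are returned as NaN."""
    lines = text.splitlines()
    end = next((i for i, line in enumerate(lines) if line.strip() == "EST_Header_End"), None)
    if end is None:
        raise ValueError("no EST_Header_End line found")
    rows = [[float(v) for v in line.split()[:3]] for line in lines[end + 1:] if line.strip()]
    data = np.array(rows, dtype=float).reshape(-1, 3)
    times, voiced, f0 = data[:, 0], data[:, 1], data[:, 2]
    return times, np.where(voiced > 0, f0, np.nan)

def run_reaper(wav_path, reaper_bin="reaper"):
    """Run the REAPER binary (assumed to be on PATH) on a WAV file and
    return (times, f0) parsed from its ASCII F0 output."""
    with tempfile.TemporaryDirectory() as tmp:
        f0_path = os.path.join(tmp, "out.f0")
        subprocess.run([reaper_bin, "-i", wav_path, "-f", f0_path, "-a"], check=True)
        with open(f0_path) as fh:
            return parse_reaper_ascii_f0(fh.read())
```

Keeping the parser separate from the process call makes it easy to test the parsing on a saved output file without running REAPER.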

Answer 2

Sah*_*l M 5

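You could try the following. As I'm sure you know, the human voice also produces harmonics at frequencies above 300 Hz. Still, you can slide a window across the audio file and look at how the frequency with maximum power changes (as below), or at a group of frequencies within the window. The code below is meant to give intuition: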

import numpy as np

def maxFrequency(X, F_sample, Low_cutoff=80, High_cutoff=300):
    """ Find the strongest frequency of a real signal within a band, using the FFT
    Inputs
    =======
    X: 1-D numpy array, the real time domain audio signal (single channel time series)
    F_sample: float, the sampling frequency of the signal (physical frequency in unit of Hz)
    Low_cutoff: float, frequency components below this frequency are ignored (physical frequency in unit of Hz)
    High_cutoff: float, frequency components above this frequency are ignored (physical frequency in unit of Hz)
    Output
    =======
    float, the frequency (Hz) with the largest magnitude between the cutoffs
    """
    M = X.size  # let M be the length of the time series
    Spectrum = np.abs(np.fft.rfft(X, n=M))  # magnitude of each frequency bin
    Freqs = np.fft.rfftfreq(M, d=1.0 / F_sample)  # physical frequency (Hz) of each bin

    # Indices of the bins that lie between the cutoff frequencies
    band = np.flatnonzero((Freqs >= Low_cutoff) & (Freqs <= High_cutoff))

    # Calculate which frequency inside the band has max power
    maximumFrequency = Freqs[band[np.argmax(Spectrum[band])]]

    return maximumFrequency

# fullAudio (1-D array of samples) and samplingRate (Hz) are assumed to be loaded already,
# e.g. with scipy.io.wavfile.read
windowSize = 2048  # samples per window (assumed value; longer windows give finer frequency resolution)

voiceVector = []
for start in range(0, len(fullAudio) - windowSize + 1, windowSize):  # Run a window of appropriate length across the audio file
    window = fullAudio[start:start + windowSize]
    voiceVector.append(maxFrequency(window, samplingRate))

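Depending on the intonation of the speech, the frequency with maximum power will change, and you can record these changes and map them to a given intonation. This will not always hold, and you may have to monitor changes in many frequencies at once, but it should get you started.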
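Because the strongest bin does not always follow the pitch (a harmonic can briefly dominate a single window, for example), one simple mitigation, which is our assumption and not part of the original answer, is to pass the per-window frequencies through a running median before mapping them to an intonation. Isolated outlier windows are suppressed while sustained rises and falls survive. The default kernel of 5 windows is arbitrary.

```python
import numpy as np

def median_smooth(values, kernel=5):
    """Running median with an odd kernel. Edges are padded by repeating the
    first/last value so the output has the same length as the input."""
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd")
    x = np.asarray(values, dtype=float)
    half = kernel // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel)
    return np.median(windows, axis=1)
```

Assuming `voiceVector` holds one frequency in Hz per window, `median_smooth(voiceVector, kernel=5)` gives the smoothed contour.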

Answer 3

Bai*_* Li 5

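There are many different algorithms for estimating pitch, but one study found Praat's algorithm to be the most accurate [1]. More recently, the Parselmouth library has made it much easier to call Praat functions from Python [2].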

[1]: Strömbergsson, Sofia. "Today's most frequently used F0 estimation methods, and their accuracy in estimating male and female pitch in clean speech." Interspeech 2016. https://pdfs.semanticscholar.org/ff04/0316f44eab5c0497cec280bfb1fd0e7c0e85.pdf

[2]: https://github.com/YannickJadoul/Parselmouth
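Praat's standard pitch method is autocorrelation-based. For intuition, the sketch below is a much simplified plain-NumPy version of that idea: for one frame, pick the lag with the strongest normalized autocorrelation within an assumed 80-300 Hz search range and report `sr / lag`. It is not Praat's algorithm, which adds considerably more (for example, tracking candidates across frames). The `voicing_threshold` of 0.3 is an arbitrary assumption.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=80.0, fmax=300.0, voicing_threshold=0.3):
    """Estimate the F0 of one frame as the lag with the highest normalized
    autocorrelation between 1/fmax and 1/fmin seconds. Returns NaN when the
    frame is silent or the peak is below voicing_threshold (unvoiced)."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0:
        return float("nan")
    # Autocorrelation for non-negative lags, normalized so that lag 0 == 1
    ac = np.correlate(x, x, mode="full")[x.size - 1:] / energy
    lag_min = int(np.floor(sr / fmax))
    lag_max = min(int(np.ceil(sr / fmin)), x.size - 1)
    if lag_min >= lag_max:
        raise ValueError("frame too short for the requested fmin")
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    if ac[lag] < voicing_threshold:
        return float("nan")
    return sr / lag
```

Through Parselmouth itself, Praat's own estimate is obtained with something like `parselmouth.Sound(path).to_pitch()`; see the Parselmouth documentation for the available options.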

Archived:	10 years, 4 months ago
Views:	11,977
Last activity:	6 years, 6 months ago