我认为 Librosa.effect.split 有问题？

Question

我认为 Librosa.effect.split 有问题？

首先，此功能是消除音频的静音。这是官方说明：

https://librosa.github.io/librosa/generated/librosa.effects.split.html

librosa.effects.split(y, top_db=10, *kargs)

将音频信号拆分为非静音间隔。

top_db :number > 0 低于参考的阈值（以分贝为单位）被视为静音

返回： intervals:np.ndarray, shape=(m, 2) interval[i] == (start_i, end_i) 是非静默间隔 i 的开始和结束时间（以样本为单位）。

所以这很简单，对于任何低于 10dB 的声音，将其视为静音并从音频中删除。它将返回一个间隔列表，这些间隔是音频中的非静音段。

所以我做了一个非常简单的例子，结果让我感到困惑：我在这里加载的音频是一个 3 秒的人类谈话，非常正常的谈话。

y, sr = librosa.load(file_list[0]) #load the data
print(y.shape) -> (87495,)

intervals = librosa.effects.split(y, top_db=100)
intervals -> array([[0, 87495]])

#if i change 100 to 10
intervals = librosa.effects.split(y, top_db=10)
intervals -> array([[19456, 23040],
                    [27136, 31232],
                    [55296, 58880],
                    [64512, 67072]])

Run Code Online (Sandbox Code Playgroud)

这怎么可能...

我告诉 librosa，好吧，对于任何低于 100dB 的声音，将其视为静音。在这个设置下，整个音频应该被视为静音，并且根据文档，它应该给我 array[[0,0]] 一些东西......因为在去除静音后，什么都没有了......

但似乎 librosa 返回给我的是沉默部分而不是非沉默部分。

Answer 1

Ahm*_*ssa 2

librosa.effects.split() 它在文档中说它返回一个 numpy 数组，其中包含包含非静音音频的间隔。当然，这些间隔取决于您分配给参数的值top_db。它不返回任何音频，只返回波形非静音片段的起点和终点

在您的情况下，即使您设置 top_db = 100，它也不会将整个音频视为静音，因为它们在使用的文档中声明因此将The reference power. By default, it uses **np.max** and compares to the peak power in the signal.您的 top_db 设置为高于音频中存在的最大值实际上会导致 top_db 不有任何影响。这是一个例子：

import librosa
import numpy as np
import matplotlib.pyplot as plt

# create a hypothetical waveform with 1000 noisy samples and 1000 silent samples
nonsilent = np.random.randint(11, 100, 1000) * 100
silent = np.zeros(1000)
wave = np.concatenate((nonsilent, silent))
# look at it
print(wave.shape)
plt.plot(wave)
plt.show()

# get the noisy interval
non_silent_interval = librosa.effects.split(wave, top_db=0.1, hop_length=1000)
print(non_silent_interval)

# plot only the noisy chunk of the waveform
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()

# now set top_db higher than anything in your audio
non_silent_interval = librosa.effects.split(wave, top_db=1000, hop_length=1000)
print(non_silent_interval)

# and you'll get the entire audio again
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()

Run Code Online (Sandbox Code Playgroud)

您可以看到非静音音频是从 0 到 1000，静音音频是从 1000 到 2000 个样本：

这里它只给出了我们创建的波的噪声块：

这里是 top_db 设置为 1000：

这意味着 librosa 做了它在文档中承诺做的一切。希望这可以帮助。

归档时间：	6 年，2 月前
查看次数：	3109 次
最近记录：	6 年，2 月前