bee*_*ero 7 python audio split ffmpeg batch-file
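I have more than 200 MP3 files, and I need to split each of them using silence detection. I tried Audacity and WavePad, but neither offers batch processing, and doing the files one by one is very slow.
The scenario is as follows:
I tried FFmpeg, but without success.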
Ani*_*l_M 15
I found pydub to be the easiest tool for this kind of audio manipulation: it is simple to use and needs very little code.
You can install pydub with
pip install pydub
You may also need to install ffmpeg or libav if they are not already present. See the link for more details.
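Before running anything, it can help to confirm that a converter is actually reachable. The following is a minimal sketch that only looks for an `ffmpeg` or `avconv` executable on the PATH; the helper name `find_converter` is hypothetical and not part of pydub.

```python
import shutil


def find_converter(candidates=("ffmpeg", "avconv")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_converter()
    # Without a converter, loading or exporting MP3 files will likely fail.
    print(found or "No ffmpeg/avconv found on PATH.")
```

If this reports nothing, install ffmpeg (or libav) and make sure its folder is on the PATH before trying the code below.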
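Here is a snippet that does what you asked for. Some parameters, such as silence_thresh and target_dBFS, may need tuning to match your requirements.
Overall, I was able to split MP3 files with it, although I had to use a different silence_thresh value on my end.
Working code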
# Import the AudioSegment class for processing audio and the
# split_on_silence function for separating out silent chunks.
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Define a function to normalize a chunk to a target amplitude.
def match_target_amplitude(aChunk, target_dBFS):
    ''' Normalize given audio chunk '''
    change_in_dBFS = target_dBFS - aChunk.dBFS
    return aChunk.apply_gain(change_in_dBFS)

# Load your audio.
song = AudioSegment.from_mp3("your_audio.mp3")

# Split track where the silence is 2 seconds or more and get chunks using
# the imported function.
chunks = split_on_silence (
    # Use the loaded audio.
    song,
    # Specify that a silent chunk must be at least 2 seconds or 2000 ms long.
    min_silence_len = 2000,
    # Consider a chunk silent if it's quieter than -16 dBFS.
    # (You may want to adjust this parameter.)
    silence_thresh = -16
)

# Process each chunk with your parameters
for i, chunk in enumerate(chunks):
    # Create a silence chunk that's 0.5 seconds (or 500 ms) long for padding.
    silence_chunk = AudioSegment.silent(duration=500)

    # Add the padding chunk to beginning and end of the entire chunk.
    audio_chunk = silence_chunk + chunk + silence_chunk

    # Normalize the entire chunk.
    normalized_chunk = match_target_amplitude(audio_chunk, -20.0)

    # Export the audio chunk with new bitrate.
    print("Exporting chunk{0}.mp3.".format(i))
    normalized_chunk.export(
        ".//chunk{0}.mp3".format(i),
        bitrate = "192k",
        format = "mp3"
    )
If your original audio is stereo (channels = 2), the chunks will be stereo as well. You can check this as follows.
>>> song.channels
2
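Since the question involves more than 200 files, the same steps can be wrapped in a loop over a folder. This is a rough sketch that reuses the pydub calls from the code above; the function names `chunk_name`, `split_file` and `split_folder`, the output naming scheme, and the folder names are all assumptions made for illustration.

```python
from pathlib import Path


def chunk_name(src, index, out_dir):
    """Build an output path such as out/song_chunk003.mp3 (hypothetical scheme)."""
    return Path(out_dir) / "{0}_chunk{1:03d}.mp3".format(Path(src).stem, index)


def split_file(src, out_dir, min_silence_len=2000, silence_thresh=-16,
               pad_ms=500, target_dbfs=-20.0):
    """Split one MP3 on silence, pad and normalize each chunk, export at 192k."""
    # Imported here so the helpers above stay usable without pydub installed.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    song = AudioSegment.from_mp3(str(src))
    chunks = split_on_silence(song, min_silence_len=min_silence_len,
                              silence_thresh=silence_thresh)
    pad = AudioSegment.silent(duration=pad_ms)
    written = []
    for i, chunk in enumerate(chunks):
        padded = pad + chunk + pad
        # Same normalization as match_target_amplitude above.
        normalized = padded.apply_gain(target_dbfs - padded.dBFS)
        dest = chunk_name(src, i, out_dir)
        normalized.export(str(dest), format="mp3", bitrate="192k")
        written.append(dest)
    return written


def split_folder(in_dir, out_dir):
    """Run split_file on every .mp3 in in_dir; return {source: [outputs]}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return {src: split_file(src, out_dir)
            for src in sorted(Path(in_dir).glob("*.mp3"))}


if __name__ == "__main__":
    # "input_mp3" and "output_chunks" are placeholder folder names.
    for src, outputs in split_folder("input_mp3", "output_chunks").items():
        print("{0}: {1} chunks".format(src.name, len(outputs)))
```

The thresholds are kept as keyword arguments because, as noted above, silence_thresh usually needs tuning per recording.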
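After testing all of these solutions, none of which worked for me, I found one that does work for me and is relatively fast.
Prerequisites:
- ffmpeg
- numpy (although it needs very little from numpy, and a numpy-free solution would probably be relatively easy to write and would further improve speed)
Mode of operation and rationale:
- ffmpeg converts the input to lossless 16-bit 22 kHz PCM and passes it back through subprocess.Popen. The advantage is that ffmpeg converts very quickly and streams the data in small chunks that do not occupy much memory.
- Two temporary numpy arrays, holding the last and the second-to-last buffer, are concatenated and checked against the given threshold. If they do not exceed it, there is a stretch of silence, and (naively, I admit) the script simply counts the time during which there is "silence". If that time is at least as long as the given minimum silence duration, then (again naively) the middle of the current interval is taken as the split point.
- ffmpeg takes the segments bounded by these "silences" and saves them to separate files (the script below writes these ffmpeg commands to a batch file).
The code: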
import subprocess as sp
import sys
import numpy

FFMPEG_BIN = "ffmpeg.exe"

print('ASplit.py <src.mp3> <silence duration in seconds> <threshold amplitude 0.0 .. 1.0>')

src = sys.argv[1]
dur = float(sys.argv[2])
thr = int(float(sys.argv[3]) * 65535)

# newline='' keeps the explicit \r\n line endings written below unchanged
f = open('%s-out.bat' % src, 'w', newline='')

tmprate = 22050
len2 = dur * tmprate
buflen = int(len2) * 2
# whole samples per buffer * 2 bytes (16 bits) per sample

oarr = numpy.arange(1, dtype='int16')
# just a dummy array for the first chunk

command = [ FFMPEG_BIN,
        '-i', src,
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', str(tmprate), # output sampling rate
        '-ac', '1', # '1' for mono
        '-']        # - output to stdout

pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

pos = 0
opos = 0
part = 0

while True:
    raw = pipe.stdout.read(buflen)
    if not raw:
        # empty read: ffmpeg has finished writing
        break

    arr = numpy.frombuffer(raw, dtype="int16")

    rng = numpy.concatenate([oarr, arr])
    mx = numpy.amax(rng)
    if mx <= thr:
        # the peak in this range does not exceed the threshold value
        trng = (rng <= thr) * 1
        # 1 for every sample <= thr, 0 for every sample > thr
        sm = numpy.sum(trng)
        # i.e. simply (naively) check how many 1's there were
        if sm >= len2:
            part += 1
            apos = pos + dur * 0.5
            print(mx, sm, len2, apos)
            f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, apos, src, part))
            opos = apos

    pos += dur

    oarr = arr

# write the command for the final segment, from the last split to the end
part += 1
f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, pos, src, part))
f.close()
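The answer mentions that a numpy-free variant should be easy to write. The following is one possible sketch of just the detection loop, using the standard-library `array` module in place of numpy. The function name `find_split_points` and its parameters are hypothetical, and it returns the split times instead of writing a batch file; the synthetic buffers at the end stand in for what ffmpeg would stream.

```python
import struct
import sys
from array import array


def find_split_points(buffers, dur, rate, thr):
    """Return split times (seconds) from an iterable of raw s16le mono buffers.

    Each buffer is assumed to hold `dur` seconds of audio at `rate` Hz.
    """
    prev = array("h", [0])  # dummy buffer for the first chunk
    pos = 0.0
    splits = []
    for raw in buffers:
        if not raw:
            break
        cur = array("h")
        cur.frombytes(raw)
        if sys.byteorder == "big":
            cur.byteswap()  # the stream is little-endian
        # Peak over the previous and current buffer taken together.
        peak = max(max(prev), max(cur))
        if peak <= thr and len(prev) + len(cur) >= dur * rate:
            # Naively take the middle of the current interval as the split.
            splits.append(pos + dur * 0.5)
        pos += dur
        prev = cur
    return splits


# Synthetic demo: 1-second buffers at a toy rate of 10 samples per second.
RATE = 10
loud = struct.pack("<10h", *([1000] * RATE))
quiet = struct.pack("<10h", *([0] * RATE))
demo_splits = find_split_points([loud, quiet, quiet, loud],
                                dur=1.0, rate=RATE, thr=100)
print(demo_splits)  # prints [2.5]
```

Like the script above, this only looks at positive peaks and treats a buffer pair as silent when nothing in it exceeds the threshold.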
You can try this to split audio on silence without having to experiment with possible values for the silence threshold.
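from pydub import AudioSegment
from pydub.silence import split_on_silence

def split(filepath):
    sound = AudioSegment.from_wav(filepath)
    # average loudness of the whole file, in dBFS
    dBFS = sound.dBFS
    chunks = split_on_silence(sound,
        min_silence_len = 500,
        # anything 16 dB quieter than the file's average counts as silence
        silence_thresh = dBFS-16,
        keep_silence = 250 # optional
    )
    return chunks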
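Note that with this approach there is no need to tune the silence_thresh value by hand, because it is set relative to the file's own average loudness (dBFS - 16).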
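To make the relative threshold concrete, here is a small worked sketch of the arithmetic in plain Python. It assumes the usual definition of dBFS as 20 * log10(RMS / full-scale amplitude), which is my understanding of what pydub reports; the helper names `dbfs` and `relative_silence_thresh` are hypothetical.

```python
import math


def dbfs(samples, sample_width_bytes=2):
    """Average loudness of integer PCM samples, in dB relative to full scale."""
    full_scale = 2 ** (8 * sample_width_bytes - 1)  # 32768 for 16-bit audio
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / full_scale)


def relative_silence_thresh(samples, margin_db=16):
    """Threshold that sits margin_db below the clip's average loudness."""
    return dbfs(samples) - margin_db


# A constant signal at half of full scale is about -6.02 dBFS,
# so the derived threshold is about -22.02 dBFS.
half_scale = [16384] * 4
print(round(dbfs(half_scale), 2), round(relative_silence_thresh(half_scale), 2))
```

A quiet recording therefore gets a lower threshold automatically, which is why a fixed value such as -16 may work for one file and not for another.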
Also, if you want every audio chunk to have a minimum length, you can add the following right after the split_on_silence call above.
target_length = 25 * 1000 # setting minimum length of each chunk to 25 seconds
output_chunks = [chunks[0]]
for chunk in chunks[1:]:
    if len(output_chunks[-1]) < target_length:
        # the last output chunk is still too short, so extend it
        output_chunks[-1] += chunk
    else:
        # if the last output chunk is longer than the target length,
        # we can start a new one
        output_chunks.append(chunk)
Now we can use output_chunks for further processing.
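The merging step can also be written as a small reusable function. This is a sketch under the assumption that chunks only need to support `len()` and `+`, which holds for pydub AudioSegment objects (where `len()` is the duration in milliseconds); the name `merge_short_chunks` is hypothetical. Unlike the loop above, it also tolerates an empty chunk list.

```python
def merge_short_chunks(chunks, target_length):
    """Append each chunk to the previous one until that one reaches target_length."""
    merged = []
    for chunk in chunks:
        if merged and len(merged[-1]) < target_length:
            # Build a new object instead of modifying the input in place.
            merged[-1] = merged[-1] + chunk
        else:
            merged.append(chunk)
    return merged


# Plain lists stand in for audio here: list length plays the role of duration.
demo = merge_short_chunks([[0] * 10, [0] * 10, [0] * 30, [0] * 5], 25)
print([len(c) for c in demo])  # prints [50, 5]

# With real AudioSegment chunks, the result could then be exported, e.g.:
# for i, chunk in enumerate(merge_short_chunks(chunks, 25 * 1000)):
#     chunk.export("part{0}.wav".format(i), format="wav")
```

Note that the final chunk can still end up shorter than the target, since there is nothing left to merge it with.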