P0s*_*ive 1 python pytorch google-colaboratory openai-whisper large-language-model
I'm working on an LLM project on Google Colab using a V100 GPU in high-RAM mode, and these are my dependencies:
git+https://github.com/pyannote/pyannote-audio
git+https://github.com/huggingface/transformers.git@v4.34.1
openai==0.28
ffmpeg-python
pandas==1.5.0
tokenizers==0.14
torch==2.1.1
torchaudio==2.1.1
tqdm==4.64.1
EasyNMT==2.0.2
psutil==5.9.2
requests
pydub
docxtpl
faster-whisper==0.10.0
git+https://github.com/openai/whisper.git
These are all my imports:
from faster_whisper import WhisperModel
from datetime import datetime, timedelta
from time import time
from pathlib import Path
import pandas as pd
import os
from pydub import AudioSegment
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
import requests
import torch
import pyannote.audio
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
from pyannote.audio import Audio
from pyannote.core import Segment
import wave
import contextlib
import psutil
import openai
from codecs import decode
from docxtpl import DocxTemplate
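I used to run torch and torchaudio at their latest versions, but they were updated yesterday (v2.1.2 was released on December 15, 2023). I assumed the error I'm getting was caused by that update, so I pinned them to the version my code ran with two days ago (v2.1.1). Evidently, that didn't help.
Everything worked two days ago, and I haven't changed anything in the notebook. The only thing that could have changed is the dependencies I'm using, but reverting to the previous versions didn't fix the problem. This is the code snippet that raises the error: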
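One way to confirm that the pins actually took effect is to print the versions the runtime has installed. This is a minimal sketch using only the standard library. The helper name `installed_versions` and the package list are illustrative assumptions; `ctranslate2` is listed because faster-whisper depends on it and it is not pinned in the requirements above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Illustrative list: the pinned packages plus one unpinned transitive dependency.
    to_check = ["torch", "torchaudio", "faster-whisper", "ctranslate2"]
    for name, version in installed_versions(to_check).items():
        print(f"{name}: {version}")
```

This reports what is installed on disk. A module that was imported before reinstalling stays loaded at its old version until the runtime is restarted.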
def EETDT(audio_path, whisper_model, num_speakers, output_name="diarization_result", selected_source_lang="eng", transcript=None):
    """
    Uses Whisper to separate audio into segments and generate transcripts.
    segment.
    Speech Recognition is based on models from OpenAI Whisper https://github.com/openai/whisper
    Speaker diarization model and pipeline from https://github.com/pyannote/pyannote-audio
    audio_path : str -> path to wav file
    whisper_model : str -> small/medium/large/large-v2/large-v3
    num_speakers : int -> number of speakers in audio (0 to let the function determine it)
    output_name : str -> Desired name of the output file
    selected_source_lang : str -> language's code
    """
    audio_name = audio_path.split("/")[-1].split(".")[0]
    model = WhisperModel(whisper_model, compute_type="int8")
    time_start = time()
    if(audio_path == None):
        raise ValueError("Error no video input")
    print("Input file:", audio_path)
    if not audio_path.endswith(".wav"):
        print("Submitted audio isn't in wav format. Starting conversion...")
        audio = AudioSegment.from_file(audio_path)
        audio_suffix = audio_path.split(".")[-1]
        new_path = audio_path.replace(audio_suffix,"wav")
        audio.export(new_path, format="wav")
        audio_path = new_path
        print("Converted to wav:", new_path)
    try:
        # Get duration
        with contextlib.closing(wave.open(audio_path,'r')) as f:
            frames = f.getnframes()
            rate = f.getframerate()
            duration = frames / float(rate)
        if duration<30:
            raise ValueError(f"Audio has to be longer than 30 seconds. Current: {duration}")
        print(f"Duration of audio file: {duration}")
        # Transcribe audio
        options = dict(language=selected_source_lang, beam_size=5, best_of=5)
        transcribe_options = dict(task="transcribe", **options)
        segments_raw, info = model.transcribe(audio_path, **transcribe_options)
        # Convert back to original openai format
        segments = []
        i = 0
        full_transcript = list()
        if type(transcript) != type(pd.DataFrame()):
            for segment_chunk in segments_raw: # <-- THROWS ERROR
                chunk = {}
                chunk["start"] = segment_chunk.start
                chunk["end"] = segment_chunk.end
                chunk["text"] = segment_chunk.text
                full_transcript.append(segment_chunk.text)
                segments.append(chunk)
                i += 1
            full_transcript = "".join(full_transcript)
            print("Transcribe audio done with fast-whisper")
        else:
            for i in range(len(transcript)):
                full_transcript.append(transcript["text"].iloc[i])
            full_transcript = "".join(full_transcript)
            print("You inputted pre-transcribed audio")
    except Exception as e:
        raise RuntimeError("Error converting video to audio")
    # ...The code never leaves the try block...
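A note on why the failure appears at the `for` line rather than at `model.transcribe(...)`: faster-whisper returns the segments as a generator, so the transcription work only runs once the loop starts pulling items. The sketch below illustrates that behaviour with a plain Python generator. `lazy_segments` and `collect` are hypothetical stand-ins, not part of faster-whisper.

```python
def lazy_segments(texts):
    """Hypothetical stand-in for a lazily evaluated transcription result."""
    for text in texts:
        if text is None:
            # Simulates a failure that happens during decoding.
            raise RuntimeError("decoding failed")
        yield text


def collect(texts):
    """Iterate the generator and report where (and whether) it failed."""
    segments = lazy_segments(texts)  # nothing has run yet, so no error here
    collected = []
    try:
        for segment in segments:  # the work, and any error, happens here
            collected.append(segment)
    except RuntimeError as exc:
        return collected, str(exc)
    return collected, None
```

Creating the generator never raises, even for bad input; the exception only surfaces during iteration, which matches the line marked `THROWS ERROR` above.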
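The `except` block in the snippet replaces whatever was actually raised with a generic "Error converting video to audio" message, so the real cause is easy to miss. A minimal sketch of one way to keep it visible, using exception chaining (`raise ... from e`). `failing_step` and `run_with_context` are hypothetical stand-ins for the body of the `try` block, and the error messages are made up for illustration.

```python
def failing_step():
    """Hypothetical stand-in for whatever fails inside the try block."""
    raise OSError("underlying failure")


def run_with_context():
    try:
        failing_step()
    except Exception as e:
        # "from e" attaches the original exception as __cause__, so the
        # traceback shows both the wrapper and the real error.
        raise RuntimeError("Error while transcribing audio") from e


if __name__ == "__main__":
    try:
        run_with_context()
    except RuntimeError as err:
        print("wrapper:", err)
        print("cause:", repr(err.__cause__))
```

Temporarily removing the `try`/`except` altogether, or adding `print(repr(e))` before re-raising, would expose the same information.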