我想解析srt字幕:
1
00:00:12,815 --> 00:00:14,509
Chlapi, jak to jde s
t?ma pracovníma sv?tlama?.
2
00:00:14,815 --> 00:00:16,498
Trochu je zesilujeme.
3
00:00:16,934 --> 00:00:17,814
Jo, sleduj.
Run Code Online (Sandbox Code Playgroud)
每个项目都进入结构.有了这个正则表达式:
A:
RE_ITEM = re.compile(r'(?P<index>\d+).'
r'(?P<start>\d{2}:\d{2}:\d{2},\d{3}) --> '
r'(?P<end>\d{2}:\d{2}:\d{2},\d{3}).'
r'(?P<text>.*?)', re.DOTALL)
Run Code Online (Sandbox Code Playgroud)
B:
RE_ITEM = re.compile(r'(?P<index>\d+).'
r'(?P<start>\d{2}:\d{2}:\d{2},\d{3}) --> '
r'(?P<end>\d{2}:\d{2}:\d{2},\d{3}).'
r'(?P<text>.*)', re.DOTALL)
Run Code Online (Sandbox Code Playgroud)
这段代码:
for i in Subtitles.RE_ITEM.finditer(text):
result.append((i.group('index'), i.group('start'),
i.group('end'), i.group('text')))
Run Code Online (Sandbox Code Playgroud)
使用代码BI只有一个项目在数组中(因为贪婪.*)和代码AI有空的'文本',因为没有贪心.*?
怎么治这个?
谢谢