Parsing and converting TED Talks JSON subtitles

Ali*_*xel 11 string video parsing json subtitle

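This question is related to another question on SuperUser.

I want to download TED talks and their corresponding subtitles for offline viewing. Take, for example, the short talk by Richard St. John; the high-resolution video download URL is: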

http://www.ted.com/talks/download/video/5118/talk/70

The corresponding JSON-encoded English subtitles can be downloaded at:

http://www.ted.com/talks/subtitles/id/70/lang/eng

Here is an excerpt from the beginning of the actual subtitles:

{"captions":[{"content":"This is really a two hour presentation I give to high school students,","startTime":0,"duration":3000,"startOfParagraph":false},{"content":"cut down to three minutes.","startTime":3000,"duration":1000,"startOfParagraph":false},{"content":"And it all started one day on a plane, on my way to TED,","startTime":4000,"duration":3000,"startOfParagraph":false},{"content":"seven years ago."

And from the end of the subtitles:

{"content":"Or failing that, do the eight things -- and trust me,","startTime":177000,"duration":3000,"startOfParagraph":false},{"content":"these are the big eight things that lead to success.","startTime":180000,"duration":4000,"startOfParagraph":false},{"content":"Thank you TED-sters for all your interviews!","startTime":184000,"duration":2000,"startOfParagraph":false}]}

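I want to write an application that automatically downloads the high-resolution version of the video and all available subtitles, but I'm having a really hard time, because I have to convert the subtitles to a format compatible with VLC (or any other decent video player), with .srt or .sub as my first choices, and I have no idea what the startTime and duration keys of the JSON file represent.

What I know so far is: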

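  • The downloaded video lasts 3 minutes 30 seconds at 29 FPS, i.e. 210 s × 29 = 6090 frames.
  • startTime begins at 0 with a duration of 3000, so the first caption ends at 0 + 3000 = 3000.
  • startTime ends at 184000 with a duration of 2000, so the last caption ends at 184000 + 2000 = 186000.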

The following JavaScript snippet may also be worth noting:

introDuration:16500,
adDuration:4000,
postAdDuration:2000,

So my question is: what logic should I apply to convert the startTime and duration values into an .srt-compatible format:

1
00:01:30,200 --> 00:01:32,201
MEGA DENG COOPER MINE, INDIA

2
00:01:37,764 --> 00:01:39,039
Watch out, watch out!

or a .sub-compatible format:

{FRAME_FROM}{FRAME_TO}This is really a two hour presentation I give to high school students,
{FRAME_FROM}{FRAME_TO}cut down to three minutes.

Can anyone help me out with this?


Ninh Bui nailed it; the formula is as follows:

introDuration - adDuration + startTime ... introDuration - adDuration + startTime + duration
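As a quick check of the formula, here is a minimal Python sketch that applies it to the first and last captions quoted in the question. The function name `shifted_window` is hypothetical; the two constants are the introDuration and adDuration values from the JavaScript snippet, and all values are assumed to be milliseconds.

```python
INTRO_DURATION_MS = 16500  # introDuration from the page's JavaScript
AD_DURATION_MS = 4000      # adDuration from the page's JavaScript


def shifted_window(start_time, duration,
                   intro_ms=INTRO_DURATION_MS, ad_ms=AD_DURATION_MS):
    """Return (start, end) in milliseconds on the downloaded video's timeline."""
    offset = intro_ms - ad_ms  # introDuration - adDuration
    start = offset + start_time
    return start, start + duration


print(shifted_window(0, 3000))       # first caption -> (12500, 15500)
print(shifted_window(184000, 2000))  # last caption  -> (196500, 198500)
```

The last caption then ends at 198.5 seconds, which still fits inside the 3 minute 30 second video.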

This approach lets me convert directly to the .srt format, in either of two ways (without needing to know the length or FPS):

00:00:12,500 --> 00:00:15,500
This is really a two hour presentation I give to high school students,

00:00:15,500 --> 00:00:16,500
cut down to three minutes.

and:

00:00:00,16500 --> 00:00:00,19500
And it all started one day on a plane, on my way to TED,

00:00:00,19500 --> 00:00:00,20500
seven years ago.
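The first of the two styles can be produced with a short script. The following is only a sketch under a few assumptions: the helper names `ms_to_srt` and `captions_to_srt` are made up for illustration, the offset is introDuration - adDuration = 12500 ms as in the formula, and each cue gets a sequence number as in the .srt sample shown in the question.

```python
import json

OFFSET_MS = 16500 - 4000  # introDuration - adDuration, in milliseconds


def ms_to_srt(ms):
    """Format a millisecond count as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def captions_to_srt(json_text, offset_ms=OFFSET_MS):
    """Turn the TED JSON subtitle text into the body of an .srt file."""
    cues = []
    captions = json.loads(json_text)["captions"]
    for index, caption in enumerate(captions, start=1):
        start = offset_ms + caption["startTime"]
        end = start + caption["duration"]
        cues.append(
            f"{index}\n{ms_to_srt(start)} --> {ms_to_srt(end)}\n"
            f"{caption['content']}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues


# First two captions from the excerpt in the question.
sample = (
    '{"captions":[{"content":"This is really a two hour presentation '
    'I give to high school students,","startTime":0,"duration":3000,'
    '"startOfParagraph":false},{"content":"cut down to three minutes.",'
    '"startTime":3000,"duration":1000,"startOfParagraph":false}]}'
)
print(captions_to_srt(sample))
```

Using `divmod` keeps the arithmetic in integers, so the millisecond field is always exactly three digits.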

小智 4

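My guess is that the times in the JSON are expressed in milliseconds, i.e. 1000 = 1 second. There is probably a master timer, where startTime indicates the point on the timeline at which the subtitle should appear, and duration is probably how long the subtitle should stay on screen. This theory is further supported by the division 186000 / 1000 = 186 seconds = 186 / 60 = 3.1 minutes = 3 minutes 6 seconds. The remaining seconds are probably applause ;-)

With this information you should also be able to work out which range of frames each caption applies to: you already know the frames per second, so all you need to do is multiply the startTime in seconds by the FPS to get the start frame. The end frame is given by (startTime + duration) * fps, with both values converted to seconds :-)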
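The frame calculation described above could look roughly like this in Python. This is a sketch, not a tested converter: `caption_to_sub_line` is a hypothetical name, the 29 FPS figure is the one quoted in the question and is assumed constant, frame numbers are rounded down, and `offset_ms` defaults to 0 to follow this answer as written (pass 12500 to apply the introDuration - adDuration shift from the question's update).

```python
FPS = 29  # frame rate quoted in the question; assumed constant


def caption_to_sub_line(caption, fps=FPS, offset_ms=0):
    """Build one {FRAME_FROM}{FRAME_TO}text line from a JSON caption."""
    start_ms = offset_ms + caption["startTime"]
    end_ms = start_ms + caption["duration"]
    # milliseconds -> seconds -> frames, done in integers and rounded down
    start_frame = start_ms * fps // 1000
    end_frame = end_ms * fps // 1000
    return f"{{{start_frame}}}{{{end_frame}}}{caption['content']}"


first = {"content": "cut down to three minutes.",
         "startTime": 3000, "duration": 1000}
print(caption_to_sub_line(first))                   # {87}{116}cut down to three minutes.
print(caption_to_sub_line(first, offset_ms=12500))  # {449}{478}cut down to three minutes.
```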