使用youtube-dl直接下载到S3

Question

使用youtube-dl直接下载到S3

我正在尝试编写一个 lambda 函数（最终）将的输出youtube-dl直接写入 S3。我必须对此进行原型设计的方法是将输出转储stdout到 S3 中，然后将其重定向到 S3 中的文件，但这似乎是一个巨大的黑客攻击。

import youtube_dl, sys
from smart_open import open
from contextlib import redirect_stdout

ydl_opts = { 'outtmpl': '-', 'cachedir': False, 'logtostderr': True }
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
  with open('s3://<test-bucket>/test.mp4','wb') as f:
    with redirect_stdout(f): 
      ydl.download(['https://www.youtube.com/watch?v=zzEYsi_P0RU'])

Run Code Online (Sandbox Code Playgroud)

还有其他人有什么神奇的想法吗？

Answer 1

Ham*_*ywa 0

您可以使用 Nodejs 流来执行此操作，只要您可以在最多 15 分钟内完成任务，否则 lambda 任务将超时。

在 Node.js 中创建 PassThrough 流。传递流是一种双工流，您可以在一侧写入并在另一侧读取。

   const stream = require('stream');
   const passthrough = new stream.PassThrough();

Run Code Online (Sandbox Code Playgroud)

使用 youtube-dl 库将数据写入流。

   const youtubedl = require('youtube-dl');
   const dl = youtubedl(event.videoUrl, ['--format=best[ext=mp4]'], 
              {maxBuffer: Infinity});
   dl.pipe(passtrough); // write video to the pass-through stream

Run Code Online (Sandbox Code Playgroud)

最后，将流上传到S3。您可以利用 S3 的分段上传功能，该功能允许我们以较小的块上传大文件。这意味着您只需在内存中缓冲小垃圾（在本例中为 64 MB），而不是整个文件。

  const AWS = require('aws-sdk');
  const upload = new AWS.S3.ManagedUpload({
    params: {
      Bucket: process.env.BUCKET_NAME,
      Key: 'video.mp4',
      Body: passtrough
    },
   partSize: 1024 * 1024 * 64 // 64 MB in bytes
  });
 upload.send((err) => {
    if (err) {
      console.log('error', err);
    } else {
     console.log('done');
   }
  });

Run Code Online (Sandbox Code Playgroud)

尝试为 lambda 提供足够的资源（例如大量内存）以获得更好的网络性能。

归档时间：	6 年，1 月前
查看次数：	1198 次
最近记录：	2 年，5 月前