Using moviepy, scipy and numpy in AWS Lambda

rou*_*uk1 64 python numpy amazon-web-services aws-lambda

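I would like to generate video using an AWS Lambda function.

I followed the instructions found here and here.

I now have the following process to build my Lambda function:

Step 1

Fire up an Amazon Linux EC2 instance and run this as root on it: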

#! /usr/bin/env bash

# Install the SciPy stack on Amazon Linux and prepare it for AWS Lambda

yum -y update
yum -y groupinstall "Development Tools"
yum -y install blas --enablerepo=epel
yum -y install lapack --enablerepo=epel
yum -y install atlas-sse3-devel --enablerepo=epel
yum -y install Cython --enablerepo=epel
yum -y install python27
yum -y install python27-numpy.x86_64
yum -y install python27-numpy-f2py.x86_64
yum -y install python27-scipy.x86_64

/usr/local/bin/pip install --upgrade pip
mkdir -p /home/ec2-user/stack
/usr/local/bin/pip install moviepy -t /home/ec2-user/stack

cp -R /usr/lib64/python2.7/dist-packages/numpy /home/ec2-user/stack/numpy
cp -R /usr/lib64/python2.7/dist-packages/scipy /home/ec2-user/stack/scipy

tar -czvf stack.tgz /home/ec2-user/stack/*

Step 2

I scp the resulting tarball to my laptop, then run this script to build a zip archive.

#! /usr/bin/env bash

mkdir tmp
rm lambda.zip
tar -xzf stack.tgz -C tmp

zip -9 lambda.zip process_movie.py
zip -r9 lambda.zip *.ttf
cd tmp/home/ec2-user/stack/
zip -r9 ../../../../lambda.zip *

The process_movie.py script is, for now, only a test to see whether the stack is OK:

def make_movie(event, context):
    import os
    print(os.listdir('.'))
    print(os.listdir('numpy'))
    try:
        import scipy
    except ImportError:
        print('can not import scipy')

    try:
        import numpy
    except ImportError:
        print('can not import numpy')

    try:
        import moviepy
    except ImportError:
        print('can not import moviepy')

Step 3

I then upload the resulting archive to S3 to be the source of my Lambda function. When I test the function, I get the following call stack:

START RequestId: 36c62b93-b94f-11e5-9da7-83f24fc4b7ca Version: $LATEST
['tqdm', 'imageio-1.4.egg-info', 'decorator.pyc', 'process_movie.py', 'decorator-4.0.6.dist-info', 'imageio', 'moviepy', 'tqdm-3.4.0.dist-info', 'scipy', 'numpy', 'OpenSans-Regular.ttf', 'decorator.py', 'moviepy-0.2.2.11.egg-info']
['add_newdocs.pyo', 'numarray', '__init__.py', '__config__.pyc', '_import_tools.py', 'setup.pyo', '_import_tools.pyc', 'doc', 'setupscons.py', '__init__.pyc', 'setup.py', 'version.py', 'add_newdocs.py', 'random', 'dual.pyo', 'version.pyo', 'ctypeslib.pyc', 'version.pyc', 'testing', 'dual.pyc', 'polynomial', '__config__.pyo', 'f2py', 'core', 'linalg', 'distutils', 'matlib.pyo', 'tests', 'matlib.pyc', 'setupscons.pyc', 'setup.pyc', 'ctypeslib.py', 'numpy', '__config__.py', 'matrixlib', 'dual.py', 'lib', 'ma', '_import_tools.pyo', 'ctypeslib.pyo', 'add_newdocs.pyc', 'fft', 'matlib.py', 'setupscons.pyo', '__init__.pyo', 'oldnumeric', 'compat']
can not import scipy
'module' object has no attribute 'core': AttributeError
Traceback (most recent call last):
  File "/var/task/process_movie.py", line 91, in make_movie
    import numpy
  File "/var/task/numpy/__init__.py", line 122, in <module>
    from numpy.__config__ import show as show_config
  File "/var/task/numpy/numpy/__init__.py", line 137, in <module>
    import add_newdocs
  File "/var/task/numpy/numpy/add_newdocs.py", line 9, in <module>
    from numpy.lib import add_newdoc
  File "/var/task/numpy/lib/__init__.py", line 13, in <module>
    from polynomial import *
  File "/var/task/numpy/lib/polynomial.py", line 11, in <module>
    import numpy.core.numeric as NX
AttributeError: 'module' object has no attribute 'core'

END RequestId: 36c62b93-b94f-11e5-9da7-83f24fc4b7ca
REPORT RequestId: 36c62b93-b94f-11e5-9da7-83f24fc4b7ca  Duration: 112.49 ms Billed Duration: 200 ms     Memory Size: 1536 MB    Max Memory Used: 14 MB

I do not understand why Python does not find the core directory, which is present in the folder structure.
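The traceback above passes through /var/task/numpy/numpy/__init__.py, and the second directory listing shows a numpy entry inside the numpy folder, which suggests a second copy of the package nested inside the first. A minimal sketch for spotting that kind of layout in an unpacked stack folder; the helper name find_nested_packages is made up for illustration:

```python
import os


def find_nested_packages(root):
    """Return names under root that contain a same-named sub-package.

    Example of what is flagged: root/numpy/numpy/__init__.py
    """
    nested = []
    for name in sorted(os.listdir(root)):
        inner_init = os.path.join(root, name, name, "__init__.py")
        if os.path.isfile(inner_init):
            nested.append(name)
    return nested
```

Running it on the extracted stack directory before zipping would show whether any package was copied into itself.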

EDIT:

Following @jarmod's advice, I have reduced the Lambda function to:

def make_movie(event, context):
    print('running make movie')
    import numpy

I now get the following error:

START RequestId: 6abd7ef6-b9de-11e5-8aee-918ac0a06113 Version: $LATEST
running make movie
Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python intepreter from there.: ImportError
Traceback (most recent call last):
  File "/var/task/process_movie.py", line 3, in make_movie
    import numpy
  File "/var/task/numpy/__init__.py", line 127, in <module>
    raise ImportError(msg)
ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python intepreter from there.

END RequestId: 6abd7ef6-b9de-11e5-8aee-918ac0a06113
REPORT RequestId: 6abd7ef6-b9de-11e5-8aee-918ac0a06113  Duration: 105.95 ms Billed Duration: 200 ms     Memory Size: 1536 MB    Max Memory Used: 14 MB

Att*_*nyi 56

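I was also following your first link and managed to import numpy and pandas in a Lambda function this way (on Windows):

  1. Started a (free-tier) t2.micro EC2 instance with the 64-bit Amazon Linux AMI 2015.09.1 and used PuTTY to SSH in.
  2. Tried the same commands you used and the ones recommended by the Amazon article: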

    sudo yum -y update
    sudo yum -y upgrade
    sudo yum -y groupinstall "Development Tools"
    sudo yum -y install blas --enablerepo=epel
    sudo yum -y install lapack --enablerepo=epel
    sudo yum -y install Cython --enablerepo=epel
    sudo yum install python27-devel python27-pip gcc
    
  3. Created the virtual environment:

    virtualenv ~/env
    source ~/env/bin/activate
    
  4. Installed the packages:

    sudo ~/env/bin/pip2.7 install numpy
    sudo ~/env/bin/pip2.7 install pandas
    
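  5. Then, using WinSCP, I logged in and downloaded everything (except _markerlib, pip*, pkg_resources, setuptools* and easyinstall*) from /home/ec2-user/env/lib/python2.7/dist-packages, and everything from /home/ec2-user/env/lib64/python2.7/site-packages, from the EC2 instance.

  6. I put all these folders and files into one zip, along with the .py file containing the Lambda function.

  7. Because this .zip is larger than 10 MB, I created an S3 bucket to store the file. I copied the link to the file from there and pasted it into "Upload a .ZIP from Amazon S3" in the Lambda function.

  8. The EC2 instance can be shut down; it is not needed any more.

With this, I could import numpy and pandas. I'm not familiar with moviepy, but scipy might already be tricky, as Lambda limits the unzipped deployment package size to 262 144 000 bytes. I'm afraid numpy and scipy together are already over that.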
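The packaging in steps 5 and 6 and the size concern can also be scripted. The following is a minimal sketch, not the procedure used in this answer: the helper names are hypothetical, the exclusion patterns are copied verbatim from step 5, and the limit is the figure quoted above.

```python
import fnmatch
import os
import zipfile

# Unzipped deployment package limit, as quoted in the answer.
UNZIPPED_LIMIT_BYTES = 262144000

# Top-level entries that step 5 leaves out (patterns copied as written there).
EXCLUDE_PATTERNS = ["_markerlib", "pip*", "pkg_resources", "setuptools*", "easyinstall*"]


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """True if a top-level file or folder name matches an exclusion pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def build_zip(source_dir, zip_path):
    """Zip source_dir minus excluded top-level entries.

    Returns the total uncompressed size in bytes of the files added.
    zip_path should live outside source_dir so the archive does not include itself.
    """
    total = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for current, dirs, files in os.walk(source_dir):
            if current == source_dir:
                # Prune excluded folders in place so os.walk skips them.
                dirs[:] = [d for d in dirs if not is_excluded(d)]
                files = [f for f in files if not is_excluded(f)]
            for filename in files:
                full_path = os.path.join(current, filename)
                archive.write(full_path, os.path.relpath(full_path, source_dir))
                total += os.path.getsize(full_path)
    return total
```

Comparing the returned value with UNZIPPED_LIMIT_BYTES before uploading gives an early warning that the package will be rejected.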


rou*_*uk1 27

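With the help of all the posts in this thread, here is a solution for the record:

To get this to work you'll need to:

  1. Start an EC2 instance with at least 2 GB of RAM (to be able to compile NumPy and SciPy).

  2. Install the needed dependencies: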

    sudo yum -y update
    sudo yum -y upgrade
    sudo yum -y groupinstall "Development Tools"
    sudo yum -y install blas --enablerepo=epel
    sudo yum -y install lapack --enablerepo=epel
    sudo yum -y install Cython --enablerepo=epel
    sudo yum install python27-devel python27-pip gcc
    virtualenv ~/env
    source ~/env/bin/activate
    pip install scipy
    pip install numpy
    pip install moviepy
    
  3. Copy all the content of the following directories (except _markerlib, pip*, pkg_resources, setuptools* and easyinstall*) to your local machine, into a stack folder:

    • home/ec2-user/env/lib/python2.7/dist-packages
    • home/ec2-user/env/lib64/python2.7/dist-packages
  4. Get all the required shared libraries from your EC2 instance:

    • libatlas.so.3
    • libf77blas.so.3
    • liblapack.so.3
    • libptf77blas.so.3
    • libcblas.so.3
    • libgfortran.so.3
    • libptcblas.so.3
    • libquadmath.so.0
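  5. Put them in a lib subfolder of the stack folder.

  6. imageio is a dependency of moviepy; you'll need to download binary versions of its dependencies, libfreeimage and ffmpeg. They can be found here. Put them at the root of your stack folder and rename libfreeimage-3.16.0-linux64.so to libfreeimage.so.

  7. You should now have a stack folder containing:

    • all Python dependencies at the root
    • all shared libraries in a lib subfolder
    • the ffmpeg binary at the root
    • libfreeimage.so at the root
  8. Zip this folder: zip -r9 stack.zip . -x ".*" -x "*/.*"

  9. Use the following lambda_function.py as the entry point for your Lambda: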

    from __future__ import print_function
    
    import os
    import subprocess
    
    SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
    LIB_DIR = os.path.join(SCRIPT_DIR, 'lib')
    FFMPEG_BINARY = os.path.join(SCRIPT_DIR, 'ffmpeg')
    
    
    def lambda_handler(event, context):
        command = 'LD_LIBRARY_PATH={} IMAGEIO_FFMPEG_EXE={} python movie_maker.py'.format(
            LIB_DIR,
            FFMPEG_BINARY,
        )
        try:
            output = subprocess.check_output(command, shell=True)
            print(output)
        except subprocess.CalledProcessError as e:
            print(e.output)
    
  10. Write a movie_maker.py script that depends on moviepy, numpy, ...

  11. Add those scripts to your stack.zip file: zip -r9 lambda.zip *.py

  12. Upload the zip to S3 and use it as the source for your Lambda.
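The handler in step 9 builds a shell command string. As a rough alternative sketch (an assumption, not what this answer uses), the same two variables can be passed through the env argument of subprocess.check_output, which avoids shell=True and quoting issues. The helper names build_env and run_script are hypothetical. A child process is still used because the dynamic loader reads LD_LIBRARY_PATH when a process starts.

```python
import os
import subprocess


def build_env(script_dir, base_env=None):
    """Copy the environment and add the two variables set in the step 9 handler."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = os.path.join(script_dir, "lib")
    env["IMAGEIO_FFMPEG_EXE"] = os.path.join(script_dir, "ffmpeg")
    return env


def run_script(script_dir, script_name="movie_maker.py"):
    """Run the script in a child process and return its stdout and stderr combined."""
    return subprocess.check_output(
        ["python", os.path.join(script_dir, script_name)],
        env=build_env(script_dir),
        stderr=subprocess.STDOUT,
    )
```

Inside lambda_handler, run_script would be called with the directory that contains the handler file, with subprocess.CalledProcessError caught as in step 9.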

You can also download the stack.zip here.

  • @rouk1 How did this work? The total size of the site-packages contents is around 250 MB, which is above Lambda's limit. (3 upvotes)

Vit*_*ata 7

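The posts here helped me find a way to statically compile NumPy with the library files so that they can be included in the AWS Lambda deployment package. This solution does not depend on the LD_LIBRARY_PATH value, unlike @rouk1's solution.

The compiled NumPy library can be downloaded from https://github.com/vitolimandibhrata/aws-lambda-numpy

Here are the steps to custom-compile NumPy.

Instructions on compiling this package from scratch

Prepare a fresh AWS EC2 instance with AWS Linux.

Install the compiler dependencies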

sudo yum -y install python-devel
sudo yum -y install gcc-c++
sudo yum -y install gcc-gfortran
sudo yum -y install libgfortran

Install the NumPy dependencies

sudo yum -y install blas
sudo yum -y install lapack
sudo yum -y install atlas-sse3-devel

Create /var/task/lib to contain the runtime libraries

mkdir -p /var/task/lib

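/var/task is the root directory where your code will reside in AWS Lambda, so we need to statically link the required library files in a well-known folder, which in this case is /var/task/lib.

Copy the following library files to /var/task/lib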

cp /usr/lib64/atlas-sse3/liblapack.so.3 /var/task/lib/.
cp /usr/lib64/atlas-sse3/libptf77blas.so.3 /var/task/lib/.
cp /usr/lib64/atlas-sse3/libf77blas.so.3 /var/task/lib/.
cp /usr/lib64/atlas-sse3/libptcblas.so.3 /var/task/lib/.
cp /usr/lib64/atlas-sse3/libcblas.so.3 /var/task/lib/.
cp /usr/lib64/atlas-sse3/libatlas.so.3 /var/task/lib/.
cp /usr/lib64/libgfortran.so.3 /var/task/lib/.
cp /usr/lib64/libquadmath.so.0 /var/task/lib/.

Get the latest numpy source code from http://sourceforge.net/projects/numpy/files/NumPy/

Go to the numpy source code folder, e.g. numpy-1.10.4, and create a site.cfg file with the following entries

[atlas]
libraries=lapack,f77blas,cblas,atlas
search_static_first=true
runtime_library_dirs = /var/task/lib
extra_link_args = -lgfortran -lquadmath

The -lgfortran -lquadmath flags are required to statically link the gfortran and quadmath libraries with the files defined in runtime_library_dirs.

Build NumPy

python setup.py build

Install NumPy

python setup.py install

Check whether the libraries are linked to the files in /var/task/lib

ldd $PYTHON_HOME/lib64/python2.7/site-packages/numpy/linalg/lapack_lite.so

You should see

linux-vdso.so.1 =>  (0x00007ffe0dd2d000)
liblapack.so.3 => /var/task/lib/liblapack.so.3 (0x00007ffad6be5000)
libptf77blas.so.3 => /var/task/lib/libptf77blas.so.3 (0x00007ffad69c7000)
libptcblas.so.3 => /var/task/lib/libptcblas.so.3 (0x00007ffad67a7000)
libatlas.so.3 => /var/task/lib/libatlas.so.3 (0x00007ffad6174000)
libf77blas.so.3 => /var/task/lib/libf77blas.so.3 (0x00007ffad5f56000)
libcblas.so.3 => /var/task/lib/libcblas.so.3 (0x00007ffad5d36000)
libpython2.7.so.1.0 => /usr/lib64/libpython2.7.so.1.0 (0x00007ffad596d000)
libgfortran.so.3 => /var/task/lib/libgfortran.so.3 (0x00007ffad5654000)
libm.so.6 => /lib64/libm.so.6 (0x00007ffad5352000)
libquadmath.so.0 => /var/task/lib/libquadmath.so.0 (0x00007ffad5117000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ffad4f00000)
libc.so.6 => /lib64/libc.so.6 (0x00007ffad4b3e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffad4922000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ffad471d000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007ffad451a000)
/lib64/ld-linux-x86-64.so.2 (0x000055cfc3ab8000)
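Output like the listing above can also be checked mechanically rather than by eye. A minimal sketch, with hypothetical helper names, that parses ldd text and reports which of the expected libraries do not resolve under /var/task/lib:

```python
def parse_ldd(output):
    """Map each library name to the path ldd resolved it to (None if unresolved)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the ld-linux line, which has no arrow
        name, _, rest = line.partition("=>")
        target = rest.split("(")[0].strip()
        if not target or target == "not found":
            target = None
        resolved[name.strip()] = target
    return resolved


def libs_outside(output, names, prefix="/var/task/lib/"):
    """Return the given library names that ldd did not resolve under prefix."""
    resolved = parse_ldd(output)
    return [n for n in names if not (resolved.get(n) or "").startswith(prefix)]
```

Feeding it the text printed by the ldd command and the names of the libraries copied earlier should return an empty list when the link step worked as intended.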


wrw*_*rwr 6

As of 2017, NumPy and SciPy have wheels that work on Lambda (the packages include precompiled libgfortran and libopenblas). As far as I know, MoviePy is a pure Python module, so basically you could do:

pip2 install -t lambda moviepy scipy

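Then copy your handler into the lambda directory and zip it. The catch is that you will most likely exceed the 50/250 MB size limits. A couple of things can help:

  • Remove .pycs, docs, tests and other unnecessary parts;
  • Leave a single copy of the libraries shared by NumPy and SciPy;
  • Strip libraries of inessential pieces, such as debugging symbols;
  • Compress the archive using higher settings.

Here is an example script that automates the above points.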
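For illustration only, here is a rough sketch of the first point in the list (it is not the linked script). It assumes that bytecode files and folders named tests are not needed at runtime; the function names and the default folder list are hypothetical and should be adjusted per package.

```python
import os
import shutil


def dir_size(path):
    """Total size in bytes of all files under path."""
    total = 0
    for current, _dirs, files in os.walk(path):
        for filename in files:
            total += os.path.getsize(os.path.join(current, filename))
    return total


def slim(package_root, dir_names=("tests",), suffixes=(".pyc", ".pyo")):
    """Delete bytecode files and test folders under package_root.

    Returns the number of bytes freed.
    """
    before = dir_size(package_root)
    for current, dirs, files in os.walk(package_root):
        for dirname in list(dirs):
            if dirname in dir_names:
                shutil.rmtree(os.path.join(current, dirname))
                dirs.remove(dirname)  # do not descend into the removed folder
        for filename in files:
            if filename.endswith(suffixes):
                os.remove(os.path.join(current, filename))
    return before - dir_size(package_root)
```

Calling slim on the lambda directory after the pip install, and before zipping, prints nothing by itself; the return value shows how much space was recovered.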


joh*_*cip 5

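Another, very simple method is to build using the awesome Docker containers that LambCI made to mimic Lambda: https://github.com/lambci/docker-lambda

The lambci/lambda:build container resembles AWS Lambda, with the addition of a mostly complete build environment. To start a shell session in it: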

docker run -v "$PWD":/var/task -it lambci/lambda:build bash

Inside the session:

export share=/var/task
easy_install pip
pip install -t $share numpy

Or, with virtualenv:

export share=/var/task
export PS1="[\u@\h:\w]\$ " # required by virtualenv
easy_install pip
pip install virtualenv
# ... make the venv, install numpy, and copy it to $share

Later you can use the main lambci/lambda container to test your build.
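For that test step, a small smoke-test handler can report which packages import cleanly. This is a minimal sketch under assumptions: the function names and the module list are illustrative, and it catches every exception rather than only ImportError, since the numpy failure in the question surfaced as an AttributeError.

```python
import importlib
import traceback


def check_imports(names):
    """Try to import each module; map its name to a version string or an error line."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            results[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Keep only the last traceback line, e.g. "ImportError: ..."
            results[name] = "FAILED: " + traceback.format_exc().splitlines()[-1]
    return results


def lambda_handler(event, context):
    """Entry point: log and return the import status of the bundled packages."""
    results = check_imports(["numpy", "scipy", "moviepy"])
    print(results)
    return results
```

Any entry starting with FAILED points at a package that still needs attention before the real handler is deployed.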