mic*_*dam 6 amazon-web-services scrapy aws-lambda
如果这是完全错误使用Lambda的情况,请告诉我.
我想将Scrapy安装到Lambda函数中并调用该函数来开始爬网.我的第一个问题是如何安装它,以便所有路径都正确.我使用要压缩的目录作为其根目录安装程序,因此zip包含所有源文件和可执行文件.我的工作基于这篇文章.在它所说的包含在我的函数开头的行中,"process"变量来自哪里?我试过了,
var process = require('child_process');
var exec = process.exec;
process.env['PATH'] = process.env['PATH'] + ':' +
process.env['LAMBDA_TASK_ROOT']
Run Code Online (Sandbox Code Playgroud)
但是我得到了错误,
"errorMessage": "Cannot read property 'PATH' of undefined",
"errorType": "TypeError",
Run Code Online (Sandbox Code Playgroud)
我是否需要包含所有库文件,或者只包含/ usr/lib中的可执行文件?如何在文章中说明我需要的那一行代码?
编辑:我尝试将代码移动到child_process.exec,并收到错误
"errorMessage": "Command failed: /bin/sh: process.env[PATH]: command not found\n/bin/sh: scrapy: command not found\n"
Run Code Online (Sandbox Code Playgroud)
这是我目前的整个功能
console.log("STARTING");
var process = require('child_process');
var exec = process.exec;
exports.handler = function(event, context) {
//Run a fixed Python command.
exec("process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']; scrapy crawl backpage2", function(error, stdout) {
console.log('Scrapy returned: ' + stdout + '.');
context.done(error, stdout);
});
};
Run Code Online (Sandbox Code Playgroud)
您绝对可以从Lambda中的Node代码执行任意进程.例如,我们这样做是为了运行一个处理玩家转弯的游戏服务器.我不确定你要用Scrapy完成什么,但请记住,你的整个Lambda调用现在只能在AWS上运行最多60秒!但是,如果你没关系,这里有一个完整的例子,说明我们如何从Lambda执行我们自己的任意linux进程.(在我们的例子中,它是一个已编译的二进制文件 - 只要你有一些可以在他们使用的Linux映像上运行的东西,这无关紧要).
var child_process = require('child_process');
var path = require('path');
exports.handler = function (event, context) {
// If timeout is provided in context, get it. Otherwise, assume 60 seconds
var timeout = (context.getRemainingTimeInMillis && (context.getRemainingTimeInMillis() - 1000)) || 60000;
// The task root is the directory with the code package.
var taskRoot = process.env['LAMBDA_TASK_ROOT'] || __dirname;
// The command to execute.
var command;
// Set up environment variables
process.env.HOME = '/tmp'; // <-- for naive processes that assume $HOME always works! You might not need this.
// On linux the executable is in task root / __dirname, whichever was defined
process.env.PATH += ':' + taskRoot;
command = 'bash -c "cp -R /var/task/YOUR_THING /tmp/; cd /tmp; ./YOUR_THING ARG1 ARG2 ETC"'
child_process.exec(command, {
timeout: timeout,
env: process.env
}, function (error, stdout, stderr) {
console.log(stdout);
console.log(stderr);
context.done(null, {exit: true, stdout: stdout, stderr: stderr});
});
};
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
4532 次 |
| 最近记录: |