To run the AMPLab training exercises, I created a key pair in us-east-1, installed the training scripts (git clone git://github.com/amplab/training-scripts.git -b ampcamp4), and set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, following the instructions at http://ampcamp.berkeley.edu/big-data-mini-course/launching-a-bdas-cluster-on-ec2.html
Now running
./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1
produces the following messages:
johndoe@ip-some-instance:~/projects/spark/training-scripts$ ./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1
Setting up security groups...
Searching for existing cluster try1...
Latest Spark AMI: ami-19474270
Launching instances...
Launched 5 slaves in us-east-1b, regid = r-0c5e5ee3
Launched master in us-east-1b, regid = r-316060de
Waiting for instances to start up...
Waiting 120 more seconds...
Copying SSH key /home/johndoe/.ssh/myspark.pem to master...
ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
...
...
subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com '/root/spark/bin/stop-all.sh'' returned non-zero exit status 127
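where root@ec2-54-90-57-174.compute-1.amazonaws.com is the user and the master instance. I have tried -u ec2-user and increased -w all the way up to 600, but I get the same error.
When I log in to the AWS console I can see the master and slave instances in us-east-1, and I can actually ssh into the master instance from the "local" ip-some-instance shell.
My understanding is that the spark-ec2 script takes care of defining the master/slave security groups (which ports are listened on, and so on), and that I should not have to tweak these settings. That said, the master and the slaves all listen on port 22 (Port:22, Protocol:tcp, Source:0.0.0.0/0 in the ampcamp3-slaves/masters security groups).
I am at a loss here and would appreciate any pointers before I spend all my R&D funds on EC2 instances.... Thanks.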
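This is most likely because SSH takes a long time to start up on the instances, so the 120-second timeout expires before the machines can be logged in to. You should be able to run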
./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch --resume try1
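(with the --resume flag) to continue from where it left off without launching new instances. This issue will be fixed in Spark 1.2.0, where we have a new mechanism that intelligently checks the SSH status instead of relying on a fixed timeout. We are also addressing the root cause behind the long SSH startup delay by building new AMIs.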
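To illustrate the difference between a fixed wait and checking SSH status, here is a minimal Python sketch that polls TCP port 22 until it accepts connections. It is not the mechanism that ships in Spark 1.2.0; the function names `is_port_open` and `wait_for_ssh` and the default values are hypothetical, and an open port only shows that something is listening, not that key-based login will succeed.

```python
import socket
import time


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and DNS lookup failures.
        return False


def wait_for_ssh(host, port=22, max_wait=600.0, interval=5.0):
    """Poll host:port until it accepts connections or max_wait seconds pass.

    Hypothetical helper for illustration only; not part of spark-ec2.
    """
    deadline = time.monotonic() + max_wait
    while True:
        if is_port_open(host, port):
            return True
        # Give up if another sleep would take us past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_ssh("ec2-54-90-57-174.compute-1.amazonaws.com")` before re-running the launch command with --resume would return as soon as the master starts accepting connections on port 22, rather than after a fixed delay.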