Can't run a simple job against an EC2 Spark cluster from local pyspark

Ant*_*ony 8 amazon-ec2 amazon-web-services apache-spark

I am trying to run pyspark from my Mac to do computations on an EC2 Spark cluster.
If I log in to the cluster, it works as expected:

$ ec2/spark-ec2 -i ~/.ec2/spark.pem -k spark login test-cluster2
$ spark/bin/pyspark

Then I run a simple job:

>>> data=sc.parallelize(range(1000),10)
>>> data.count()

It works as expected:

14/06/26 16:38:52 INFO spark.SparkContext: Starting job: count at <stdin>:1
14/06/26 16:38:52 INFO scheduler.DAGScheduler: Got job 0 (count at <stdin>:1) with 10 output partitions (allowLocal=false)
14/06/26 16:38:52 INFO scheduler.DAGScheduler: Final stage: Stage 0 (count at <stdin>:1)
...
14/06/26 16:38:53 INFO spark.SparkContext: Job finished: count at <stdin>:1, took 1.195232619 s
1000

But if I now try the same thing from my local machine,

$ MASTER=spark://ec2-54-234-204-13.compute-1.amazonaws.com:7077 bin/pyspark

it does not seem to be able to connect to the cluster:

14/06/26 09:45:43 INFO AppClient$ClientActor: Connecting to master spark://ec2-54-234-204-13.compute-1.amazonaws.com:7077...
14/06/26 09:45:47 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
...
  File "/Users/anthony1/git/incubator-spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o20.collect.
: org.apache.spark.SparkException: Job aborted: Spark cluster looks down
14/06/26 09:53:17 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

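I thought the problem was in the EC2 security settings, but adding inbound rules that accept all ports to both the master and slave security groups did not help.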
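
One way to check the security-group side independently of Spark is a plain TCP connect from the local machine to the master port. The following is only a minimal sketch: `can_connect` is a hypothetical helper, not part of Spark or spark-ec2, and the host and port in the example comment are the ones from the `MASTER` URL above. A successful connect shows only that this one port is reachable from the laptop; it says nothing about connections in the other direction, from the cluster back to the local machine.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration only; it checks a single port in a
    single direction (local machine -> host).
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example (not executed here):
#   can_connect("ec2-54-234-204-13.compute-1.amazonaws.com", 7077)
```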

Any help would be greatly appreciated!

Someone else asked the same question on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Deploying-a-python-code-on-a-spark-EC2-cluster-td4758.html#a8465

小智 8

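The spark-ec2 script configures the Spark cluster in EC2 as standalone, which means it cannot be used with remote submits. I struggled with the same error you describe for several days before figuring out that it is not supported. Unfortunately, the error message is misleading.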

So you have to copy your files over and log in to the master to run your Spark job.
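
The copy-and-run workflow could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested recipe: `build_remote_run_commands` and `count_job.py` are hypothetical names, the login user is assumed to be `root` with home directory `/root`, and it assumes the cluster's Spark version accepts a script path as an argument to `spark/bin/pyspark` (newer releases use `spark-submit` for this). The function only builds the `scp` and `ssh` command lines and does not execute anything.

```python
import posixpath
import shlex
from pathlib import Path


def build_remote_run_commands(key_path, master_host, local_script,
                              remote_dir="/root", user="root"):
    """Build (scp_cmd, ssh_cmd) argument lists; nothing is executed here.

    Assumptions (not confirmed by the original post): the login user is
    `root`, and `spark/bin/pyspark <script>` runs a script on the master.
    """
    # The remote side is assumed to be Linux, so join with POSIX separators.
    remote_script = posixpath.join(remote_dir, Path(local_script).name)
    target = f"{user}@{master_host}"
    # Step 1: copy the job script to the master.
    scp_cmd = ["scp", "-i", key_path, local_script, f"{target}:{remote_script}"]
    # Step 2: run it on the master, from the login directory.
    ssh_cmd = ["ssh", "-i", key_path, target,
               f"spark/bin/pyspark {shlex.quote(remote_script)}"]
    return scp_cmd, ssh_cmd


if __name__ == "__main__":
    scp_cmd, ssh_cmd = build_remote_run_commands(
        "~/.ec2/spark.pem",  # key path from the question; the shell expands ~
        "ec2-54-234-204-13.compute-1.amazonaws.com",
        "count_job.py",      # hypothetical script containing the job
    )
    # Print the commands so they can be reviewed and pasted into a shell.
    print(" ".join(scp_cmd))
    print(" ".join(ssh_cmd))
```

Here `count_job.py` would hold the same `sc.parallelize(range(1000), 10).count()` job from the question, creating its own `SparkContext` since no interactive shell provides `sc`.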