prk*_*rk2 4 python hadoop amazon-ec2 apache-spark
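Please tell me how to resolve this problem.
First, I confirmed that the code runs when master is "local".
Then I started two EC2 instances (m1.large). However, when master is "spark://MASTER_PUBLIC_DNS:7077", a "TaskSchedulerImpl" warning appears and the job fails.
When I change the master from a VALID address to an INVALID one (spark://INVALID_DNS:7077), the same message appears.
Namely, "WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory"
That is what I see. Following the advice in that message, I allocated 12 GB of memory to this cluster, but it still fails.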
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pyspark import SparkContext, SparkConf
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint
from numpy import array

# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in line.split(' ')]
    return LabeledPoint(values[0], values[1:])

appName = "testsparkapp"
master = "spark://MASTER_PUBLIC_DNS:7077"
#master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
data = sc.textFile("/root/spark/mllib/data/sample_svm_data.txt")
parsedData = data.map(parsePoint)

# Build the model
model = LogisticRegressionWithSGD.train(parsedData)

# Evaluating the model on training data
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
# Index into the (label, prediction) pair; tuple-unpacking lambdas only work on Python 2
trainErr = labelsAndPreds.filter(lambda vp: vp[0] != vp[1]).count() / float(parsedData.count())
print("Training Error = " + str(trainErr))
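I did three things that a friend advised.
1. I opened the master port, 7077.
2. In the master URL, I set the hostname instead of the IP address.
-> As a result, I became able to connect to the master server (I checked this in the cluster UI).
3. I tried to set worker_max_heap as shown below, but it probably failed.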
SparkConf().set("spark.executor.memory", "4g").set("worker_max_heapsize", "2g")
The worker allows me to use 6.3 GB (I checked this in the UI). It is an m1.large.
-> I found a warning in my execution log and an error in the worker's stderr.
14/08/08 06:11:59 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/08/08 06:14:04 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@PRIVATE_HOST_NAME1:52011/user/Worker
14/08/08 06:15:07 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@PRIVATE_HOST_NAME1:52201] -> [akka.tcp://spark@PRIVATE_HOST_NAME2:38286] disassociated! Shutting down.
小智 5
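The spark-ec2 script configures the Spark cluster on EC2 in standalone mode, which means it cannot work with remote submits. I struggled with the same error you describe for several days before figuring out that this is not supported. Unfortunately, the error message is misleading.
So you have to copy your files to the master and log in there to run your Spark job.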
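As a rough sketch of that workflow, the helper below only builds the two commands (copy the script to the master, then run it there over SSH); it does not execute anything. The function name, the key file `mykey.pem`, the `root` login, and the `/root/spark/bin/spark-submit` location are assumptions for illustration, so adjust them to your own cluster. The script keeps setting its master URL itself via `setMaster`, as in the question.

```python
import os
import posixpath
import shlex


def build_remote_run_commands(master_host, local_script, key_file,
                              remote_dir="/root",
                              spark_submit="/root/spark/bin/spark-submit",
                              user="root"):
    """Return (scp_cmd, ssh_cmd) as argument lists; nothing is executed.

    All defaults are assumptions about the cluster layout, not values
    confirmed by the question.
    """
    # Path the script will have on the master after copying
    remote_script = posixpath.join(remote_dir, os.path.basename(local_script))
    target = "{0}@{1}".format(user, master_host)

    # Step 1: copy the script to the master node
    scp_cmd = ["scp", "-i", key_file, local_script,
               "{0}:{1}".format(target, remote_script)]

    # Step 2: run the script on the master, so the driver lives inside the cluster
    remote_cmd = "{0} {1}".format(spark_submit, shlex.quote(remote_script))
    ssh_cmd = ["ssh", "-i", key_file, target, remote_cmd]
    return scp_cmd, ssh_cmd


# Placeholder values; MASTER_PUBLIC_DNS is the same placeholder used in the question
scp_cmd, ssh_cmd = build_remote_run_commands(
    "MASTER_PUBLIC_DNS", "testsparkapp.py", "mykey.pem")
print(" ".join(scp_cmd))
print(" ".join(ssh_cmd))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.check_call` once the values are right. Because the driver then runs on the master, the executors should be able to connect back to it, which is what failed in the worker stderr above.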