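Related questions and solutions (0)

How do I run the Apache Spark shell remotely?

I have a Spark cluster set up with one master and three workers. I also have Spark installed on a CentOS VM. I am trying to run a Spark shell from my local VM that connects to the master and lets me execute simple Scala code. Here is the command I run on my local VM: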

bin/spark-shell --master spark://spark01:7077

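The shell starts up to the point where I can enter Scala code. It reports that executors have been granted (three, one per worker). In the master's UI I can see one running application, Spark shell. All workers are ALIVE, with 2/2 cores used and 512 MB (out of 5 GB) allocated to the application. So I tried to execute the following Scala code: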

sc.parallelize(1 to 100).count

Unfortunately, the command does not work. The shell endlessly prints the same warning:

INFO SparkContext: Starting job: count at <console>:13
INFO DAGScheduler: Got job 0 (count at <console>:13) with 2 output partitions (allowLocal=false)
INFO DAGScheduler: Final stage: Stage 0(count at <console>:13) with 2 output partitions (allowLocal=false)
INFO DAGScheduler: Parents of final stage: List()
INFO DAGScheduler: Missing parents: List()
INFO DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13), which has no missing parents
INFO …

apache-spark

Nic*_*las

2016 11-04

20
votes

1
answer

20k
views

Spark: check your cluster UI to ensure that workers are registered

I have a simple program in Spark:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://10.250.7.117:7077")
      .setAppName("Simple Application")
      .set("spark.cores.max", "2")
    val sc = new SparkContext(conf)
    val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")

    // first get the first 10 records and print them
    println("Getting the first 10 records: ")
    ratingsFile.take(10).foreach(println)

    // get the number of records in the movie ratings file and print it
    println("The number of records in the movie list are : ")
    println(ratingsFile.count())

    // release the cluster resources held by this application
    sc.stop()
  }
}

When I try to run this program from spark-shell, that is, I log in to the name node (a Cloudera installation) and run the commands sequentially in spark-shell:

val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")
println("Getting the first 10 records: ") …

hadoop scala cloudera cloudera-manager apache-spark

vin*_*nha

2018 11-02

15
votes

1
answer

30k
views

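Spark job submitted - waiting (TaskSchedulerImpl: initial job not accepted)

An API call is used to submit the job. Response state: RUNNING

On the cluster UI:

Worker (slave): worker-20160712083825-172.31.17.189-59433 is ALIVE

Cores: 1 of 2 used

Memory: 1 GB of 6 used

Running application

app-20160713130056-0020 - WAITING for the past 5 hours

Cores: unlimited

Job description of the application

Active stage

reduceByKey at /root/wordcount.py:23

Pending stage

takeOrdered at /root/wordcount.py:26

Running driver: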

stderr log page for driver-20160713130051-0025 

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
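
The condition this warning asks you to verify can be sketched as a small check: is there at least one registered, live worker with enough free cores and memory for one executor? The snippet below is a simplified model for illustration only, not Spark's actual scheduler code. The `Worker` type, the `eligibleWorkers` helper, and all numbers are hypothetical.

```scala
// Simplified model for illustration; this is NOT Spark's scheduler code.
// `Worker` and `eligibleWorkers` are hypothetical names.
final case class Worker(id: String, alive: Boolean, freeCores: Int, freeMemoryMb: Int)

object ResourceCheck {
  /** Workers that are alive and can host one executor of the requested size. */
  def eligibleWorkers(workers: Seq[Worker], executorCores: Int, executorMemoryMb: Int): Seq[Worker] =
    workers.filter { w =>
      w.alive && w.freeCores >= executorCores && w.freeMemoryMb >= executorMemoryMb
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical cluster state, as one might read it off the master UI.
    val workers = Seq(
      Worker("worker-a", alive = true, freeCores = 1, freeMemoryMb = 5120),
      Worker("worker-b", alive = true, freeCores = 0, freeMemoryMb = 4096),  // no free core
      Worker("worker-c", alive = false, freeCores = 2, freeMemoryMb = 6144)  // not alive
    )

    // Request: one executor with 1 core and 1024 MB.
    val fits = eligibleWorkers(workers, executorCores = 1, executorMemoryMb = 1024)
    println(fits.map(_.id).mkString(", ")) // prints "worker-a"
  }
}
```

If this check passes on paper and the warning persists, the cause may lie elsewhere (for example in connectivity between driver and executors), so treat it as a first sanity check rather than a diagnosis.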

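According to "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", the slaves had not been started, so there were no resources.

But in my case, slave 1 is working.

According to "Unable to Execute more than a spark Job 'Initial job has not accepted any resources'", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave, and the Submit API is called via Postman / anywhere.

The cluster also has cores, RAM, and memory available, yet the job still throws the error shown in the UI.

According to "TaskSchedulerImpl: Initial job has not accepted any resources;", I assigned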

~/spark-1.5.0/conf/spark-env.sh …

api amazon-ec2 apache-spark

Cha*_*pat

2017 05-23

7
votes

1
answer

3221
views

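Requesting executors because tasks are backlogged

I have a Spark Streaming application that had been running fine until yesterday, when it suddenly started hitting this warning. I have the same environment and am using the same code. Here is the warning:

16/05/09 17:13:03 INFO ExecutorAllocationManager: Requesting 16 new executors because tasks are backlogged (new desired total will be 31)
16/05/09 17:13:03 INFO ExecutorAllocationManager: Requesting 19 new executors because tasks are backlogged (new desired total will be 50)

16/05/09 17:13:12 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

16/05/09 17:13:27 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I am using Apache Spark 1.6 on the Cloudera 5.5 QuickStart VM. No other applications are running on the cluster that would consume the available resources.

Is there any configuration for this?

Thanks!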

apache-spark pyspark cloudera-quickstart-vm

Abh*_*bhi

lucky-day

6
votes

1
answer

2018
views

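Spark error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I have a virtual machine with spark-2.0.0-bin-hadoop2.7 installed in standalone mode.

I ran ./sbin/start-all.sh to start the master and the slave.

When I run ./bin/spark-shell --master spark://192.168.43.27:7077 --driver-memory 600m --executor-memory 600m --executor-cores 1 on the machine itself, the application's state is RUNNING and I am able to compute code in the Spark shell.

When I run exactly the same command from another machine on the network, the state is again RUNNING, but spark-shell throws WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources. I guess the problem is not directly related to resources, since the same command works on the virtual machine itself but not when it is run from another machine.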
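
One thing that differs between the two machines is the network path from the executors back to the driver. The following is a minimal sketch, assuming (the question does not confirm this) that executors cannot reach the driver at the address it advertises. `spark.driver.host` and `spark.driver.port` are standard Spark properties; the driver address `192.168.43.50`, the port `51000`, and the application name are hypothetical placeholders. It needs Spark on the classpath and a reachable master.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RemoteDriverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://192.168.43.27:7077")   // master address from the question
      .setAppName("remote-driver-sketch")        // hypothetical name
      // Address the executors should use to reach this driver (hypothetical value;
      // it must be routable from the worker machine).
      .set("spark.driver.host", "192.168.43.50")
      // Fixed driver port (hypothetical value) so a firewall rule can allow it.
      .set("spark.driver.port", "51000")
      .set("spark.executor.memory", "600m")
      .set("spark.executor.cores", "1")

    val sc = new SparkContext(conf)
    try {
      // Same smoke test as in the shell: count a small range.
      println(sc.parallelize(1 to 100).count())
    } finally {
      sc.stop()
    }
  }
}
```

The same properties can be passed to spark-shell with `--conf`, for example `--conf spark.driver.host=192.168.43.50`, which keeps the original command otherwise unchanged.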