SPARK_EXECUTOR_INSTANCES not working in spark-shell, yarn-client mode

Nee*_*gal 2 hadoop scala hadoop-yarn apache-spark

I am new to Spark.

I am trying to run Spark on YARN in yarn-client mode.

Spark version = 1.0.2, Hadoop version = 2.2.0

The YARN cluster has 3 live nodes.

Properties set in spark-env.sh:

SPARK_EXECUTOR_MEMORY=1G
SPARK_EXECUTOR_INSTANCES=3
SPARK_EXECUTOR_CORES=1
SPARK_DRIVER_MEMORY=2G
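For context, spark-env.sh is sourced as a plain shell script, so the assignments must use valid shell syntax (no spaces around =). A minimal sketch of what the file would look like with the values above (the comments are mine, not from the original post):

    # spark-env.sh - sourced by Spark's launch scripts at startup
    export SPARK_EXECUTOR_MEMORY=1G      # memory per executor
    export SPARK_EXECUTOR_INSTANCES=3    # executors requested from YARN
    export SPARK_EXECUTOR_CORES=1        # cores per executor
    export SPARK_DRIVER_MEMORY=2G        # memory for the driver (the shell itself)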

Command used: /bin/spark-shell --master yarn-client

But after spark-shell starts up, it registers only 1 executor, with some default amount of memory assigned to it.

I also verified through the Spark web UI that there is only 1 executor, and it runs only on the master node (which is also the YARN resource manager node).
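For reference, one way to sanity-check the executor count from inside the shell is a sketch like the one below, assuming the Spark 1.x developer API sc.getExecutorStorageStatus (which returns one entry per registered block manager, including the driver's) is available in this version:

    // Run inside spark-shell, where sc is predefined.
    // Subtract 1 because the driver hosts a block manager too.
    val executorCount = sc.getExecutorStorageStatus.length - 1
    println(s"Registered executors: $executorCount")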

INFO yarn.Client: Command for launching the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName=\"spark-fc6383cc-0904-4af9-8abd-3b66b3f0f461\", -Dspark.yarn.secondary.jars=\"\", -Dspark.home=\"/home/impadmin/spark-1.0.2-bin-hadoop2\", -Dspark.repl.class.uri=\"http://master_node:46823\", -Dspark.driver.host=\"master_node\", -Dspark.app.name=\"Spark shell\", -Dspark.jars=\"\", -Dspark.fileserver.uri=\"http://master_node:46267\", -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"41209\", -Dspark.httpBroadcast.uri=\"http://master_node:36965\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar, null, --args 'master_node:41209', --executor-memory, 1024, --executor-cores, 1, --num-executors, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)

...

...

...

14/09/10 22:21:24 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@master_node:53619/user/Executor#1075999905] with ID 1
14/09/10 22:21:24 INFO storage.BlockManagerInfo: Registering block manager master_node:40205 with 589.2 MB RAM
14/09/10 22:21:25 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/09/10 22:21:25 INFO repl.SparkILoop: Created spark context..

And after running any Spark action with any level of parallelism, it runs all of those tasks serially on that single node!
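For example, a quick hypothetical test from the shell: even with far more partitions than requested executors, every task lands on the one registered executor:

    // 12 partitions should spread across 3 executors,
    // but here all tasks run one after another on a single node.
    val rdd = sc.parallelize(1 to 1000000, 12)
    rdd.map(_ * 2).count()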

小智 5

Well, I solved it like this. I have 4 data nodes in my cluster:

spark-shell --num-executors 4 --master yarn-client
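The same flag works for spark-submit as well; a sketch for comparison (the class and jar names are placeholders, not from the original post):

    ./bin/spark-submit --master yarn-client \
      --num-executors 4 \
      --executor-memory 1g \
      --executor-cores 1 \
      --class com.example.MyApp \
      my-app.jar

On YARN, passing --num-executors on the command line took effect where the SPARK_EXECUTOR_INSTANCES setting in spark-env.sh was being ignored in this case.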