Nee*_*gal 2 hadoop scala hadoop-yarn apache-spark
I am new to Spark.
I am trying to run Spark on YARN in yarn-client mode.
Spark version = 1.0.2, Hadoop version = 2.2.0.
The YARN cluster has 3 active nodes.
Properties set in spark-env.sh:
    SPARK_EXECUTOR_MEMORY=1G
    SPARK_EXECUTOR_INSTANCES=3
    SPARK_EXECUTOR_CORES=1
    SPARK_DRIVER_MEMORY=2G
Command used: ./bin/spark-shell --master yarn-client
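(Side note: as far as I can tell from the Spark 1.0.x "Running on YARN" docs, the same resource request can also be passed explicitly as command-line flags instead of through spark-env.sh; a sketch of that equivalent form, which I have not used for the run below:)

```bash
# Explicit per-run request for 3 executors with 1 GB and 1 core each,
# instead of relying on SPARK_EXECUTOR_* in spark-env.sh.
./bin/spark-shell --master yarn-client \
  --num-executors 3 \
  --executor-memory 1g \
  --executor-cores 1
```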
But once spark-shell is up, it has registered only 1 executor, and that executor seems to have been given some default memory allocation.
I also confirmed through the Spark web UI that there is only 1 executor, and that it is running on the master node (which is also the YARN ResourceManager node).
Relevant lines from the driver log:

    INFO yarn.Client: Command for starting the Spark ApplicationMaster:
    List($JAVA_HOME/bin/java, -server, -Xmx2048m, -Djava.io.tmpdir=$PWD/tmp,
      -Dspark.tachyonStore.folderName=\"spark-fc6383cc-0904-4af9-8abd-3b66b3f0f461\",
      -Dspark.yarn.secondary.jars=\"\",
      -Dspark.home=\"/home/impadmin/spark-1.0.2-bin-hadoop2\",
      -Dspark.repl.class.uri=\"http://master_node:46823\",
      -Dspark.driver.host=\"master_node\", -Dspark.app.name=\"Spark shell\",
      -Dspark.jars=\"\", -Dspark.fileserver.uri=\"http://master_node:46267\",
      -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"41209\",
      -Dspark.httpBroadcast.uri=\"http://master_node:36965\",
      -Dlog4j.configuration=log4j-spark-container.properties,
      org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar, null,
      --args, 'master_node:41209', --executor-memory, 1024, --executor-cores, 1,
      --num-executors, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
    ...
    14/09/10 22:21:24 INFO cluster.YarnClientSchedulerBackend: Registered executor:
      Actor[akka.tcp://sparkExecutor@master_node:53619/user/Executor#1075999905] with ID 1
    14/09/10 22:21:24 INFO storage.BlockManagerInfo: Registering block manager
      master_node:40205 with 589.2 MB RAM
    14/09/10 22:21:25 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
    14/09/10 22:21:25 INFO repl.SparkILoop: Created spark context..

So the launch command does ask for --num-executors 3, yet only executor ID 1 ever registers.
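To cross-check the registration count from inside the REPL, I run a quick probe like the following (a sketch; my understanding is that the returned map has one entry per registered block manager, and the driver's own block manager is included, so 3 executors should show 4 entries):

```scala
// One entry per registered block manager: "host:port" -> (max mem, remaining mem).
// The driver is included, so N executors should yield N + 1 entries.
sc.getExecutorMemoryStatus.foreach(println)
```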
And when I run any Spark action, at any level of parallelism, all of the tasks execute serially on that one node!
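For reference, this is the kind of trivial test I use to see where tasks land (a sketch; the partition count of 12 is arbitrary, just more than a single one-core executor can run at once). The stage detail page in the web UI then shows the host each task ran on:

```scala
// Run a job with 12 tasks; with 3 one-core executors these should be
// spread across the cluster rather than queued up on a single node.
val data = sc.parallelize(1 to 1000000, 12)
data.count()
```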