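I would be grateful for any pointers.
I am having trouble running a word count MapReduce job as a Spark step on Amazon EMR. However, I can ssh to the master node and run the same word count logic in spark-shell without any problem.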
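The actual job code is not shown in this post. For readers unfamiliar with the pattern, here is a minimal sketch of what "word count logic" means, written in plain Python with no Spark dependency. The function name `word_count` and the sample input are hypothetical and only illustrate the map and reduce phases; this is not the code that was submitted to EMR.

```python
from collections import Counter
from itertools import chain


def word_count(lines):
    """Count word occurrences across an iterable of text lines."""
    # Map phase: split every line into words (comparable to a flatMap).
    words = chain.from_iterable(line.split() for line in lines)
    # Reduce phase: sum occurrences per word (comparable to a reduceByKey).
    return Counter(words)


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    counts = word_count(["to be or not to be", "to do"])
    for word, n in sorted(counts.items()):
        print(word, n)
```

In a Spark job the same two phases would run distributed across executors rather than in a single process.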
The step fails, complaining that __spark_conf_xx.zip does not exist on the master's HDFS, even though the upload itself reports no error:
16/04/05 07:20:21 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/__spark_conf__9006968814682693730.zip -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip
The log is as follows:
16/04/05 07:20:16 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-26-247.ap-northeast-1.compute.internal/172.31.26.247:8032
16/04/05 07:20:16 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
16/04/05 07:20:16 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
16/04/05 07:20:16 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/05 07:20:16 INFO yarn.Client: Setting …