Replacing a default jar on EMR with a bootstrap action

Tags: bootstrapping, hadoop, amazon-web-services, emr

I'm on an EMR cluster running AMI 3.0.4. After the cluster starts, I SSH into the master node and manually run the following:

cd /home/hadoop/share/hadoop/common/lib/
rm guava-11.0.2.jar
wget https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar
chmod 777 guava-14.0.1.jar

Is it possible to do the above in a bootstrap action? Thanks!
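On the AMI 3.x layout from the question, the same steps could plausibly run as a custom bootstrap action. The following is a minimal sketch, not a tested recipe: the script name `swap_guava.sh` and the `swap_guava` function are hypothetical, and it assumes that on AMI 3.x the directory /home/hadoop/share/hadoop/common/lib already exists when bootstrap actions run and is writable by the user they run as. It downloads the new jar under a temporary name before deleting the old one, and it uses repo1.maven.org because the old central.maven.org host is no longer served.

```shell
#!/bin/bash
# swap_guava.sh: hypothetical bootstrap action for AMI 3.x clusters.
# Replaces Hadoop's bundled Guava 11.0.2 with Guava 14.0.1.
# ASSUMPTION: the Hadoop lib directory already exists when bootstrap
# actions run. An optional first argument overrides the directory.

OLD_JAR="guava-11.0.2.jar"
NEW_JAR="guava-14.0.1.jar"
NEW_URL="https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/${NEW_JAR}"

swap_guava() {
  local lib_dir="$1"
  [ -d "$lib_dir" ] || { echo "swap_guava: $lib_dir not found" >&2; return 1; }

  # Download under a temporary name first, so a failed download never
  # leaves the directory without any Guava jar.
  local tmp="$lib_dir/$NEW_JAR.part"
  wget -q -O "$tmp" "$NEW_URL" || { rm -f "$tmp"; return 1; }
  mv "$tmp" "$lib_dir/$NEW_JAR" || return 1

  # World-readable is enough; the jar does not need to be writable or executable.
  chmod 644 "$lib_dir/$NEW_JAR" || return 1

  # Only remove the old jar once the new one is in place.
  rm -f "$lib_dir/$OLD_JAR"
}

# The script's exit status is swap_guava's status, so a failure here
# makes the bootstrap action (and the cluster launch) fail visibly.
swap_guava "${1:-/home/hadoop/share/hadoop/common/lib}"
```

To use it, upload the script to S3 and add it as a custom bootstrap action when creating the cluster.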

Answer:

With EMR 4.0 the Hadoop installation path changed, so the manual update to guava-14.0.1.jar becomes:

cd /usr/lib/hadoop/lib
sudo wget https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar
sudo rm guava-11.0.2.jar
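After the manual swap it may be worth confirming that exactly one Guava version is left in the directory, since with both jars present the old one could still end up on the classpath. This is a small sketch; `list_guava` is a hypothetical helper name, and it defaults to the EMR 4.0 path used above.

```shell
#!/bin/bash
# list_guava.sh: hypothetical helper that prints the Guava jars found in a
# Hadoop lib directory and succeeds only if there is exactly one.

list_guava() {
  local lib_dir="${1:-/usr/lib/hadoop/lib}"
  local count=0 jar
  for jar in "$lib_dir"/guava-*.jar; do
    [ -e "$jar" ] || continue   # glob matched nothing
    echo "${jar##*/}"           # print the file name only
    count=$((count + 1))
  done
  # Exactly one Guava jar is expected after the swap.
  [ "$count" -eq 1 ]
}

list_guava "$@"
```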

The bootstrap action from Sandesh's answer did not work for us.

Edit:

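We now have a solution for EMR 4.0. You have to provide a spark-config.json in S3 that sets an extra classpath for the Spark executors and the driver. In the "Edit software settings (optional)" section of the cluster creation page, you can point to this configuration file and load it from S3.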

spark-config.json:

[
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.extraClassPath": "/home/hadoop/lib/guava-14.0.1.jar",
      "spark.driver.extraClassPath": "/home/hadoop/lib/guava-14.0.1.jar"
    }
  }
]

guava-14.0.1.jar has to be downloaded by a bootstrap script, guava_download.sh:

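#!/bin/bash
# Bootstrap action: fetch Guava 14.0.1 into /home/hadoop/lib on every node,
# the path referenced by spark-config.json. Fail the bootstrap on any error.
set -e
mkdir -p /home/hadoop/lib/
cd /home/hadoop/lib/
wget https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar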
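Once the cluster is up, a quick check on a node can confirm that both pieces took effect: the jar downloaded by guava_download.sh and the extraClassPath settings from spark-config.json. This is a sketch under one assumption: that EMR 4.x writes the spark-defaults classification to /etc/spark/conf/spark-defaults.conf. The script name `check_spark_guava.sh` and its function are hypothetical. The grep looks for the jar anywhere in each value, so it still passes if the value contains other classpath entries besides the Guava jar.

```shell
#!/bin/bash
# check_spark_guava.sh: hypothetical post-launch check, run on a cluster node.
# ASSUMPTION: EMR 4.x writes spark-defaults to /etc/spark/conf/spark-defaults.conf.
# Optional arguments override the jar path and the spark-defaults path.

GUAVA_JAR="/home/hadoop/lib/guava-14.0.1.jar"
SPARK_DEFAULTS="/etc/spark/conf/spark-defaults.conf"

check_spark_guava() {
  local jar="$1" defaults="$2" ok=0 key

  # 1. The bootstrap script must have downloaded the jar.
  if [ -f "$jar" ]; then
    echo "OK      $jar"
  else
    echo "MISSING $jar"; ok=1
  fi

  # 2. Both classpath settings must mention the jar.
  #    spark-defaults.conf lines look like: <key><whitespace><value>
  for key in spark.executor.extraClassPath spark.driver.extraClassPath; do
    if grep -Eq "^${key}[[:space:]]+.*${jar}" "$defaults" 2>/dev/null; then
      echo "OK      $key"
    else
      echo "MISSING $key -> $jar"; ok=1
    fi
  done

  return "$ok"
}

check_spark_guava "${1:-$GUAVA_JAR}" "${2:-$SPARK_DEFAULTS}"
```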