How to install Sqoop on Amazon EMR?

har*_*ari 1 hive amazon-emr sqoop amazon-redshift

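I created a cluster on Amazon EMR using release emr-4.0.0 (Hadoop distribution: Amazon 2.6.0, Hive 1.0.0). I need to install Sqoop so that I can move data between Hive and Redshift. What are the steps to install Sqoop on an EMR cluster? Please provide the steps. Thanks!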

Ana*_*dor 7

Note that on EMR 4.0.0, hadoop fs -copyToLocal throws an error.

Use aws s3 cp instead.

To expand on Amal's answer in more detail:

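  1. Download a Sqoop binary release and upload it to an S3 location. I am using sqoop-1.4.4.bin__hadoop-2.0.4-alpha, which seems to work well with EMR 4.0.0.
  2. Download the Redshift JDBC connector JAR and upload it to the same S3 location. This page may help.
  3. Upload a script similar to the one below to S3: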

    #!/bin/bash
    # Install Sqoop and the Redshift JDBC driver. Store this script
    # in S3 and run it as a bootstrap action.
    set -e
    
    bucket_location='s3://your-sqoop-jars-location/'
    sqoop_jar='sqoop-1.4.4.bin__hadoop-2.0.4-alpha'
    sqoop_jar_gz="$sqoop_jar.tar.gz"
    redshift_jar='RedshiftJDBC41-1.1.7.1007.jar'
    
    cd /home/hadoop
    
    # Download and unpack Sqoop (hadoop fs -copyToLocal fails on EMR 4.0.0)
    aws s3 cp "$bucket_location$sqoop_jar_gz" .
    tar -xzf "$sqoop_jar_gz"
    
    # Put the Redshift JDBC driver on Sqoop's classpath
    aws s3 cp "$bucket_location$redshift_jar" .
    cp "$redshift_jar" "$sqoop_jar/lib/"
    
  4. Set SQOOP_HOME and add $SQOOP_HOME/bin to PATH so that you can call sqoop from anywhere. These entries should go in /etc/bashrc. Otherwise you will have to use the full path, in this case: /home/hadoop/sqoop-1.4.4.bin__hadoop-2.0.4-alpha/bin/sqoop
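
Step 4 can also be scripted so that it runs during bootstrap. The following is a minimal sketch, assuming Sqoop was unpacked into /home/hadoop by the script in step 3. The helper name append_sqoop_env, the SQOOP_DIR override, and passing the rc file as the first argument are illustrative choices, not part of the original answer.

```shell
#!/bin/bash
# Sketch for step 4: persist SQOOP_HOME and PATH in an rc file.
# Assumptions (hypothetical, not from the original answer): the helper name,
# the SQOOP_DIR override, and taking the rc file as the first argument.
SQOOP_DIR="${SQOOP_DIR:-/home/hadoop/sqoop-1.4.4.bin__hadoop-2.0.4-alpha}"

# Append the two export lines to the given rc file.
append_sqoop_env() {
  local rc_file="$1" sqoop_dir="$2"
  # EMR bootstrap actions run as the hadoop user, and /etc/bashrc is owned
  # by root, so fall back to sudo when the file is not writable.
  local -a writer=(tee -a "$rc_file")
  if [ ! -w "$rc_file" ]; then
    writer=(sudo tee -a "$rc_file")
  fi
  # $PATH and $SQOOP_HOME stay literal here; they expand when the rc file is sourced.
  printf 'export SQOOP_HOME=%s\nexport PATH=$PATH:$SQOOP_HOME/bin\n' "$sqoop_dir" \
    | "${writer[@]}" > /dev/null
}

# Usage: bash set-sqoop-env.sh /etc/bashrc
if [ "$#" -ge 1 ]; then
  append_sqoop_env "$1" "$SQOOP_DIR"
fi
```

This could be appended to the bootstrap script or run as a separate bootstrap action with /etc/bashrc as its argument. New login shells then pick up both variables, and running `sqoop version` is a quick way to check that the binary is on the PATH.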

I launch my EMR clusters programmatically from Java. To configure the bootstrap action in Java, I created a BootstrapActionConfigFactory:

import com.amazonaws.services.elasticmapreduce.model.BootstrapActionConfig;
import com.amazonaws.services.elasticmapreduce.model.ScriptBootstrapActionConfig;

public final class BootstrapActionConfigFactory {
    // Config is the author's own settings class (not part of the AWS SDK)
    private static final String bucket = Config.getBootstrapBucket();

    // make class non-instantiable
    private BootstrapActionConfigFactory() {
    }

    /**
     * Creates a bootstrap action that installs the Sqoop build matching the Hadoop version set in the Config class.
     */
    public static BootstrapActionConfig newInstallSqoopBootstrapActionConfig() {
        // Use only the major version digit, e.g. '2' for "2.6.0"
        return newInstallSqoopBootstrapActionConfig(Config.getHadoopVersion().charAt(0));
    }

    /**
     * Creates a bootstrap action that installs the Sqoop build matching the given Hadoop major version.
     *
     * @param hadoopVersion the major Hadoop version number, e.g. '1' or '2'
     */
    public static BootstrapActionConfig newInstallSqoopBootstrapActionConfig(char hadoopVersion) {
        // Runs s3://<bucket>/sqoop-tools/hadoop<N>/bootstrap-sqoop-emr4.sh on each node at startup
        return new BootstrapActionConfig().withName("Install Sqoop")
            .withScriptBootstrapAction(
                new ScriptBootstrapActionConfig().withPath("s3://" + bucket + "/sqoop-tools/hadoop" + hadoopVersion + "/bootstrap-sqoop-emr4.sh"));
    }
}

Then, when creating the job:

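// Job appears to be the author's own wrapper class (it is not part of the AWS SDK);
// Region and Regions come from com.amazonaws.regions
Job job = new Job(Region.getRegion(Regions.US_EAST_1));
job.addBootstrapAction(BootstrapActionConfigFactory.newInstallSqoopBootstrapActionConfig());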
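
If you launch clusters from the command line rather than from Java, the same bootstrap action can be registered with `aws emr create-cluster`. This is a hedged sketch: the bucket name, the instance type and count, and the use of `--use-default-roles` (which requires the default EMR roles to already exist) are placeholder assumptions. Only the S3 path layout mirrors BootstrapActionConfigFactory. By default the script just prints the command; pass --run to execute it.

```shell
#!/bin/bash
# Sketch: register the Sqoop bootstrap action via the AWS CLI.
# Assumptions (placeholders): bucket name, instance type/count, default roles.
bucket="${BOOTSTRAP_BUCKET:-your-bootstrap-bucket}"
hadoop_major="${HADOOP_MAJOR:-2}"   # emr-4.0.0 ships Hadoop 2.6.0

# Same layout as BootstrapActionConfigFactory:
# s3://<bucket>/sqoop-tools/hadoop<N>/bootstrap-sqoop-emr4.sh
bootstrap_path() {
  printf 's3://%s/sqoop-tools/hadoop%s/bootstrap-sqoop-emr4.sh' "$1" "$2"
}

create_cluster_args=(
  emr create-cluster
  --name "Hive with Sqoop"
  --release-label emr-4.0.0
  --applications Name=Hadoop Name=Hive
  --use-default-roles
  --instance-type m3.xlarge
  --instance-count 3
  --bootstrap-actions "Path=$(bootstrap_path "$bucket" "$hadoop_major"),Name=InstallSqoop"
)

# Print a shell-quoted copy of the command by default; --run calls the AWS CLI.
if [ "${1:-}" = "--run" ]; then
  aws "${create_cluster_args[@]}"
else
  printf '%q ' aws "${create_cluster_args[@]}"
  echo
fi
```

Keeping the path logic in one small function makes it easy to check that the CLI and the Java factory point at the same S3 object.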