How do I specify multiple files with "-files" in the AWS CLI for EMR?

Question

Lot*_*tte 1 hadoop amazon-web-services amazon-emr aws-cli

I'm trying to launch an Amazon EMR cluster through the AWS CLI, but I'm a bit confused about how I should specify multiple files. My current call is as follows:

aws emr create-cluster --steps Type=STREAMING,Name='Intra country development',ActionOnFailure=CONTINUE,Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra] \
--ami-version 3.1.0 \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate \
--log-uri s3://betaestimationtest/logs

However, Hadoop now complains that it cannot find the reducer file:

Caused by: java.io.IOException: Cannot run program "reducer.py": error=2, No such file or directory

What am I doing wrong? The file does exist in the folder I specified.

Answer 1

小智 5

To pass multiple files in a streaming step, you need to pass the steps as a JSON file using file://.

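The AWS CLI shorthand syntax uses the comma as the delimiter between items in a list of arguments. So when we try to pass in arguments such as "-files","s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py", the shorthand parser treats the mapper.py and reducer.py paths as two separate arguments instead of one comma-separated value.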
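
To make the comma problem concrete, here is a rough Python sketch. It does not reproduce the real AWS CLI shorthand parser; `naive_shorthand_split` is a hypothetical helper that only mimics splitting a list on commas, which should be enough to show why a comma-joined -files value cannot survive shorthand syntax while a JSON array keeps it intact.

```python
import json


def naive_shorthand_split(args_text):
    """Hypothetical stand-in for shorthand list parsing (an assumption,
    not the real AWS CLI code): drop the brackets, split on every comma."""
    return args_text.strip("[]").split(",")


files_value = "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py"

# Shorthand: the comma inside the -files value looks like a list separator.
shorthand_args = naive_shorthand_split("[-files," + files_value + ",-mapper,mapper.py]")

# JSON: the comma-joined value stays a single string element of the array.
json_args = json.loads(json.dumps(["-files", files_value, "-mapper", "mapper.py"]))

print(len(shorthand_args))  # 5
print(len(json_args))       # 4
```

In the shorthand case the reducer path ends up as its own list element, so it is no longer part of the -files value.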

The workaround is to use the JSON format. See the example below.

aws emr create-cluster --steps file://./mysteps.json --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate --log-uri s3://betaestimationtest/logs

mysteps.json looks like:

[
    {
        "Name": "Intra country development",
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "-files",
            "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
            "-mapper",
            "mapper.py",
            "-reducer",
            "reducer.py",
            "-input",
            "s3://betaestimationtest/output_0_inter",
            "-output",
            "s3://betaestimationtest/output_1_intra"
        ]
    }
]
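
If the list of files changes often, the steps file could be generated rather than edited by hand. The following is only a sketch of one way to do that in Python: `build_streaming_step` is an illustrative helper name, not part of the AWS CLI, and the bucket paths are taken from the question.

```python
import json


def build_streaming_step(name, files, mapper, reducer, input_uri, output_uri):
    """Return one streaming step as a dict. All files are joined with commas
    into a single -files value, which is safe inside a JSON string."""
    return {
        "Name": name,
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "-files", ",".join(files),
            "-mapper", mapper,
            "-reducer", reducer,
            "-input", input_uri,
            "-output", output_uri,
        ],
    }


steps = [build_streaming_step(
    "Intra country development",
    ["s3://betaestimationtest/mapper.py", "s3://betaestimationtest/reducer.py"],
    "mapper.py",
    "reducer.py",
    "s3://betaestimationtest/output_0_inter",
    "s3://betaestimationtest/output_1_intra",
)]

# Write the file that --steps file://./mysteps.json points at.
with open("mysteps.json", "w") as fh:
    json.dump(steps, fh, indent=4)
```

The resulting mysteps.json can then be passed to the create-cluster call with --steps file://./mysteps.json, as in the command above.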

You can also find examples here: https://github.com/aws/aws-cli/blob/develop/awscli/examples/emr/create-cluster-examples.rst. See example 13.

Hope this helps!

Archived:	11 years, 5 months ago
Views:	2271
Last activity:	10 years, 6 months ago