Hadoop MapReduce: multiple input files

Question

gau*_*ssd 5 java command-line hadoop mapreduce command-line-arguments

I need two files as input to my MapReduce program: City.dat and Country.dat.

In my main method, I parse the command-line arguments as follows:

Path cityInputPath = new Path(args[0]);
Path countryInputPath = new Path(args[1]);
Path outputPath = new Path(args[2]);
MultipleInputs.addInputPath(job, countryInputPath, TextInputFormat.class, JoinCountryMapper.class);
MultipleInputs.addInputPath(job, cityInputPath, TextInputFormat.class, JoinCityMapper.class);
FileOutputFormat.setOutputPath(job, outputPath);

If I now run my program with the following command:

hadoop jar capital.jar org.myorg.Capital /user/cloudera/capital/input/City.dat /user/cloudera/capital/input/Country.dat /user/cloudera/capital/output

I get the following error:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /user/cloudera/capital/input/Country.dat already exists

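Why is Country.dat being treated as my output directory? I specified a different directory as the output directory. Can someone explain this?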

Answer 1

Tho*_*lut 6

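Based on the stack trace, your output directory already exists, and Hadoop will not write into an existing output directory. So the simplest thing is to delete it before running the job: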

bin/hadoop fs -rmr /user/cloudera/capital/output

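Beyond that, your arguments start with the class name of your main class, org.myorg.Capital, so that is the argument at index zero (judging from the stack trace and the code you provided). This typically happens when the jar's manifest already names a main class, in which case hadoop jar passes everything after the jar name to main. As a result, args[2], which you use as the output path, is /user/cloudera/capital/input/Country.dat.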

Basically, you need to shift every index one position to the right:

Path cityInputPath = new Path(args[1]);
Path countryInputPath = new Path(args[2]);
Path outputPath = new Path(args[3]);
MultipleInputs.addInputPath(job, countryInputPath, TextInputFormat.class, JoinCountryMapper.class);
MultipleInputs.addInputPath(job, cityInputPath, TextInputFormat.class, JoinCityMapper.class);
FileOutputFormat.setOutputPath(job, outputPath);
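
One way to avoid depending on whether the class name arrives as the first argument is to read the paths from the end of the argument list instead of from fixed indices. The following is only a plain-Java sketch, not part of the original answer: `JobArgs` and its field names are hypothetical, and it assumes the three paths are always the last three arguments.

```java
import java.util.Arrays;

/** Hypothetical helper: takes the job paths from the end of the argument list. */
final class JobArgs {
    final String cityInput;
    final String countryInput;
    final String output;

    private JobArgs(String cityInput, String countryInput, String output) {
        this.cityInput = cityInput;
        this.countryInput = countryInput;
        this.output = output;
    }

    /**
     * Interprets the last three arguments as <city input> <country input> <output>,
     * so a leading class name (if the launcher passes one) is simply ignored.
     */
    static JobArgs parse(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException(
                "Expected <city input> <country input> <output>, got " + Arrays.toString(args));
        }
        int n = args.length;
        return new JobArgs(args[n - 3], args[n - 2], args[n - 1]);
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(args);
        // Printing the resolved paths before submitting the job makes an index mix-up visible immediately.
        System.out.println("city input:    " + parsed.cityInput);
        System.out.println("country input: " + parsed.countryInput);
        System.out.println("output:        " + parsed.output);
    }
}
```

The resulting strings could then be wrapped in `new Path(...)` and passed to `MultipleInputs.addInputPath` and `FileOutputFormat.setOutputPath` exactly as in the snippet above.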

Don't forget to clear the output folder!

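Here is also a small trick: you can separate the input files with a comma "," and then register them all with a single call. Note that paths added this way all go through the job's single mapper, unlike MultipleInputs, which assigns a mapper per path. The command looks like this: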

hadoop jar capital.jar org.myorg.Capital /user/cloudera/capital/input/City.dat,/user/cloudera/capital/input/Country.dat

In your Java code:

FileInputFormat.addInputPaths(job, args[1]);

Archived:	13 years, 2 months ago
Views:	20465
Last activity:	12 years, 1 month ago