I am trying to separate my reducer output into different folders.
My driver has the following code:
FileOutputFormat.setOutputPath(job, new Path(output));
//MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class,NullWritable.class);
MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);
And then my reducer has the following code:
mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
mos.write("bar", key,NullWritable.get());
mos.write("foobar", key,NullWritable.get());
But in the output, I see:
output/foo-r-0001
output/foo-r-0002
output/foobar-r-0001
output/bar-r-0001
But what I am trying to get is:
output/foo/part-r-0001
output/foo/part-r-0002
output/bar/part-r-0001
output/foobar/part-r-0001
How can I do this? Thanks.
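If you mean this MultipleOutputs, the simplest way is to do one of the following from your reducer:
1. Use a named output together with a base output path: mos.write(namedOutput, key, value, baseOutputPath)
2. Skip the named output and use only a base output path: mos.write(key, value, baseOutputPath)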
In your case this is point 1, so change the following:
mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
mos.write("bar", key,NullWritable.get());
mos.write("foobar", key,NullWritable.get());
to:
mos.write("foo",NullWritable.get(),new Text(jsn.toString()), "foo/part");
mos.write("bar", key,NullWritable.get(), "bar/part");
mos.write("foobar", key,NullWritable.get(), "foobar/part");
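Here "foo/part", "bar/part" and "foobar/part" are the baseOutputPath values. As a result, the directories foo, bar and foobar are created under the job output directory, and the part-r-xxxxx files are written inside them.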
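To make the effect of baseOutputPath concrete, here is a small plain-Java sketch of the naming convention as I understand it: the base path is suffixed with the task type (m or r) and a zero-padded partition number, then resolved under the job output directory. It does not call Hadoop at all; the class BaseOutputPathDemo and its methods are hypothetical names used only for illustration, and the five-digit padding is my assumption about Hadoop's default.

```java
import java.util.Locale;

// Hypothetical illustration only: models the file naming, not Hadoop's actual code.
class BaseOutputPathDemo {

    // Builds "<base>-<m|r>-<partition>", with the partition zero-padded
    // to five digits (assumed to match Hadoop's default convention).
    static String uniqueFile(String baseOutputPath, boolean isReduceTask, int partition) {
        String taskType = isReduceTask ? "r" : "m";
        return String.format(Locale.ROOT, "%s-%s-%05d", baseOutputPath, taskType, partition);
    }

    // Resolves the file name under the job output directory for a reduce task.
    static String resolve(String outputDir, String baseOutputPath, int partition) {
        return outputDir + "/" + uniqueFile(baseOutputPath, true, partition);
    }

    public static void main(String[] args) {
        // "foo" mimics a named output written without a baseOutputPath;
        // the other entries mimic the "<name>/part" base paths from the answer.
        String[] bases = {"foo", "foo/part", "bar/part", "foobar/part"};
        for (String base : bases) {
            System.out.println(resolve("output", base, 0));
        }
    }
}
```

The slash inside the base path is what turns foo into a directory: without it the named output becomes a file-name prefix, with it the prefix becomes a subfolder holding part-r-xxxxx files.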
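You can also try point 2 above, which does not need any named output at all: the three-argument mos.write(key, value, baseOutputPath) takes only the base output path.
Let me know if you need further clarification.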