When I set FileInputFormat as the Hadoop input, the path arg[0]+"/*/*/*" reports that no files match.
What I want is to read from multiple files:
Directory1
---Directory11
---Directory111
--f1.txt
--f2.txt
---Directory12
Directory2
---Directory21
Is this possible in Hadoop? Thanks!
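One thing worth checking is how deep the glob actually reaches. The sketch below is plain Java using java.nio's PathMatcher, not Hadoop's own glob code, so it only illustrates the assumption that each `*` stands for exactly one path component. The class name GlobDepthDemo and the relative paths are made up for illustration, based on the tree above.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobDepthDemo {
    // Returns true when relativePath matches the glob pattern.
    // Uses java.nio glob rules, where '*' does not cross a directory separator.
    static boolean matches(String glob, String relativePath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        // Paths are relative to the directory passed as arg[0] (assumed layout).
        String dir = "Directory1/Directory11/Directory111";
        String file = dir + "/f1.txt";
        System.out.println("*/*/*   vs directory: " + matches("*/*/*", dir));
        System.out.println("*/*/*   vs file:      " + matches("*/*/*", file));
        System.out.println("*/*/*/* vs file:      " + matches("*/*/*/*", file));
    }
}
```

If the files sit at a different depth than the number of `*` components in the pattern, a pattern like this would be expected to match nothing, so it may help to compare the pattern against the real layout under arg[0].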
I am trying to separate the output of my reducer into different folders.
My driver has the following code:
FileOutputFormat.setOutputPath(job, new Path(output));
// MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class,NullWritable.class);
MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);
And then my reducer has the following code:
mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
mos.write("bar", key,NullWritable.get());
mos.write("foobar", key,NullWritable.get());
But in the output, I see:
output/foo-r-0001
output/foo-r-0002
output/foobar-r-0001
output/bar-r-0001
But what I am trying to get is:
output/foo/part-r-0001
output/foo/part-r-0002
output/bar/part-r-0001
output/foobar/part-r-0001
How can I do this? Thanks.
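One possible approach, as far as I know: MultipleOutputs in the new API (org.apache.hadoop.mapreduce.lib.output) has a write overload that takes a fourth argument, baseOutputPath, and a value such as "foo/part" should place the files in a foo subfolder of the job output directory. The sketch below is plain Java with no Hadoop dependency. It only shows how such a base path could be built and what file name to expect, assuming the usual `<base>-r-<5-digit partition>` naming. NamedOutputPaths, basePath and expectedFile are hypothetical names.

```java
final class NamedOutputPaths {
    // Hypothetical helper: the string that would be passed as the fourth argument, e.g.
    //   mos.write("foo", NullWritable.get(), new Text(jsn.toString()), basePath("foo"));
    static String basePath(String namedOutput) {
        return namedOutput + "/part";
    }

    // Assumed naming scheme for reducer output files:
    // <outputDir>/<basePath>-r-<partition padded to 5 digits>.
    static String expectedFile(String outputDir, String namedOutput, int partition) {
        return String.format("%s/%s-r-%05d", outputDir, basePath(namedOutput), partition);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"foo", "bar", "foobar"}) {
            System.out.println(expectedFile("output", name, 1));
        }
    }
}
```

With this layout the reducer keeps the same three named outputs from the driver; only the extra path argument in each write call changes.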
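My MapReduce job processes data by date and needs to write its output to a specific folder structure. The current expectation is to produce the following structure: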
2013
01
02
..
2012
01
02
..
and so on.
At any given time I only receive up to 12 months of data, so I use the MultipleOutputs class to create 12 outputs, using the following function in the driver:
public void createOutputs(){
Calendar c = Calendar.getInstance();
String monthStr, pathStr;
// Create multiple outputs for last 12 months
// TODO make 12 configurable
for(int i = 0; i < 12; ++i ){
//Get month and add 1 as month is 0 based index
int month = c.get(Calendar.MONTH)+1;
//Add leading 0 for single-digit months (>= so that October stays "10")
monthStr = month >= 10 ? "" + month : "0" + month ;
// …
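For the year/month folder names themselves, a minimal sketch with java.time might be simpler than Calendar arithmetic and manual zero padding. This is plain Java, not the rest of the truncated function above. MonthlyOutputPaths and lastMonths are hypothetical names, and the assumption is that each resulting string would serve as a path prefix for one month's output.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class MonthlyOutputPaths {
    // "uuuu/MM" gives a four-digit year and a zero-padded month, e.g. 2013/01.
    private static final DateTimeFormatter YEAR_MONTH_PATH =
            DateTimeFormatter.ofPattern("uuuu/MM");

    // Returns `count` paths, starting at `latest` and stepping back one month at a time.
    static List<String> lastMonths(YearMonth latest, int count) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            paths.add(latest.minusMonths(i).format(YEAR_MONTH_PATH));
        }
        return paths;
    }

    public static void main(String[] args) {
        // 12 is the window from the question; it is a parameter here so it stays configurable.
        for (String path : lastMonths(YearMonth.now(), 12)) {
            System.out.println(path);
        }
    }
}
```

Passing the month count as a parameter covers the TODO about making 12 configurable, and stepping back with minusMonths handles the year boundary without extra code.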