pra*_*eep 13 java hadoop mapreduce
In the new API (org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat), how do I specify a separator other than tab (the default) to split the key from the value?
Sample input:
one,first line
two,second line
Expected output:
Key : one
Value : first line
Key : two
Value : second line
I set KeyValueTextInputFormat as the input format like this:
Job job = new Job(conf, "Sample");
job.setInputFormatClass(KeyValueTextInputFormat.class);
KeyValueTextInputFormat.addInputPath(job, new Path("/home/input.txt"));
This works when tab is the separator.
fiz*_*max 11
In the newer API, you should use the mapreduce.input.keyvaluelinerecordreader.key.value.separator configuration property.
Here is an example:
Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
Job job = new Job(conf);
job.setInputFormatClass(KeyValueTextInputFormat.class);
// next job set-up
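To illustrate what the separator setting does to each input line, here is a minimal plain-Java sketch with no Hadoop dependency. The class name KeyValueSplitSketch and the method split are hypothetical and are not part of Hadoop. It assumes, as I understand the record reader's behavior, that each line is split at the first occurrence of the separator character and that a line without a separator becomes the key with an empty value.

```java
import java.util.Arrays;
import java.util.List;

public class KeyValueSplitSketch {

    /**
     * Splits a line at the FIRST occurrence of the separator.
     * Assumption: a line with no separator yields the whole line as the key
     * and an empty string as the value.
     */
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Sample input from the question.
        List<String> input = Arrays.asList("one,first line", "two,second line");
        for (String line : input) {
            String[] kv = split(line, ',');
            System.out.println("Key : " + kv[0]);
            System.out.println("Value : " + kv[1]);
        }
    }
}
```

Because only the first separator counts in this sketch, any later commas stay in the value: "a,b,c" gives the key "a" and the value "b,c".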
Set the following in your driver code:
conf.set("key.value.separator.in.input.line", ",");