How do I specify the KeyValueTextInputFormat separator in the Hadoop 0.20 API?

pra*_*eep 13 java hadoop mapreduce

In the new API (apache.hadoop.mapreduce.KeyValueTextInputFormat), how do I specify a separator (delimiter) other than tab, which is the default, to split the key from the value?

Sample input:

one,first line
two,second line

Required output:

Key : one
Value : first line
Key : two
Value : second line

I am specifying KeyValueTextInputFormat like this:

    Job job = new Job(conf, "Sample");

    job.setInputFormatClass(KeyValueTextInputFormat.class);
    KeyValueTextInputFormat.addInputPath(job, new Path("/home/input.txt"));

This works fine with tab as the separator.

fiz*_*max 11

In the newer API, you should use the mapreduce.input.keyvaluelinerecordreader.key.value.separator configuration property.

Here is an example:

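import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// Set the separator before creating the Job, since the Job copies the configuration.
Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

Job job = new Job(conf);
job.setInputFormatClass(KeyValueTextInputFormat.class);
// next job set-up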
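To make the effect of that property concrete, here is a rough plain-Java sketch of how each line ends up split once the separator is set to a comma. It does not use Hadoop at all: the class `SeparatorSketch` and the helper `splitKeyValue` are hypothetical names made up for this illustration. It assumes the record reader splits at the first occurrence of the separator, uses only the first character of the configured string, and treats a line without a separator as a key with an empty value. The real reader works on bytes rather than Java chars, so take this as an approximation.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Illustration only: mimics the key/value split, without any Hadoop dependency.
class SeparatorSketch {

    // Hypothetical helper: splits a line at the FIRST occurrence of the separator.
    static Map.Entry<String, String> splitKeyValue(String line, String separator) {
        // Assumption: only the first character of the configured string is used.
        char sep = separator.charAt(0);
        int pos = line.indexOf(sep);
        if (pos == -1) {
            // No separator found: the whole line is the key, the value is empty.
            return new SimpleImmutableEntry<>(line, "");
        }
        // Everything before the separator is the key, everything after is the value.
        return new SimpleImmutableEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        String[] input = {"one,first line", "two,second line"};
        for (String line : input) {
            Map.Entry<String, String> kv = splitKeyValue(line, ",");
            System.out.println("Key : " + kv.getKey());
            System.out.println("Value : " + kv.getValue());
        }
    }
}
```

Run on the sample input from the question, this prints the key and value lines the question asks for. Note that any further commas in a line stay in the value, since only the first one is treated as the separator.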


Jou*_*Man 5

Add the following setting in your driver code.

conf.set("key.value.separator.in.input.line", ",");