Hadoop - output key/value separator

Jus*_*irl 2 java hadoop separator

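I want to change the output separator to ; instead of a tab. I have already tried the approach from "Hadoop: key and value are tab separated in the output file. How to do it semicolon-separated?", but my output is still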

key (tab) value

I am using the Cloudera Demo (CDH 4.1.3). Here is my code:

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Driver <in> <out>");
            System.exit(2);
        }
        conf.set("mapreduce.textoutputformat.separator", ";");

        Path in = new Path(otherArgs[0]);
        Path out = new Path(otherArgs[1]);

        Job job= new Job(getConf());
        job.setJobName("MapReduce");

        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJarByClass(Driver.class);
        job.waitForCompletion(true) ? 0 : 1;

I want

key;value

as my output.

Tho*_*lut 7

The property is called mapreduce.output.textoutputformat.separator. So you basically missed the "output" in there.

You can see it in the latest trunk source code in the Apache SVN.
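
To illustrate why a misspelled key fails silently rather than raising an error, here is a minimal sketch. It does not use Hadoop itself: `SeparatorLookup` and the plain `Map` are hypothetical stand-ins for a configuration that is read with a get-with-default lookup, where an unknown key simply falls back to a tab. Which key string is honored depends on the Hadoop version in use.

```java
import java.util.HashMap;
import java.util.Map;

public class SeparatorLookup {
    // Property name quoted in the answer above.
    static final String KEY = "mapreduce.output.textoutputformat.separator";

    // Stand-in for a get-with-default lookup: a missing key yields a tab.
    static String separator(Map<String, String> conf) {
        return conf.getOrDefault(KEY, "\t");
    }

    // Joins key and value the way a text record writer would.
    static String formatRecord(Map<String, String> conf, String key, String value) {
        return key + separator(conf) + value;
    }

    public static void main(String[] args) {
        Map<String, String> wrong = new HashMap<>();
        wrong.put("mapreduce.textoutputformat.separator", ";"); // "output" is missing

        Map<String, String> right = new HashMap<>();
        right.put(KEY, ";");

        System.out.println(formatRecord(wrong, "key", "value")); // still tab-separated
        System.out.println(formatRecord(right, "key", "value")); // semicolon-separated
    }
}
```

The value stored under the wrong name is never read, so the default applies and no warning is produced.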


Unm*_*eni 5

You should use conf.set("mapred.textoutputformat.separator", ";");

instead of conf.set("mapreduce.textoutputformat.separator", ";");

Full code (this works):

    // Parse generic Hadoop options into conf; what remains are the job's own arguments.
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: Driver <in> <out>");
        System.exit(2);
    }
    // Set the separator before the Job is created, because the Job copies the configuration.
    conf.set("mapred.textoutputformat.separator", ";");

    Path in = new Path(otherArgs[0]);
    Path out = new Path(otherArgs[1]);

    // Build the job from the same conf object that holds the separator setting.
    Job job = new Job(conf);
    job.setJobName("MapReduce");

    // The base Mapper and Reducer classes pass records through unchanged.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);

    job.setJarByClass(Driver.class);
    // Exit with 0 on success, 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);


Nik*_*sik 5

As of 2019, this is getConf().set(TextOutputFormat.SEPARATOR, ";"); (thanks to @AsheshKumarSingh)

I believe using the native constant gives better maintainability and fewer surprises.

Important: this property must be set before calling Job.getInstance(getConf()) / new Job(getConf()), because the job copies the configuration and ignores any later modifications to conf.
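
The ordering issue can be sketched without Hadoop. In the example below, `FakeJob` is a hypothetical stand-in (not Hadoop's `Job`) that snapshots a `java.util.Properties` object when constructed, and the key string is only a placeholder for whichever separator key your Hadoop version reads. It is a simplified model of the copy-on-creation behavior described above, not the real implementation.

```java
import java.util.Properties;

public class ConfCopyDemo {
    // Placeholder key; substitute the separator key your Hadoop version uses.
    static final String KEY = "mapreduce.output.textoutputformat.separator";

    // Hypothetical stand-in for a job that snapshots its configuration when constructed.
    static final class FakeJob {
        private final Properties snapshot = new Properties();

        FakeJob(Properties conf) {
            snapshot.putAll(conf); // copy: later changes to conf are invisible here
        }

        String separator() {
            return snapshot.getProperty(KEY, "\t"); // tab when the key is absent
        }
    }

    // Property set after the job exists: the job keeps the default tab.
    static String separatorWhenSetAfterJob() {
        Properties conf = new Properties();
        FakeJob job = new FakeJob(conf);
        conf.setProperty(KEY, ";");
        return job.separator();
    }

    // Property set before the job is created: the job sees the semicolon.
    static String separatorWhenSetBeforeJob() {
        Properties conf = new Properties();
        conf.setProperty(KEY, ";");
        FakeJob job = new FakeJob(conf);
        return job.separator();
    }

    public static void main(String[] args) {
        System.out.println("set after job:  [" + separatorWhenSetAfterJob() + "]");
        System.out.println("set before job: [" + separatorWhenSetBeforeJob() + "]");
    }
}
```

In a real driver the same rule applies: make every conf.set(...) call first, then create the job from that configuration.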