How can I count the occurrences of a specific word in a file using Hadoop MapReduce?

Question

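I am trying to count the occurrences of a specific word in a file using Hadoop MapReduce in Java. Both the file and the word should be supplied by the user, so I am trying to pass the word as a third argument alongside the input and output paths (In, Out, Word). However, I cannot find a way to pass the word to the map function. I tried the following, but it did not work: I created a static String variable in the mapper class, assigned it the value of my third argument (the word to search for), and then tried to use this static variable inside the map function. Inside the map function, however, the static variable is null, so I cannot get the value of the third argument there.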

Is there any way to set the value through the JobConf object? Please help. I have pasted my code below.

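import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MyWordCount {

    public static class MyWordCountMap extends Mapper<Text, Text, Text, LongWritable> {
        // Assigned in main(); this is the variable that turns out to be null inside map()
        static String wordToSearch;
        private final static LongWritable ONE = new LongWritable(1L);
        private Text word = new Text();

        @Override
        public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
            System.out.println(wordToSearch); // Here the value is coming as Null
            if (value.toString().compareTo(wordToSearch) == 0) {
                word.set(wordToSearch); // emit the matched word as the output key
                context.write(word, ONE);
            }
        }
    }

    public static class SumReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

        // The values parameter must be an Iterable; with Iterator this method would not
        // override Reducer.reduce and would never be called
        @Override
        public void reduce(Text key, Iterable<LongWritable> values,
            Context context) throws IOException, InterruptedException {
            long sum = 0L;
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] rawArgs) throws Exception {

        // Separate generic Hadoop options from the application arguments (In, Out, Word)
        GenericOptionsParser parser = new GenericOptionsParser(rawArgs);
        Configuration conf = parser.getConfiguration();
        String[] args = parser.getRemainingArgs();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(MyWordCountMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(MyWordCountMap.class);
        job.setReducerClass(SumReduce.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Attempt to hand the word to the mapper through the static field
        String myWord = args[2];
        MyWordCountMap.wordToSearch = myWord;
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}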

Answer 1

Sam*_*Sam 5

You can do this with Configuration (see its API documentation). For example, the following code sets "Tree" as the word to search for:

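// Create a new configuration
Configuration conf = new Configuration();
// Set the word to be searched (in the question's driver this would be args[2])
conf.set("wordToSearch", "Tree");
// Create the job. Job copies the configuration when it is constructed, so set the
// value before this line (or call job.getConfiguration().set(...) afterwards)
Job job = new Job(conf);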

Then, in your mapper/reducer class, you can retrieve wordToSearch (i.e. "Tree" in this example) as follows:

// Get the job's configuration from the task context (for example inside setup() or map())
Configuration conf = context.getConfiguration();
// Retrieve the wordToSearch value (null if the key was never set)
String wordToSearch = conf.get("wordToSearch");
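
A likely reason the static field is null while the configuration value arrives: on a cluster, the driver's main() and the map tasks typically run in separate JVMs, so a static field assigned in the driver is not visible to the tasks, whereas the job configuration is serialized and shipped with the job. The plain-Java sketch below only imitates that hand-off. It does not call Hadoop; java.util.Properties stands in for Configuration, and the class and method names (ConfigHandoffSketch, driverSide, taskSide) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Imitation of the driver/task boundary; Properties is a stand-in for Hadoop's Configuration.
class ConfigHandoffSketch {

    // "Driver side": store the search word as a key/value setting and serialize the settings.
    static byte[] driverSide(String wordToSearch) {
        Properties conf = new Properties();
        conf.setProperty("wordToSearch", wordToSearch);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            conf.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // "Task side": rebuild the settings from the shipped bytes only and read the word back.
    static String taskSide(byte[] shipped) {
        Properties conf = new Properties();
        try {
            conf.load(new ByteArrayInputStream(shipped));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf.getProperty("wordToSearch");
    }

    public static void main(String[] args) {
        byte[] shipped = driverSide("Tree");
        System.out.println(taskSide(shipped));
    }
}
```

Only what was written into the serialized settings reaches the task side, which is the role conf.set and context.getConfiguration() play in the snippets above.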

See here for more details.
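
One further point, offered as a sketch rather than as part of the original answer: the mapper in the question compares the whole input value with the word, so it only matches records whose value is exactly that word. If the values are lines of text, the map step would probably need to split each value into tokens first. The helper below shows one way to do that in plain Java; WordMatchSketch and countOccurrences are hypothetical names, and the whitespace tokenization and case-sensitive comparison are assumptions.

```java
import java.util.StringTokenizer;

class WordMatchSketch {

    // Counts the whitespace-delimited tokens in line that equal wordToSearch exactly (case-sensitive).
    static long countOccurrences(String line, String wordToSearch) {
        if (line == null || wordToSearch == null || wordToSearch.isEmpty()) {
            return 0L;
        }
        long count = 0L;
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            if (tokens.nextToken().equals(wordToSearch)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "Tree" appears twice as a separate token in this line
        System.out.println(countOccurrences("Tree house near a Tree", "Tree"));
    }
}
```

Inside a mapper, such a helper could be called with value.toString() and the word read from the configuration, writing the result to the context only when it is greater than zero.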

Archived:	12 years, 6 months ago
Views:	7116
Last activity:	12 years, 6 months ago