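I'm trying to analyze the Wikipedia article views dataset using Amazon EMR. This dataset contains page view statistics for a three-month period (January 1, 2011 to March 31, 2011). I'm trying to find the articles with the most views during that period. Here is the code I'm using: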
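    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class mostViews {
        // Emits (article, views) for each input line.
        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable views = new IntWritable(1);
            private Text article = new Text();
            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                String line = value.toString();
                String[] words = line.split(" ");
                article.set(words[1]);                    // second field: article name
                views.set(Integer.parseInt(words[2]));    // third field: view count
                output.collect(article, views);
            }
        }
        public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, …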
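
The reducer above is cut off in the post, so the following is only a minimal plain-Java sketch of the overall computation the question describes, not the asker's actual job. It sums the views per article and then picks the article with the largest total. It assumes each input line is space-separated with the article name in the second field and the view count in the third, as the mapper's use of `words[1]` and `words[2]` suggests. The names `MostViewsSketch`, `sumViews`, and `mostViewed` are hypothetical and exist only for this illustration. No Hadoop classes are used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MostViewsSketch {

    // Sum the view counts per article, which is what a summing reducer would produce.
    // Lines with fewer than three fields or a non-numeric count are skipped.
    static Map<String, Long> sumViews(List<String> lines) {
        Map<String, Long> totals = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue; // malformed line
            }
            try {
                long views = Long.parseLong(fields[2]);
                totals.merge(fields[1], views, Long::sum);
            } catch (NumberFormatException e) {
                // view count is not a number; ignore this line
            }
        }
        return totals;
    }

    // Return the entry with the largest total, or empty if there is no data.
    static Optional<Map.Entry<String, Long>> mostViewed(Map<String, Long> totals) {
        return totals.entrySet().stream().max(Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed "project article views bytes" layout.
        List<String> sample = Arrays.asList(
                "en Main_Page 10 100",
                "en Main_Page 5 50",
                "en Foo 12 1");
        mostViewed(sumViews(sample))
                .ifPresent(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

In a MapReduce setting, the per-article sum corresponds to the reduce step, while picking the single largest total usually needs a second pass over the reducer output (or a second job), because each reducer only sees a subset of the keys.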