I am trying to build a movie recommendation system and have been following this website (link here).
def count_ratings_users_freq(self, user_id, values):
    """
    For each user, emit a row containing their "postings"
    (item,rating pairs)
    Also emit user rating sum and count for use later steps.
    output:
    userid, number of movie rated by user, rating number count, (movieid, movie rating)
    17 1,3,(70,3)
    35 1,1,(21,1)
    49 3,7,(19,2 21,1 70,4)
    87 2,3,(19,1 21,2)
    98 1,2,(19,2)
    """
    item_count = 0
    item_sum = 0
    final = []
    for item_id, rating in values:
        item_count += 1
        item_sum += rating
        final.append((item_id, rating))
    yield user_id, (item_count, item_sum, final)
Is it possible to convert the above code to Java using Hadoop Map and Reduce, with
userid as the key and
no. of movies rated by user, rating number count, (movieid, movie ratings) as the value? Thanks!
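Yes, you can convert this into a MapReduce program.
Mapper logic:
Split each input line (userid,movieid,rating) at the first comma. Emit the user ID as the key and the rest of the line (movieid,rating) as the value.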
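The line-splitting step of the mapper can be tried out without a Hadoop cluster. The following is a minimal sketch in plain Java, assuming input lines of the form userid,movieid,rating as in the sample input; the class `MapperLineSketch` and the helper `splitLine` are hypothetical names used only for illustration and are not part of Hadoop.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

public class MapperLineSketch {

    // Hypothetical helper (not a Hadoop API): splits "userid,movieid,rating"
    // at the first comma into a (userId, "movieid,rating") pair.
    static Optional<Map.Entry<Integer, String>> splitLine(String line) {
        int index = line.indexOf(',');
        if (index == -1) {
            return Optional.empty(); // no comma: not a valid record
        }
        try {
            int userId = Integer.parseInt(line.substring(0, index));
            Map.Entry<Integer, String> entry =
                    new SimpleEntry<>(userId, line.substring(index + 1));
            return Optional.of(entry);
        } catch (NumberFormatException e) {
            return Optional.empty(); // non-numeric user ID: skip the record
        }
    }

    public static void main(String[] args) {
        // A well-formed line yields key 49 and value "19,2".
        System.out.println(splitLine("49,19,2"));
        // A malformed line yields an empty Optional.
        System.out.println(splitLine("no comma here"));
    }
}
```

In a real mapper the pair would be written to the context instead of returned; returning an `Optional` here just makes the skip-on-bad-input behaviour visible.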
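Reducer logic:
For each value, parse it and extract the movie rating. For example, for the value (70,3), the movie rating is 3.
For each valid record, increment movieCount, add the parsed movie rating to movieRatingCount (the running sum of ratings), and append the value to the movieValues string.
This gives you the desired output.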
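The reducer steps above can be sketched in plain Java, independent of Hadoop, to check the aggregation on its own. This is only an illustration: the class `ReducerAggregationSketch` and the helper `aggregate` are hypothetical names, and the running sum is called `ratingSum` here rather than movieRatingCount. Unlike a reducer that lets the exception propagate, this sketch assumes records with a non-numeric rating should simply be skipped.

```java
import java.util.Arrays;
import java.util.List;

public class ReducerAggregationSketch {

    // Hypothetical helper: aggregates the "movieid,rating" strings of one user
    // into "count,sum,(movieid,rating movieid,rating ...)".
    static String aggregate(List<String> values) {
        int movieCount = 0;
        int ratingSum = 0;
        StringBuilder movieValues = new StringBuilder();
        for (String value : values) {
            String[] tokens = value.split(",");
            if (tokens.length != 2) {
                continue; // not a "movieid,rating" pair
            }
            int rating;
            try {
                rating = Integer.parseInt(tokens[1].trim());
            } catch (NumberFormatException e) {
                continue; // assumption: skip records with a bad rating
            }
            movieCount++;
            ratingSum += rating;
            if (movieValues.length() > 0) {
                movieValues.append(' ');
            }
            movieValues.append(value);
        }
        return movieCount + "," + ratingSum + ",(" + movieValues + ")";
    }

    public static void main(String[] args) {
        // prints 3,7,(19,2 21,1 70,4)
        System.out.println(aggregate(Arrays.asList("19,2", "21,1", "70,4")));
    }
}
```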
The following code implements this:
package com.myorg.hadooptests;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class MovieRatings {

    public static class MovieRatingsMapper
            extends Mapper<LongWritable, Text, IntWritable, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String valueStr = value.toString();
            int index = valueStr.indexOf(',');
            if (index != -1) {
                try {
                    IntWritable keyUserID = new IntWritable(Integer.parseInt(valueStr.substring(0, index)));
                    context.write(keyUserID, new Text(valueStr.substring(index + 1)));
                } catch (Exception e) {
                    // You could get a NumberFormatException
                }
            }
        }
    }

    public static class MovieRatingsReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {

        public void reduce(IntWritable key, Iterable<Text> values,
                           Context context) throws IOException, InterruptedException {
            int movieCount = 0;
            int movieRatingCount = 0;
            String movieValues = "";
            for (Text value : values) {
                String[] tokens = value.toString().split(",");
                if (tokens.length == 2) {
                    movieRatingCount += Integer.parseInt(tokens[1].trim()); // You could get a NumberFormatException
                    movieCount++;
                    movieValues = movieValues.concat(value.toString() + " ");
                }
            }
            context.write(key, new Text(Integer.toString(movieCount) + "," + Integer.toString(movieRatingCount) + ",(" + movieValues.trim() + ")"));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "CompositeKeyExample");
        job.setJarByClass(MovieRatings.class);
        job.setMapperClass(MovieRatingsMapper.class);
        job.setReducerClass(MovieRatingsReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/in/in2.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/out/"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
For the input:
17,70,3
35,21,1
49,19,2
49,21,1
49,70,4
87,19,1
87,21,2
98,19,2
I got the output:
17 1,3,(70,3)
35 1,1,(21,1)
49 3,7,(70,4 21,1 19,2)
87 2,3,(21,2 19,1)
98 1,2,(19,2)
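Note that the pairs for users 49 and 87 come out in a different order than in the docstring of the question (for example 70,4 21,1 19,2 instead of 19,2 21,1 70,4). Hadoop does not guarantee the order in which values for a key reach the reducer. If the order matters, one option is to collect the values in the reducer and sort them by movie ID before joining them. The following is a rough sketch of that sorting step in plain Java; `SortedPostingsSketch` and `joinSortedByMovieId` are hypothetical names, and it assumes every value is a well-formed "movieid,rating" string and that one user's values fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedPostingsSketch {

    // Hypothetical helper: sorts "movieid,rating" strings by numeric movie ID
    // and joins them with single spaces. Assumes well-formed input.
    static String joinSortedByMovieId(List<String> values) {
        List<String> sorted = new ArrayList<>(values); // leave the input untouched
        sorted.sort(Comparator.comparingInt(
                (String v) -> Integer.parseInt(v.split(",")[0].trim())));
        return String.join(" ", sorted);
    }

    public static void main(String[] args) {
        // prints 19,2 21,1 70,4
        System.out.println(joinSortedByMovieId(Arrays.asList("70,4", "21,1", "19,2")));
    }
}
```

Inside a reducer, each `Text` value would need to be copied out with `toString()` before being added to the list, because Hadoop reuses the same `Text` object across iterations.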