Scala/Hadoop: Specifying the Context for a Reducer

ram*_*ion 7 hadoop scala mapreduce

Before I start playing with Scoobi or Scrunch, I thought I'd try porting WordCount to Scala (2.9.1) using Hadoop's (0.20.1) Java bindings.

Initially, I had:

class Map extends Mapper[LongWritable, Text, Text, IntWritable] {
  @throws(classOf[IOException])
  @throws(classOf[InterruptedException])
  def map(key : LongWritable, value : Text, context : Context) {
    //...

This compiled fine, but gave me a runtime error:

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable

After looking around a bit, I figured out this was because I wasn't actually defining the right map method (the missing override should have tipped me off), so I fixed it to:

override def map(key : LongWritable, value : Text, 
  context : Mapper[LongWritable, Text, Text, IntWritable]#Context) {

And voilà, no runtime error.
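For context, here's a minimal sketch of what the full mapper might look like with that corrected signature (the imports and the word-splitting body are my own assumptions, not from the original post):

import java.io.IOException
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

class Map extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()

  @throws(classOf[IOException])
  @throws(classOf[InterruptedException])
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context) {
    // Emit (token, 1) for every whitespace-separated token on the input line.
    for (token <- value.toString.split("\\s+") if token.nonEmpty) {
      word.set(token)
      context.write(word, one)
    }
  }
}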

But then I looked at the job output and realized my reducer wasn't running.

So I looked at my reducer and noticed that the reduce signature had the same problem as my mapper:

class Reduce extends Reducer[Text, IntWritable, Text, IntWritable] {
  @throws(classOf[IOException])
  @throws(classOf[InterruptedException])
  def reduce(key : Text, value : Iterable[IntWritable], context : Context) {
    //...

So I guessed that the identity reduce was being used because of the mismatch.
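For reference, Hadoop's default (identity) reduce effectively just writes each incoming (key, value) pair back out unchanged, which is why the output looks un-reduced; in Scala terms it behaves roughly like this sketch:

// Roughly what the default Reducer.reduce does when a custom override
// is never picked up: pass every (key, value) pair through untouched.
def reduce(key: Text, values: java.lang.Iterable[IntWritable],
           context: Reducer[Text, IntWritable, Text, IntWritable]#Context) {
  val it = values.iterator()
  while (it.hasNext) {
    context.write(key, it.next())
  }
}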

But when I tried to correct the signature of reduce accordingly:

override def reduce(key: Text, values : Iterable[IntWritable], 
  context : Reducer[Text, IntWritable, Text, IntWritable]#Context) {

I now get a compiler error:

[ERROR] /path/to/src/main/scala/WordCount.scala:32: error: method reduce overrides nothing
[INFO]     override def reduce(key: Text, values : Iterable[IntWritable], 

So I'm not sure what I'm doing wrong.

jwi*_*der 11

At first glance, make sure values is a java.lang.Iterable, not a Scala Iterable. Either import java.lang.Iterable, or:

override def reduce(key: Text, values : java.lang.Iterable[IntWritable], context : Reducer[Text, IntWritable, Text, IntWritable]#Context)
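To round that out, here's a minimal sketch of a complete reducer along those lines; the summing body and the JavaConversions import are my own additions, not part of the answer:

import java.io.IOException
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Reducer
import scala.collection.JavaConversions._  // lets the java.lang.Iterable be used in a Scala for-comprehension

class Reduce extends Reducer[Text, IntWritable, Text, IntWritable] {
  @throws(classOf[IOException])
  @throws(classOf[InterruptedException])
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context) {
    // Sum the counts emitted by the mapper for this word and write a single total.
    var sum = 0
    for (value <- values) {
      sum += value.get()
    }
    context.write(key, new IntWritable(sum))
  }
}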

  • That's exactly right. See the example here: https://bitbucket.org/jasonbaldridge/fogbow/src/6c24fb2afda4/src/main/scala/fogbow/example/WordCount.scala (2 upvotes)