Spark combineByKey Java lambda expression

Was*_*sim asked — tags: java, lambda, apache-spark

I want to use lambda functions to compute the per-key average of a `JavaPairRDD<Integer, Double> pairs`. To that end, I wrote the following code:

java.util.function.Function<Double, Tuple2<Double, Integer>> createAcc =
    x -> new Tuple2<Double, Integer>(x, 1);

BiFunction<Tuple2<Double, Integer>, Double, Tuple2<Double, Integer>> addAndCount =
    (Tuple2<Double, Integer> x, Double y) -> { return new Tuple2(x._1() + y, x._2() + 1); };

BiFunction<Tuple2<Double, Integer>, Tuple2<Double, Integer>, Tuple2<Double, Integer>> combine =
    (Tuple2<Double, Integer> x, Tuple2<Double, Integer> y) -> { return new Tuple2(x._1() + y._1(), x._2() + y._2()); };

JavaPairRDD<Integer, Tuple2<Double, Integer>> avgCounts =
    pairs.combineByKey(createAcc, addAndCount, combine);

However, Eclipse shows this error:

The method combineByKey(Function<Double,C>, Function2<C,Double,C>, Function2<C,C,C>) in the type JavaPairRDD<Integer,Double> is not applicable for the arguments (Function<Double,Tuple2<Double,Integer>>,
 BiFunction<Tuple2<Double,Integer>,Double,Tuple2<Double,Integer>>, BiFunction<Tuple2<Double,Integer>,Tuple2<Double,Integer>,Tuple2<Double,Integer>>) 

G Q*_*ana answered:

The combineByKey method expects org.apache.spark.api.java.function.Function2, not java.util.function.BiFunction — and likewise org.apache.spark.api.java.function.Function (not java.util.function.Function) for the first argument, as the error message's expected signature shows. So write:

// Both interfaces come from org.apache.spark.api.java.function, not java.util.function
Function<Double, Tuple2<Double, Integer>> createAcc =
    x -> new Tuple2<>(x, 1);                                   // (sum, count) accumulator

Function2<Tuple2<Double, Integer>, Double, Tuple2<Double, Integer>> addAndCount =
    (acc, y) -> new Tuple2<>(acc._1() + y, acc._2() + 1);      // fold one value into the accumulator

Function2<Tuple2<Double, Integer>, Tuple2<Double, Integer>, Tuple2<Double, Integer>> combine =
    (a, b) -> new Tuple2<>(a._1() + b._1(), a._2() + b._2());  // merge accumulators across partitions

JavaPairRDD<Integer, Tuple2<Double, Integer>> avgCounts =
    pairs.combineByKey(createAcc, addAndCount, combine);
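The (sum, count) accumulator makes the final step easy: divide sum by count per key (on the RDD you would do this with `avgCounts.mapValues(t -> t._1() / t._2())`). Here is a minimal self-contained sketch of the same create/merge/divide logic in plain Java, without Spark — `CombineByKeyDemo` and its sample data are hypothetical, and a `double[]{sum, count}` stands in for the `Tuple2<Double, Integer>` accumulator:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class CombineByKeyDemo {
    // Applies the same per-key logic combineByKey uses, then divides sum by count.
    static Map<Integer, Double> averages(List<Map.Entry<Integer, Double>> pairs) {
        // createCombiner: first value for a key becomes a (sum, count) accumulator
        Function<Double, double[]> createAcc = x -> new double[]{x, 1};
        // mergeValue: fold a further value into an existing accumulator
        BiFunction<double[], Double, double[]> addAndCount =
            (acc, y) -> new double[]{acc[0] + y, acc[1] + 1};

        Map<Integer, double[]> accs = new HashMap<>();
        for (Map.Entry<Integer, Double> e : pairs) {
            double[] cur = accs.get(e.getKey());
            accs.put(e.getKey(), cur == null ? createAcc.apply(e.getValue())
                                             : addAndCount.apply(cur, e.getValue()));
        }
        // Final step: sum / count per key (what mapValues would do on avgCounts)
        Map<Integer, Double> result = new HashMap<>();
        accs.forEach((k, acc) -> result.put(k, acc[0] / acc[1]));
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Double>> data = List.of(
            Map.entry(1, 2.0), Map.entry(1, 4.0), Map.entry(2, 3.0));
        System.out.println(averages(data)); // key 1 -> 3.0, key 2 -> 3.0
    }
}
```

In Spark the mergeCombiners step (`combine` above) additionally merges accumulators produced on different partitions; since this sketch runs in a single loop, that step never fires.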