Fel*_*ipe — scala, playframework, apache-spark
I have a Play web application using Scala 2.11.8 and Spark ("spark-core" % "2.2.0" and "spark-sql" % "2.2.0"). I am trying to read a file containing movie ratings and run some transformations on it. When I split on tabs with movieLines.map(x => (x.split("\t")(1).toInt, 1)), I get an error, which I suspect comes from a Guava library dependency conflict — every fix I found while searching was based on that. But I cannot figure out how to exclude the Guava dependency.
Here is my code:
def popularMovies() = Action { implicit request: Request[AnyContent] =>
  Util.downloadSourceFile("downloads/ml-100k.zip", "http://files.grouplens.org/datasets/movielens/ml-100k.zip")
  Util.unzip("downloads/ml-100k.zip")
  val sparkContext = SparkCommons.sparkSession.sparkContext
  println("got sparkContext")
  val movieLines = sparkContext.textFile("downloads/ml-100k/u.data")
  println("popularMovies")
  println(movieLines)
  // Map to (movieID, 1) tuples
  val movieTuples = movieLines.map(x => (x.split("\t")(1).toInt, 1))
  println("movieTuples")
  println(movieTuples)
  // Count up all the 1's for each movie
  val movieCounts = movieTuples.reduceByKey((x, y) => x + y)
  println("movieCounts")
  println(movieCounts)
  // Flip (movieId, count) to (count, movieId)
  val movieCountFlipped = movieCounts.map(x => (x._2, x._1))
  println(movieCountFlipped)
  // Sort by count
  val sortedMovies = movieCountFlipped.sortByKey()
  println(sortedMovies)
  // Collect and print the result
  val results = sortedMovies.collect().toList.mkString(",\n")
  println(results)
  Ok("[" + results + "]")
}
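For context, each row of u.data in the MovieLens 100k dataset is tab-separated as userID, movieID, rating, timestamp, so split("\t")(1) extracts the movie ID. A minimal sketch of that step on an assumed sample row:

object SplitDemo extends App {
  // Hypothetical u.data row: userID \t movieID \t rating \t timestamp
  val sampleLine = "196\t242\t3\t881250949"
  // Take field 1 (the movie ID) and pair it with a count of 1
  val movieTuple = (sampleLine.split("\t")(1).toInt, 1)
  println(movieTuple) // prints (242,1)
}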
And the error:
[error] application -
! @76oh9h40m - Internal server error, for (GET) [/api/popularMovies] ->
play.api.http.HttpErrorHandlerExceptions$$anon$1: Execution exception[[RuntimeException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat]]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:255)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:180)
at play.core.server.AkkaHttpServer$$anonfun$3.applyOrElse(AkkaHttpServer.scala:311)
at play.core.server.AkkaHttpServer$$anonfun$3.applyOrElse(AkkaHttpServer.scala:309)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
Caused by: java.lang.RuntimeException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at play.api.mvc.ActionBuilder$$anon$2.apply(Action.scala:424)
at play.api.mvc.Action$$anonfun$apply$2.apply(Action.scala:96)
at play.api.mvc.Action$$anonfun$apply$2.apply(Action.scala:89)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2$$anonfun$1.apply(Accumulator.scala:174)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2$$anonfun$1.apply(Accumulator.scala:174)
at scala.util.Try$.apply(Try.scala:192)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2.apply(Accumulator.scala:174)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2.apply(Accumulator.scala:170)
at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
I added the following dependency and it solved my problem. Most likely the IllegalAccessError comes from the old Hadoop classes Spark pulls in calling a Guava Stopwatch constructor that newer Guava versions made package-private; hadoop-client 2.7.2 no longer uses that constructor.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.2"
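For reference, the exclusion approach I was originally searching for would look roughly like this in build.sbt. This is only a sketch, assuming the conflicting Guava arrives transitively through the Spark artifacts; the pinned Guava version is an assumption and would need to match whatever your Hadoop classes expect:

// Sketch: strip transitive Guava from the Spark artifacts,
// then pin a single Guava version explicitly.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0" excludeAll ExclusionRule(organization = "com.google.guava"),
  "org.apache.spark" %% "spark-sql"  % "2.2.0" excludeAll ExclusionRule(organization = "com.google.guava"),
  "com.google.guava" % "guava" % "14.0.1" // assumed pin; choose a version compatible with your Hadoop
)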