use*_*014 6 scala intellij-idea playframework apache-spark
My environment: Mac OS Yosemite, Play framework 2.3.7, sbt 0.13.7, IntelliJ IDEA 14, Java 1.8.0_25.
I am trying to run a simple Spark program inside Play framework, so I created a Play 2 project in IntelliJ and changed a few files as follows:
app/controllers/Application.scala:
package controllers

import play.api._
import play.api.libs.iteratee.Enumerator
import play.api.mvc._

object Application extends Controller {
  def index = Action {
    Ok(views.html.index("Your new application is ready."))
  }

  def trySpark = Action {
    Ok.chunked(Enumerator(utils.TrySpark.runSpark))
  }
}
app/utils/TrySpark.scala:
package utils

import org.apache.spark.{SparkContext, SparkConf}

object TrySpark {
  def runSpark: String = {
    val conf = new SparkConf().setAppName("trySpark").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val data = sc.textFile("public/data/array.txt")
    val array = data.map ( line => line.split(' ').map(_.toDouble) )
    val sum = array.first().reduce( (a, b) => a + b )
    return sum.toString
  }
}
public/data/array.txt:
1 2 3 4 5 6 7
conf/routes:
GET / controllers.Application.index
GET /spark controllers.Application.trySpark
GET /assets/*file controllers.Assets.at(path="/public", file)
build.sbt:
name := "trySpark"
version := "1.0"
lazy val `tryspark` = (project in file(".")).enablePlugins(PlayScala)
scalaVersion := "2.10.4"
libraryDependencies ++= Seq( jdbc , anorm , cache , ws,
"org.apache.spark" % "spark-core_2.10" % "1.2.0")
unmanagedResourceDirectories in Test <+= baseDirectory ( _ /"target/web/public/test" )
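I type activator run to run this application in development mode, then open localhost:9000/spark in the browser, and it shows the result 28 as expected. However, when I type activator start to run the application in production mode, it shows the following error message: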
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
[error] application -
! @6kik15fee - Internal server error, for (GET) [/spark] ->
play.api.Application$$anon$1: Execution exception[[InvalidInputException: Input path does not exist: file:/Path/to/my/project/target/universal/stage/public/data/array.txt]]
at play.api.Application$class.handleError(Application.scala:296) ~[com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.api.DefaultApplication.handleError(Application.scala:402) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$14$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:205) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$14$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:202) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) [org.scala-lang.scala-library-2.10.4.jar:na]
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Path/to/my/project/target/universal/stage/public/data/array.txt
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
It seems that my array.txt file is not loaded in production mode. How can I solve this problem?
Sal*_*lem 10
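The problem here is that the public directory in your project root will not be available when you run in production. It is packaged into a jar (usually STAGE_DIR/lib/PROJ_NAME-VERSION-assets.jar), so you will not be able to access its files through a file system path like this.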
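To make the failure easier to diagnose, here is a minimal sketch of how a relative path such as public/data/array.txt ends up pointing into the stage directory: it is resolved against the JVM working directory, which matches the path shown in the error message above. The object name PathCheck and its helpers are hypothetical, introduced only for illustration, and are not part of Play or Spark.

```scala
import java.io.File

// Hypothetical helper (not part of Play or Spark): shows where a relative
// data path is looked up and fails early with a readable message.
object PathCheck {
  // Resolve a possibly relative path against a base directory.
  def resolve(baseDir: File, path: String): File = {
    val f = new File(path)
    if (f.isAbsolute) f else new File(baseDir, path)
  }

  // The JVM working directory; relative paths are resolved against it.
  def workingDir: File = new File(sys.props("user.dir"))

  // Right(absolute path) if the file exists, otherwise Left(message).
  def existing(path: String): Either[String, String] = {
    val f = resolve(workingDir, path)
    if (f.isFile) Right(f.getAbsolutePath)
    else Left("Data file not found: " + f.getAbsolutePath)
  }
}
```

Calling PathCheck.existing("public/data/array.txt") before handing the path to sc.textFile would report the exact location that was searched, instead of surfacing as an InvalidInputException from Hadoop.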
I can see two solutions here:
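1) Put the file in the conf directory. This will work, but it seems very dirty, especially if you intend to use more data files;
2) Put those files in some directory and tell sbt to package it as well. You can keep using the public directory, although a separate directory seems better, especially if you expect to have many more files.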
Assuming array.txt is placed in a directory named datafiles in the project root, you can add this to build.sbt:
mappings in Universal ++=
(baseDirectory.value / "datafiles" * "*" get) map
(x => x -> ("datafiles/" + x.getName))
Don't forget to change the path in your app code:
// (...)
val data = sc.textFile("datafiles/array.txt")
Then run clean; after that, when you run start, stage, or dist, those files will be available.
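For completeness, option 1 could look roughly like the sketch below. It assumes that files under conf end up on the application classpath in production, so the file is read as a classpath resource rather than through a file system path. The object name ClasspathData and its methods are hypothetical and only illustrate the idea.

```scala
import java.io.InputStream
import scala.io.Source

// Hypothetical helper: reads a small data file from the classpath
// (assumption: conf/array.txt is packaged onto the classpath).
object ClasspathData {
  // Read all lines from a stream and close it afterwards.
  def readLines(in: InputStream): List[String] = {
    val src = Source.fromInputStream(in, "UTF-8")
    try src.getLines().toList finally src.close()
  }

  // Look up a classpath resource; None when it does not exist.
  def resourceLines(name: String): Option[List[String]] =
    Option(getClass.getClassLoader.getResourceAsStream(name)).map(readLines)

  // Same computation as runSpark: sum of the numbers on the first line.
  // Throws if the list of lines is empty.
  def sumFirstLine(lines: List[String]): Double =
    lines.head.split(' ').map(_.toDouble).sum
}
```

Since sc.textFile expects a path, the lines read this way would have to be passed to Spark differently, for example with sc.parallelize(lines). That is only reasonable for small files, which is one more reason to prefer option 2 for real data.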