Hello, I have a file that my teacher gave me. It is about Scala and Spark. When I run the code, it gives me this exception:
(run-main-0) scala.ScalaReflectionException: class java.sql.Date in JavaMirror with ClasspathFilter
The file itself looks like this:
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object Main {

  type Embedding = (String, List[Double])
  type ParsedReview = (Integer, String, Double)

  org.apache.log4j.Logger getLogger "org" setLevel (org.apache.log4j.Level.WARN)
  org.apache.log4j.Logger getLogger "akka" setLevel (org.apache.log4j.Level.WARN)

  val spark = SparkSession.builder
    .appName ("Sentiment")
    .master ("local[9]")
    .getOrCreate

  import spark.implicits._

  val reviewSchema = StructType(Array(
    StructField ("reviewText", StringType, nullable=false),
    StructField ("overall", DoubleType, nullable=false),
    StructField ("summary", StringType, nullable=false)))

  // Read file and merge the text and summary into a single text column
  def loadReviews (path: String): Dataset[ParsedReview] =
    spark
      .read
      .schema (reviewSchema)
      .json (path)
      .rdd
      .zipWithUniqueId
      .map[(Integer,String,Double)] { case (row,id) => (id.toInt, s"${row getString 2} ${row getString 0}", row getDouble 1) }
      .toDS
      .withColumnRenamed ("_1", "id" )
      .withColumnRenamed ("_2", "text")
      .withColumnRenamed ("_3", "overall")
      .as[ParsedReview]

  // Load the GLoVe embeddings file
  def loadGlove (path: String): Dataset[Embedding] =
    spark
      .read
      .text (path)
      .map { _ getString 0 split " " }
      .map (r => (r.head, r.tail.toList.map (_.toDouble))) // yuck!
      .withColumnRenamed ("_1", "word" )
      .withColumnRenamed ("_2", "vec")
      .as[Embedding]

  def main(args: Array[String]) = {
    val glove = loadGlove ("Data/glove.6B.50d.txt") // take glove
    val reviews = loadReviews ("Data/Electronics_5.json") // FIXME

    // replace the following with the project code
    glove.show
    reviews.show

    spark.stop
  }
}
I need to keep the import org.apache.spark.sql.Dataset line because some of the code depends on it, but it is precisely because of it that the exception is thrown.
My build.sbt file looks like this:
name := "Sentiment Analysis Project"
version := "1.1"
scalaVersion := "2.11.12"
scalacOptions ++= Seq("-unchecked", "-deprecation")
initialCommands in console :=
"""
import Main._
"""
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.5"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % "test"
The Scala guide recommends compiling with Java 8:
We recommend using Java 8 for compiling Scala code. Since the JVM is backward compatible, it is usually safe to use a newer JVM to run code compiled by the Scala compiler for older JVM versions.
Although this is only a recommendation, I found that it fixes the problem you mention.
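As an optional safeguard, here is a minimal sketch (not part of your original build; it relies on the standard sbt initialize key and the java.specification.version JVM system property) that makes sbt fail fast when it is not running on Java 8, so the mismatch is reported before Spark's reflection breaks:

// Hypothetical addition to build.sbt: abort early when the JVM is not Java 8.
initialize := {
  val _ = initialize.value // run the default initialization first
  val required = "1.8"
  val current = sys.props("java.specification.version")
  assert(current == required, s"Java $required is required for this project; found Java $current")
}

With that in place, starting sbt on a newer JVM should stop with a clear message instead of the ScalaReflectionException.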
To install Java 8 with Homebrew, it is best to use jenv, which will help you manage multiple Java versions whenever you need to.
brew install jenv
Then run the following command to add the tap (repository) for alternative cask versions, since Java 8 is no longer in the default taps:
brew tap homebrew/cask-versions
Install Java 8:
brew cask install homebrew/cask-versions/adoptopenjdk8
Run the following command to add the Java version you just installed to the list of versions jenv knows about:
jenv add /Library/Java/JavaVirtualMachines/<installed_java_version>/Contents/Home
Finally, run
jenv global 1.8
or
jenv local 1.8
to use Java 1.8 globally or locally (in the current folder), respectively.
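To double-check which JVM the project actually sees, here is a minimal sketch (using only standard JVM system properties) you can run from the sbt console or a Scala REPL:

// Quick sanity check: on Java 8 these standard system properties report 1.8.x / 1.8.
println(System.getProperty("java.version"))               // 1.8.0_xxx on Java 8
println(System.getProperty("java.specification.version")) // 1.8 on Java 8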
For more information, follow the instructions on the jenv website.