Error running Spark in the Scala REPL - access denied org.apache.derby.security.SystemPermission("engine","usederbyinternals")

jam*_*iet 2 scala sbt apache-spark

I have been using IntelliJ with sbt to quickly develop Spark applications in Scala. IntelliJ hides a lot of the scaffolding, though, and I want to understand the fundamentals, so I am trying to get up and running from the command line instead (i.e. using the REPL). I am on macOS.

Here is what I did:

mkdir -p ~/tmp/scalasparkrepl
cd !$
echo 'scalaVersion := "2.11.12"' > build.sbt
echo 'libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"' >> build.sbt
echo 'libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"' >> build.sbt
echo 'libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.3.0"' >> build.sbt
sbt console
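For reference, the three `echo` commands above assemble a `build.sbt` equivalent to the following (versions exactly as in the question):

```scala
// build.sbt for a minimal Spark REPL project, as written by the echo commands above
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql"  % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.3.0"
```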

This opens a Scala REPL (after downloading all the dependencies), in which I ran:

import org.apache.spark.SparkConf
import org.apache.spark.sql.{SparkSession, DataFrame}
val conf = new SparkConf().setMaster("local[*]")
val spark = SparkSession.builder().appName("spark repl").config(conf).config("spark.sql.warehouse.dir", "~/tmp/scalasparkreplhive").enableHiveSupport().getOrCreate()
spark.range(0, 1000).toDF()

This fails with the error `access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )`:

scala> spark.range(0, 1000).toDF()
18/05/08 11:51:11 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('~/tmp/scalasparkreplhive').
18/05/08 11:51:11 INFO SharedState: Warehouse path is '/tmp/scalasparkreplhive'.
18/05/08 11:51:12 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/05/08 11:51:12 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/05/08 11:51:12 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/05/08 11:51:12 INFO ObjectStore: ObjectStore, initialize called
18/05/08 11:51:13 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/05/08 11:51:13 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )

I have googled around and there is some information about this error, but nothing that helped me solve it. What I find strange is that a scala/sbt project on the command line hits this problem while the sbt project in IntelliJ works fine (I pretty much copy/pasted the code from the IntelliJ project). I assume IntelliJ is doing something on my behalf, but I don't know what, which is exactly why I'm doing this exercise.

Can anyone suggest how to fix this?

JGC*_*JGC 6

I won't take full credit for this, but it looks similar to SBT test does not work for spark test.

The solution is to issue this line before running the Scala code:

System.setSecurityManager(null)

So in full:

System.setSecurityManager(null)
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SparkSession, DataFrame}
val conf = new SparkConf().setMaster("local[*]")
val spark = SparkSession.builder().appName("spark repl").config(conf).config("spark.sql.warehouse.dir", "~/tmp/scalasparkreplhive").enableHiveSupport().getOrCreate()
spark.range(0, 1000).toDF()
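The underlying issue is that `sbt console` runs the REPL under a `SecurityManager` (sbt installs one to trap `System.exit`), and newer Derby versions check `SystemPermission("engine", "usederbyinternals")` against it when the Hive metastore spins up; IntelliJ's run configuration has no such manager, which is why the same code works there. If clearing the manager for the whole session feels too drastic, a narrower variant (my own sketch, not part of the original answer; the helper name is invented for illustration) is to drop it only around the call that initializes the metastore and restore it afterwards:

```scala
// Sketch: temporarily remove the SecurityManager (which sbt installs in its
// console) only while a block of code runs, then restore the previous one.
// The guards handle JDK 18+, where setSecurityManager may throw
// UnsupportedOperationException because the Security Manager is disabled.
object SecurityManagerToggle {
  def withoutSecurityManager[A](body: => A): A = {
    val previous = System.getSecurityManager // null outside sbt
    try System.setSecurityManager(null)
    catch { case _: UnsupportedOperationException => () }
    try body
    finally {
      if (previous ne null)
        try System.setSecurityManager(previous)
        catch { case _: UnsupportedOperationException => () }
    }
  }
}
```

In the REPL you would then wrap only the metastore-touching call, e.g. `val spark = SecurityManagerToggle.withoutSecurityManager { SparkSession.builder()/* ...same config as above... */.enableHiveSupport().getOrCreate() }`, leaving sbt's exit-trapping manager in place for everything else.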