I am new to Spark development and am trying to build my first Spark 2 (Scala) application with sbt on Red Hat Linux. The environment details are below.
CDH Version: 5.11.0
Apache Spark2: 2.1.0.cloudera1
Scala Version: 2.11.11
Java Version: 1.7.0_101
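For reference, a minimal build.sbt for these versions would look roughly like the sketch below. This is an illustration only, not the project's actual build file: the project name and version are placeholders, and the dependency lines are the standard Spark 2.1.0 artifacts for Scala 2.11 that such a project would normally declare.

// build.sbt -- a sketch for this environment, not the original build definition
name := "FirstApplication"

version := "0.1"

scalaVersion := "2.11.11"

// Spark 2.1.0 is published for Scala 2.11; marked "provided" because the
// CDH cluster supplies the Spark runtime when the job is submitted.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.1.0" % "provided"
)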
Application code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object MySample {
  def main(args: Array[String]) {
    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"

    val spark = SparkSession
      .builder()
      .appName("FirstApplication")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .getOrCreate()

    val schPer = new StructType(Array(
      new StructField("Column1", IntegerType, false),
      new StructField("Column2", StringType, true),
      new StructField("Column3", StringType, true),
      new StructField("Column4", IntegerType, true)
    ))

    val dfPeriod = spark.read.format("csv").option("header", false).schema(schPer).load("/prakash/periodFiles/")

    dfPeriod.write.format("csv").save("/prakash/output/dfPeriod")
  }
}
Compiling with sbt produces the following error:
$ sbt
[info] Loading project …