Driving a K8S cluster with Spark from IntelliJ IDEA, without building a JAR

Tim*_*ler 4 apache-spark kubernetes

package learn.spark

import org.apache.spark.{SparkConf, SparkContext}

object MasterLocal2 {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()

    conf.setAppName("spark-k8s")
    // point the Spark master at the Kubernetes API server
    conf.setMaster("k8s://https://192.168.99.100:16443")

    // address on the host network through which executors reach the in-IDE driver
    conf.set("spark.driver.host", "192.168.99.1")
    conf.set("spark.executor.instances", "5")
    conf.set("spark.kubernetes.executor.request.cores", "0.1")
    conf.set("spark.kubernetes.container.image", "spark:latest")

    val sc = new SparkContext(conf)

    println(sc.parallelize(1 to 5).map(_ * 10).collect().mkString(", "))

    sc.stop()
  }
}

I'm trying to speed up the local run cycle of my Spark program by launching it straight from the IDE, but I get the exception below. I don't know how to configure the job so that the classes the JVM compiles are passed to the executors.

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 8, 10.1.1.217, executor 4): java.lang.ClassNotFoundException: learn.spark.MasterLocal2$$anonfun$main$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)

Mar*_*ett 5

The executors throw ClassNotFoundException because the classes IDEA compiles never leave your machine. Mount IDEA's compile output directory into the executors, then set spark.executor.extraClassPath to that directory:

conf.set("spark.kubernetes.executor.volumes.hostPath.anyname.options.path", "/path/to/your/project/out/production/examples")
conf.set("spark.kubernetes.executor.volumes.hostPath.anyname.mount.path", "/intellij-idea-build-out")
conf.set("spark.executor.extraClassPath", "/intellij-idea-build-out")
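If the executors only need to read the classes, the same volume can also be mounted read-only through the standard Spark-on-Kubernetes mount.readOnly key:

conf.set("spark.kubernetes.executor.volumes.hostPath.anyname.mount.readOnly", "true")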

Make sure the compile output directory can actually be mounted into the executor containers through a Kubernetes volume; that part is Kubernetes usage rather than Spark. Note that a hostPath volume refers to a path on the Kubernetes node, so the directory must be visible from the node (for example through a shared or mounted folder when running a local single-node cluster).
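
Putting the question and the answer together, here is a minimal sketch of the complete driver program; the volume name anyname, the addresses, and the paths are placeholders from the snippets above and need to be adapted to your cluster and project:

package learn.spark

import org.apache.spark.{SparkConf, SparkContext}

object MasterLocal2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spark-k8s")
      .setMaster("k8s://https://192.168.99.100:16443")
      // executors connect back to the driver running on the host
      .set("spark.driver.host", "192.168.99.1")
      .set("spark.executor.instances", "5")
      .set("spark.kubernetes.executor.request.cores", "0.1")
      .set("spark.kubernetes.container.image", "spark:latest")
      // mount IDEA's build output into every executor pod...
      .set("spark.kubernetes.executor.volumes.hostPath.anyname.options.path",
           "/path/to/your/project/out/production/examples")
      .set("spark.kubernetes.executor.volumes.hostPath.anyname.mount.path",
           "/intellij-idea-build-out")
      // ...and put it on the executor classpath, so classes compiled by the IDE resolve
      .set("spark.executor.extraClassPath", "/intellij-idea-build-out")

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 5).map(_ * 10).collect().mkString(", "))
    sc.stop()
  }
}

After each code change, rebuilding the project in IDEA refreshes the classes under the output directory, so the next run picks them up without packaging a JAR.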