pup*_*pet 21 scala jar classpath apache-spark
With this simple example, I'm running into a "ClassNotFound" exception:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader
import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {
  def test(id : Int) : ClassToRoundTrip = {
    // Get the current classpath and output. Can we see simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of object and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip = Marshal.load[ClassToRoundTrip](testObjBytes) // <<-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x=> RoundTripTester.test(x))
  }
}
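In local mode, submitting as described in the docs generates a "ClassNotFound" exception on line 31, where the ClassToRoundTrip object is deserialized. Strangely, the earlier use on line 28 is fine: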
spark-submit --class "SimpleApp" \
--master local[4] \
target/scala-2.10/simpleapp_2.10-1.0.jar
However, if I add the extra parameters "--driver-class-path" and "--jars", it works fine in local mode:
spark-submit --class "SimpleApp" \
--master local[4] \
--driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
--jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
target/scala-2.10/simpleapp_2.10-1.0.jar
However, submitting to the local dev master still produces the same problem:
spark-submit --class "SimpleApp" \
--master spark://localhost.localdomain:7077 \
--driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
--jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
target/scala-2.10/simpleapp_2.10-1.0.jar
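I can see from the output that the executor is fetching the JAR file.
Logs for one of the executors are here:
stdout: http://pastebin.com/raw.php?i=DQvvGhKm
stderr: http://pastebin.com/raw.php?i=MPZZVa0Q
I'm using Spark 1.0.2. ClassToRoundTrip is included in the JAR. I would rather not have to hardcode values in SPARK_CLASSPATH or SparkContext.addJar. Can anyone help?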
小智 15
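I had the same issue. If the master is local, the program runs fine for most people. If they set it to "spark://myurl:7077" (this also happened to me), it doesn't work. Most people get the error because an anonymous class was not found during execution. It is resolved by calling SparkContext.addJar("path to jar").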
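A minimal sketch of that suggestion, assuming a Spark 1.x application. The object name SimpleAppWithJar and the command-line fallback are hypothetical, and the master URL and jar path are the placeholders used in this answer, not real values. It registers the jar either through SparkConf.setJars before the context is created or through SparkContext.addJar afterwards.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver showing where the application jar can be registered.
object SimpleAppWithJar {
  def main(args: Array[String]): Unit = {
    // Assumption: the jar path is passed as the first argument so it is not
    // hardcoded; the fallback is the placeholder path from the answer.
    val appJar = args.headOption.getOrElse("pathToYourJar/target/yourJarFromMaven.jar")

    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://myurl:7077") // placeholder master URL
      .setJars(Seq(appJar))            // jars to distribute to the cluster

    val sc = new SparkContext(conf)

    // Alternative to setJars: register the jar once the context exists.
    // sc.addJar(appJar)

    sc.parallelize(Seq(1, 2, 3, 4, 5)).foreach(x => println(x))
    sc.stop()
  }
}
```

Taking the path from the arguments keeps the value out of the source, which may matter if, like the asker, you would rather not hardcode it.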
Make sure you are doing the following things:
Note: the jar pathToYourJar/target/yourJarFromMaven.jar in the last point is also set in code, as in the first point of this answer.
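Separately from shipping the jar, one plausible reading of the question's symptom (not confirmed anywhere in this thread) is that Marshal.load deserializes with a plain ObjectInputStream, which may look classes up through a loader that cannot see the application jar. The sketch below swaps in a different technique: a stream that resolves classes through the thread's context class loader, assuming Spark has set that loader to one that includes the application jar. LoaderAwareObjectInputStream, RoundTrip and Demo are hypothetical names introduced for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream,
  ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Same shape as the class in the question.
class ClassToRoundTrip(val id: Int) extends Serializable

// Resolves classes through the supplied loader first, then falls back to the
// default lookup (needed for primitives such as "int").
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    try Class.forName(desc.getName, false, loader)
    catch { case _: ClassNotFoundException => super.resolveClass(desc) }
}

object RoundTrip {
  // Serialize an object to bytes with standard Java serialization.
  def dump(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Deserialize using the current thread's context class loader.
  def load[A](bytes: Array[Byte]): A = {
    val loader = Thread.currentThread.getContextClassLoader
    val in = new LoaderAwareObjectInputStream(new ByteArrayInputStream(bytes), loader)
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val copy = RoundTrip.load[ClassToRoundTrip](RoundTrip.dump(new ClassToRoundTrip(7)))
    println("copy.id: " + copy.id)
  }
}
```

Inside RoundTripTester.test, RoundTrip.dump and RoundTrip.load would take the place of Marshal.dump and Marshal.load. Whether this removes the exception on a given cluster depends on how the executor's class loaders are set up, so treat it as something to try rather than a confirmed fix.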