I'm running into a very strange problem when trying to load a JDBC DataFrame into Spark SQL.
I have tried several Spark setups on my laptop: YARN, a standalone cluster, and pseudo-distributed mode. The problem is reproducible on both Spark 1.3.0 and 1.3.1, and it occurs both in spark-shell and when executing the code with spark-submit. I have tried the MySQL and MS SQL JDBC drivers, without success.
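For the spark-submit case, a minimal self-contained reproduction looks roughly like this (the object name and app name are arbitrary placeholders; the partitioning options from the spark-shell example below are omitted for brevity):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Minimal reproduction of the failure when run via spark-submit.
object JdbcRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("jdbc-repro"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:mysql://localhost:3306/test",
      "driver" -> "com.mysql.jdbc.Driver",
      "dbtable" -> "t1"))

    df.take(1).foreach(println) // fails with the same exception as below
    sc.stop()
  }
}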
In spark-shell, consider the following example:
val driver = "com.mysql.jdbc.Driver"
val url = "jdbc:mysql://localhost:3306/test"

val t1 = sqlContext.load("jdbc", Map(
  "url" -> url,
  "driver" -> driver,
  "dbtable" -> "t1",
  "partitionColumn" -> "id",
  "lowerBound" -> "0",
  "upperBound" -> "100",
  "numPartitions" -> "50"
))
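For reference, if I read the 1.3 API correctly, the same partitioned read can also be written with the SQLContext.jdbc overload. I mention it because this variant takes no "driver" option at all: the connection is resolved through java.sql.DriverManager, which is exactly where the exception below comes from.

// Presumably equivalent: 50 partitions over id in [0, 100).
// No "driver" option here; DriverManager must already know the driver class.
val t1 = sqlContext.jdbc(
  url,   // "jdbc:mysql://localhost:3306/test"
  "t1",  // table
  "id",  // partition column
  0L,    // lowerBound
  100L,  // upperBound
  50)    // numPartitions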
So far so good; the schema is resolved correctly:
t1: org.apache.spark.sql.DataFrame = [id: int, name: string]
But when I evaluate the DataFrame:
t1.take(1)
the following exception occurs:
15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found …

Note that the task is lost on a worker node (192.168.1.42): the driver program can evidently load the JDBC driver class, since the schema was resolved, but the executors apparently cannot.
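As a driver-side sanity check (a sketch, assuming the connector jar is on the local classpath), the connection can be opened through plain JDBC, which is what the JDBC data source also relies on:

import java.sql.DriverManager

// Register the driver class explicitly, then connect via DriverManager,
// the same mechanism the JDBC data source uses to open connections.
Class.forName("com.mysql.jdbc.Driver")
val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test")
val rs = conn.createStatement().executeQuery("SELECT id, name FROM t1 LIMIT 1")
while (rs.next()) println(s"${rs.getInt("id")}, ${rs.getString("name")}")
conn.close()

In spark-shell this kind of check only runs in the driver JVM, so it says nothing about what the executors can see.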