Related troubleshooting questions (0)

Why does a Spark application fail with "ClassNotFoundException: Failed to find data source: kafka" when run as an uber-jar built with sbt assembly?

I am trying to run an example like https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala. I started from the Spark Structured Streaming Programming Guide at http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html.

My code is:

package io.boontadata.spark.job1

import org.apache.spark.sql.SparkSession

object DirectKafkaAggregateEvents {
  val FIELD_MESSAGE_ID = 0
  val FIELD_DEVICE_ID = 1
  val FIELD_TIMESTAMP = 2
  val FIELD_CATEGORY = 3
  val FIELD_MEASURE1 = 4
  val FIELD_MEASURE2 = 5

  def main(args: Array[String]) {
    if (args.length < 3) {
      System.err.println(s"""
        |Usage: DirectKafkaAggregateEvents <brokers> <subscribeType> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <subscribeType> sample value: subscribe
        |  <topics> is a list of one or more …
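A commonly cited cause for this error with sbt-assembly is twofold: the kafka source lives in the separate spark-sql-kafka-0-10 module, and the default sbt-assembly merge strategy discards the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister files that register the "kafka" short name, so the assembled jar cannot resolve it. A minimal build.sbt sketch, assuming Scala 2.11 and Spark 2.2.0 (illustrative versions, not taken from the question):

// build.sbt (sketch; the versions below are assumptions, adjust to your cluster)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  // the kafka source is a separate artifact and must end up inside the uber-jar
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.2.0"
)

// merge the service-loader registration files line by line instead of discarding them,
// otherwise Spark cannot map the "kafka" short name to its provider class
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case _                                         => MergeStrategy.first // simplified default; refine for your other dependencies
}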

scala sbt sbt-assembly apache-spark spark-structured-streaming

Votes: 21 · Answers: 2 · Views: 20,000

Why does a Spark application fail with "ClassNotFoundException: Failed to find data source: jdbc" when run as an uber-jar built with sbt assembly?

I am trying to assemble a Spark application with sbt 1.0.4 and sbt-assembly 0.14.6.

The Spark application works fine when started from IntelliJ IDEA or with spark-submit, but if I run the assembled uber-jar from the command line (cmd on Windows 10):

java -Xmx1024m -jar my-app.jar

I get the following exception:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: jdbc. Please find packages at http://spark.apache.org/third-party-projects.html

The Spark application looks as follows.

package spark.main

import java.util.Properties    
import org.apache.spark.sql.SparkSession

object Main {

    def main(args: Array[String]) {
        val connectionProperties = new Properties()
        connectionProperties.put("user","postgres")
        connectionProperties.put("password","postgres")
        connectionProperties.put("driver", "org.postgresql.Driver")

        val testTable = "test_tbl"

        val spark = SparkSession.builder()
            .appName("Postgres Test")
            .master("local[*]")
            .config("spark.hadoop.fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
            .config("spark.sql.warehouse.dir", System.getProperty("java.io.tmpdir") + "swd")
            .getOrCreate()

        val dfPg = spark.sqlContext.read.
            jdbc("jdbc:postgresql://localhost/testdb",testTable,connectionProperties)

        dfPg.show()
    } …
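The jdbc source is part of spark-sql itself, so a missing dependency is usually not the problem here; the usual suspect with an sbt-assembly uber-jar is again the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister file being dropped by the default merge strategy, so the service loader never learns the "jdbc" short name. A hedged build.sbt sketch, using the delegation pattern from the sbt-assembly documentation:

// build.sbt (sketch)
assemblyMergeStrategy in assembly := {
  // keep the data source registrations from every jar by concatenating them
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" =>
    MergeStrategy.concat
  case x =>
    // fall back to whatever strategy was already configured for everything else
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}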

scala sbt sbt-assembly apache-spark apache-spark-sql

Votes: 5 · Answers: 1 · Views: 3,191

Spark 2.3.0 is unable to find data source: kafka

I am trying to set up a Kafka stream from CSV so that I can stream it into Spark. However, I keep getting:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html

My code looks like this:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.execution.streaming.FileStreamSource.Timestamp
import org.apache.spark.sql.types._

object SpeedTester {
  def main(args: Array[String]): Unit = {

  val spark = SparkSession.builder.master("local[4]").appName("SpeedTester").config("spark.driver.memory", "8g").getOrCreate()
  val rootLogger = Logger.getRootLogger()
  rootLogger.setLevel(Level.ERROR)
  import spark.implicits._
  val mySchema = StructType(Array(
    StructField("incident_id", IntegerType),
    StructField("date", StringType),
    StructField("state", StringType),
    StructField("city_or_county", StringType),
    StructField("n_killed", IntegerType),
    StructField("n_injured", IntegerType)
  ))

  val streamingDataFrame = spark.readStream.schema(mySchema).csv("C:/Users/zoldham/IdeaProjects/flinkpoc/Data/test")
  streamingDataFrame.selectExpr("CAST(incident_id AS STRING) AS key",
  "to_json(struct(*)) AS value").writeStream
    .format("kafka") …
Run Code Online (Sandbox Code Playgroud)
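The "kafka" format does not ship with spark-sql; it lives in the separate spark-sql-kafka-0-10 module, so it has to be on the classpath, either as a build dependency or via spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0. A minimal build.sbt sketch, assuming Scala 2.11 to match the Spark 2.3.0 artifacts:

// build.sbt (sketch; assumes Scala 2.11 and Spark 2.3.0)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "2.3.0",
  // provides the "kafka" source and sink used by readStream/writeStream.format("kafka")
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0"
)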

scala apache-kafka apache-spark spark-structured-streaming

Votes: 2 · Answers: 1 · Views: 7,944