I would appreciate help running a Spark Streaming program with Spark 2.0.2.
It fails with the error "java.lang.ClassNotFoundException: Failed to find data source: kafka". The modified POM file is shown below.
The SparkSession is created, but the error occurs when load() is called on the Kafka source.
Creating the Spark session:
val spark = SparkSession
.builder()
.master(master)
.appName("Apache Log Analyzer Streaming from Kafka")
.config("hive.metastore.warehouse.dir", hiveWarehouse)
.config("fs.defaultFS", hdfs_FS)
.enableHiveSupport()
.getOrCreate()
Creating the Kafka stream:
val logLinesDStream = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:2181")
.option("subscribe", topics)
.load()
Error message:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark-packages.org
pom.xml:
<scala.version>2.10.4</scala.version>
<scala.compat.version>2.10</scala.compat.version>
<spark.version>2.0.2</spark.version>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.5</version>
</dependency>
</dependencies>
You are referencing the Kafka integration for Spark v1.5.1 when you actually need 2.0.2. You also need the spark-sql-kafka dependency for Structured Streaming:
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.10</artifactId>
<version>2.0.2</version>
Note that the SparkSession (Structured Streaming) API is only supported for Kafka >= 0.10.
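For reference, here is a minimal sketch of the full read/write flow once spark-sql-kafka-0-10 is on the classpath; the broker address (localhost:9092) and the topic name ("logs") are placeholder assumptions, not values from the question:

import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming spark-sql-kafka-0-10_2.10:2.0.2 is on the classpath,
// a Kafka 0.10+ broker at localhost:9092, and a topic named "logs" (placeholders).
val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("Kafka Structured Streaming sketch")
  .getOrCreate()

// The Kafka source exposes binary key/value columns; cast the value to a string
val logLines = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // Kafka broker, not ZooKeeper
  .option("subscribe", "logs")
  .load()
  .selectExpr("CAST(value AS STRING) AS line")

// Print incoming micro-batches to the console to confirm the source resolves and data flows
val query = logLines.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()

Alternatively, the same artifact can be supplied at runtime with spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.10:2.0.2 instead of bundling it through the POM.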