kul*_*ula 6 spark-streaming pyspark
我使用python的spark-streaming 2.2.0.并从kafka(2.11-0.10.0.0)集群中读取数据.我提交了一个python脚本与spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar hodor.py spark报告错误信息
17/08/04 10:52:00 ERROR Utils: Uncaught exception in thread stdout
writer for python
java.lang.NoSuchMethodError: net.jpountz.util.Utils.checkRange([BII)V
at org.apache.kafka.common.message.KafkaLZ4BlockInputStream.read(KafkaLZ4BlockInputStream.java:176)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply$mcI$sp(ByteBufferMessageSet.scala:67)
at kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:67)
at kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:67)
at scala.collection.immutable.Stream$.continually(Stream.scala:1279)
at kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:67)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:179)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:192)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:146)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:764)
at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:214)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Run Code Online (Sandbox Code Playgroud)
我认为它可能是由lz4版本冲突引起的.spark取决于net.jpountz.lz4 1.3.0但kafka依赖于net.jpountz.lz4 1.2.0
我该怎么办呢?
升级到 spark 2.3.0 时我遇到了同样的问题(使用结构化流连接两个流)
这是错误:
NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.init<>
我正在使用 Maven,所以解决方案是将两者中最旧的依赖项(spark 2.3 使用 1.4.0,kafka 使用 1.3.0)添加到我的主项目的pom.xml。
<dependency>
<groupId>net.jpountz.lz4</groupId>
<artifactId>lz4</artifactId>
<version>1.3.0</version>
</dependency>
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
3306 次 |
| 最近记录: |