Post by min*_*don

Spark Streaming "assertion failed: Failed to get records for spark-executor-a-group a-topic 7 244723248 after polling for 4096"

A Spark Streaming problem with a Kafka DirectStream:

assertion failed: Failed to get records for spark-executor-a-group a-topic 7 244723248 after polling for 4096

What I have tried:

1) Increased spark.streaming.kafka.consumer.poll.ms

- from 512 to 4096; fewer failures, but failures persist even at 10s

2) Increased executor memory from 1G to 2G

- partly works; failures are less frequent

3) https://issues.apache.org/jira/browse/SPARK-19275

- still fails when the streaming batch duration is less than 8 seconds ("session.timeout.ms" -> "30000")

4) Tried Spark 2.1

- the problem remains
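
For reference, the timeout settings from attempts 1) and 3) are passed in two different places: spark.streaming.kafka.consumer.poll.ms is a Spark property, while session.timeout.ms is a Kafka consumer parameter handed to the direct stream. A minimal sketch, assuming a StreamingContext `ssc` and using placeholder broker/group/topic names:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// Placeholder values; the real broker list, group id, and topic
// come from the failing job and are not shown in the question.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "a-group",
  // Attempt 3): raised session timeout per SPARK-19275
  "session.timeout.ms" -> "30000",
  // heartbeat.interval.ms must stay well below session.timeout.ms
  "heartbeat.interval.ms" -> "10000"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Seq("a-topic"), kafkaParams)
)
```
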


Using Scala 2.11.8, Kafka version 0.10.0.0, Spark version 2.0.2

Spark configuration:

.config("spark.cores.max", "4")
.config("spark.default.parallelism", "2")
.config("spark.streaming.backpressure.enabled", "true")
.config("spark.streaming.receiver.maxRate", "1024")
.config("spark.streaming.kafka.maxRatePerPartition", "256")
.config("spark.streaming.kafka.consumer.poll.ms", "4096")
.config("spark.streaming.concurrentJobs", "2")
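
As a rough sanity check on the rate settings above: with the direct stream, spark.streaming.kafka.maxRatePerPartition bounds how many records each batch can pull per partition (spark.streaming.receiver.maxRate applies only to receiver-based streams, so it has no effect here). A sketch of the arithmetic, where the batch interval and partition count are illustrative assumptions, not values from the question:

```scala
// Upper bound on records per batch under the direct-stream rate limit.
val maxRatePerPartition = 256 // spark.streaming.kafka.maxRatePerPartition
val batchIntervalSec = 5      // assumed batch interval, for illustration
val numPartitions = 8         // assumed topic partition count

// 256 records/s/partition * 5 s * 8 partitions = 10240 records per batch
val maxRecordsPerBatch = maxRatePerPartition * batchIntervalSec * numPartitions
```
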

Using spark-streaming-kafka-0-10-assembly_2.11-2.1.0.jar

Error stack:

at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:228)
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:194)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.foreach(KafkaRDD.scala:194)
...
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:109)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:108)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:142)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:108)
...
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99) …

apache-kafka apache-spark spark-streaming

6 recommendations · 1 solution · 5184 views