Aka*_*thi 6 cassandra cassandra-2.0 apache-spark spark-cassandra-connector
I set the following configuration for Spark, but every time I read from or write to a Cassandra table I get an IOException.
val conf = new SparkConf()
.setMaster(sparkIp)
.set("spark.cassandra.connection.host", cassandraIp)
.set("spark.sql.crossJoin.enabled", "true")
.set("spark.executor.memory", sparkExecutorMemory) // 26 GB
.set("spark.executor.cores", sparkExecutorCore) // between 4 and 8
.set("spark.executor.instances", sparkExecutorInstances) // 1
.set("spark.cassandra.output.batch.size.bytes", "2048")
.set("spark.sql.broadcastTimeout", "2000")
.set("spark.sql.shuffle.partitions", "1000")
.set("spark.network.timeout", "80s")
.set("spark.executor.extraJavaOptions", "-verbose:gc -XX:+UseG1GC")
sc.cassandraTable[MyCaseClass]("myDatabase", "mytable") // read code
dataRDD.saveToCassandra("myDatabase", "mytable") // write code
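The two one-line snippets above omit their surroundings. Here is a minimal sketch of how the pieces might fit together, assuming the spark-cassandra-connector is on the classpath. The fields of `MyCaseClass`, the app name, the host placeholders, and the target table `mytable_copy` are hypothetical and not taken from the post:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable and saveToCassandra

// Hypothetical row type: the post does not show the fields of MyCaseClass.
// Field names must match the column names of the Cassandra table.
case class MyCaseClass(id: Int, value: String)

object CassandraRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-round-trip")                        // assumed name
      .setMaster("spark://spark-master:7077")                    // placeholder for sparkIp
      .set("spark.cassandra.connection.host", "cassandra-host")  // placeholder for cassandraIp

    val sc = new SparkContext(conf)

    // Read: each Cassandra row is mapped onto a MyCaseClass instance.
    val dataRDD = sc.cassandraTable[MyCaseClass]("myDatabase", "mytable")

    // Write: the target table (hypothetical here) must already exist
    // with columns matching the case class fields.
    dataRDD.saveToCassandra("myDatabase", "mytable_copy")

    sc.stop()
  }
}
```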
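The tables hold a large amount of data, and the operations are complex.
I am using a Spark master with 28 GB of memory and 8 cores, plus 10 Spark workers with the same configuration, of which I use 26 GB of memory and 4 to 8 cores. Sometimes I also get an ExecutorLostException.
Latest stack trace when writing data to the Cassandra table: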
org.apache.spark.SparkException: Job aborted due to stage failure: Task 145 in stage 6.0 failed 4 times, most recent failure: Lost task 145.6 in stage 6.0 (TID 3268, 10.178.149.48): ExecutorLostFailure (executor 157 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 118434 ms
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
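The `Executor heartbeat timed out` reason in the trace involves two Spark settings. Executors send heartbeats to the driver every `spark.executor.heartbeatInterval` (default 10s), and the driver treats an executor as lost once no heartbeat has arrived within `spark.network.timeout` (default 120s). The configuration above lowers the latter to 80s. Below is a sketch of setting both explicitly. The values are illustrative assumptions, not a verified fix for this job:

```scala
import org.apache.spark.SparkConf

// Illustrative values only; tune them against the GC logs and the actual workload.
val conf = new SparkConf()
  // How often each executor sends a heartbeat to the driver (Spark default: 10s).
  .set("spark.executor.heartbeatInterval", "20s")
  // How long the driver waits without a heartbeat before marking the executor
  // as lost (Spark default: 120s). Keep this well above the heartbeat interval.
  .set("spark.network.timeout", "300s")
```

Raising the timeout only hides long pauses. If executors stall because of garbage collection or oversized write batches, the `-verbose:gc` output already enabled in the configuration is the place to look first.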
Thanks in advance.