Tags: java, scala, apache-spark
I have a large RDD whose objects total about 10GB. I want to turn it into a lookup table for use in Spark with:

val lookupTable = sparkContext.broadcast(entitiesRDD.collect)

but it fails with:
17/02/27 17:33:25 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, d1): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 2. To avoid this, increase spark.kryoserializer.buffer.max value.
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
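For context, this is roughly the pattern being attempted: collect the RDD to the driver, then broadcast the result so every executor holds a local copy for lookups. The sketch below is a minimal, self-contained version of that pattern; `Entity`, the sample data, and the keying by `id` are illustrative placeholders, not from the original question.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative placeholder type; the real RDD holds ~10GB of objects.
case class Entity(id: Long, value: String)

object BroadcastLookup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("broadcast-lookup").getOrCreate()
    val sc = spark.sparkContext

    val entitiesRDD = sc.parallelize(Seq(Entity(1L, "a"), Entity(2L, "b")))

    // collect() pulls the entire RDD to the driver; broadcast() then
    // serializes it once and ships a copy to each executor. With ~10GB
    // of data, this serialization step is where the Kryo buffer overflows.
    val lookupTable = sc.broadcast(entitiesRDD.collect().map(e => e.id -> e).toMap)

    // Executors can now resolve keys locally, without a shuffle.
    val ids = sc.parallelize(Seq(1L, 2L))
    val resolved = ids.flatMap(id => lookupTable.value.get(id))
    resolved.collect().foreach(println)

    spark.stop()
  }
}
```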
I can't increase spark.kryoserializer.buffer.max beyond 2048mb, or I get this error instead:
Caused by: java.lang.IllegalArgumentException: spark.kryoserializer.buffer.max must be less than 2048 mb, got: 2048 mb.
at org.apache.spark.serializer.KryoSerializer.<init>(KryoSerializer.scala:66)
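For reference, the buffer cap is set through Spark configuration; the snippet below shows the standard way to do so (the "1g" value is just an example that stays under the hard 2048mb limit mentioned in the error above):

```scala
import org.apache.spark.SparkConf

// Kryo's max buffer can be raised via configuration, but Spark rejects
// any value of 2048mb or more, as the IllegalArgumentException shows.
val conf = new SparkConf()
  .setAppName("kryo-buffer-config")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max", "1g") // must be strictly < 2048m
```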
How do other people build large lookup tables in Spark?