spark-sql hangs at BlockManagerInfo: Added broadcast_0_piece0 when running a SQL query

Lor*_*hen · 5 · apache-spark, apache-spark-sql

The output is the following, and it hangs at the last line:

17/09/07 06:01:35 INFO ClientCnxn: Socket connection established to 10.0.0.193/10.0.0.193:2181, initiating session  
17/09/07 06:01:35 INFO ClientCnxn: Session establishment complete on server 10.0.0.193/10.0.0.193:2181, sessionid = 0x15e4bc9518103cc, negotiated timeout = 40000  
17/09/07 06:01:35 INFO RegionSizeCalculator: Calculating region sizes for table "event_data".  
17/09/07 06:01:35 INFO SparkContext: Starting job: processCmd at CliDriver.java:376  
17/09/07 06:01:36 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions  
17/09/07 06:01:36 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)  
17/09/07 06:01:36 INFO DAGScheduler: Parents of final stage: List()  
17/09/07 06:01:36 INFO DAGScheduler: Missing parents: List()  
17/09/07 06:01:36 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376), which has no missing parents  
17/09/07 06:01:36 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 16.6 KB, free 414.1 MB)  
17/09/07 06:01:36 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 8.8 KB, free 414.1 MB)  
17/09/07 06:01:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.0.199:43329 (size: 8.8 KB, free: 414.4 MB)  
17/09/07 06:01:36 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1007  
17/09/07 06:01:36 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))  
17/09/07 06:01:36 INFO YarnScheduler: Adding task set 0.0 with 1 tasks  
17/09/07 06:01:37 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)  
17/09/07 06:01:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.0.248:55616) with ID 1  
17/09/07 06:01:42 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1)  
17/09/07 06:01:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-0-0-248.cn-north-1.compute.internal, executor 1, partition 0, RACK_LOCAL, 5053 bytes)  
17/09/07 06:01:42 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-248.cn-north-1.compute.internal:34192 with 2.8 GB RAM, BlockManagerId(1, ip-10-0-0-248.cn-north-1.compute.internal, 34192, None)  
17/09/07 06:01:42 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-0-0-248.cn-north-1.compute.internal:34192 (size: 8.8 KB, free: 2.8 GB)  
17/09/07 06:01:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-0-0-248.cn-north-1.compute.internal:34192 (size: 28.8 KB, free: 2.8 GB)  

Spark SQL is connected to Hive, and the table named event_data is an external table stored in HBase.
The same thing also happens when I query a regular Hive table (not backed by HBase), for example select count(*) from mytest01.

In that case it sometimes hangs at BlockManagerInfo: Removed:

17/09/07 06:31:18 INFO ContextCleaner: Cleaned accumulator 1  
17/09/07 06:31:18 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.0.0.199:43329 in memory (size: 28.8 KB, free: 414.4 MB)  
17/09/07 06:31:18 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ip-10-0-0-248.cn-north-1.compute.internal:34192 in memory (size: 28.8 KB, free: 2.8 GB)  

How can this problem be solved? Thanks.

小智 · 0

When you run spark-submit, use the following flags:

--driver-memory 4g --executor-memory 6g

Paste them right after `spark-submit`.
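For instance, a full invocation might look like the sketch below. Only `--driver-memory` and `--executor-memory` come from this answer; the master, deploy mode, class name, and jar path are placeholders you would replace with your own:

```shell
# Hypothetical spark-submit invocation with the suggested memory flags.
# Only --driver-memory and --executor-memory are from the answer above;
# everything else (master, deploy mode, class, jar) is a placeholder.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 4g \
  --executor-memory 6g \
  --class com.example.MyApp \
  my-app.jar
```

Since the question uses the interactive `spark-sql` shell rather than `spark-submit`, note that the same flags can be passed there too, e.g. `spark-sql --driver-memory 4g --executor-memory 6g`.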