Spark: Could not find CoarseGrainedScheduler

Ade*_*nde 13 scala apache-spark

My Spark job fails with this exception after running for a few hours, and I'm not sure what is causing it.

I'm running Spark 2.0.2.

Any debugging tips?

2016-12-27 03:11:22,199 [shuffle-server-3] ERROR org.apache.spark.network.server.TransportRequestHandler - Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
    at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
    at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:571)
    at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEve

Ade*_*nde 10

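Yes, now I know what that cryptic exception means: the executors were being killed for exceeding the container threshold (typically the YARN container's memory limit).
This can happen for several reasons, but the first things to do are to check your job, or to try adding more nodes/executors to your cluster.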

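  • Sorry, but this is not a good answer, especially for a question that shows up on the first page of Google results. What is "check your job" supposed to mean? It turns out this exception is a red herring; as *Tomer* says, the real exception is further up in the log. (9 upvotes)
  • What are these several reasons? And if I don't add nodes/resources, will the job just run slowly? Could you give a more detailed explanation, or perhaps share a reference link? (5 upvotes)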
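To make the "container threshold" a bit more concrete: on YARN, each executor runs in a container sized as the executor heap plus an off-heap overhead, and YARN typically kills the container if its physical memory use goes above that size. The sketch below computes the requested size assuming the Spark 2.0.x defaults (overhead = max(384 MB, 10% of `spark.executor.memory`), overridable with `spark.yarn.executor.memoryOverhead`). It ignores YARN's own rounding to its minimum allocation, and the object and function names are purely illustrative, not a Spark API.

```scala
// Illustrative sketch (names are hypothetical): estimate the YARN container
// size Spark 2.0.x requests per executor, assuming the 2.0.x defaults
// overhead = max(384 MB, 10% of executor memory) unless set explicitly.
object ContainerSizing {
  val OverheadFactor = 0.10
  val OverheadMinMb = 384L

  /** Off-heap overhead in MB: the explicit setting if given, else the default formula. */
  def overheadMb(executorMemoryMb: Long, explicitOverheadMb: Option[Long] = None): Long =
    explicitOverheadMb.getOrElse(
      math.max((OverheadFactor * executorMemoryMb).toLong, OverheadMinMb))

  /** Total memory requested for the container (before YARN's own rounding). */
  def containerMb(executorMemoryMb: Long, explicitOverheadMb: Option[Long] = None): Long =
    executorMemoryMb + overheadMb(executorMemoryMb, explicitOverheadMb)

  def main(args: Array[String]): Unit = {
    println(containerMb(4096))             // 4096 heap + 409 overhead = 4505
    println(containerMb(2048))             // 2048 heap + 384 (the minimum) = 2432
    println(containerMb(4096, Some(1024))) // explicit overhead: 4096 + 1024 = 5120
  }
}
```

If the executors' real footprint (heap plus off-heap buffers, e.g. from Netty shuffle) grows past this number, raising the overhead setting or the executor memory is usually the first knob to try.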

Tom*_*mer 8

Basically, it means there is another reason for the failure. Try looking for other exceptions in the job logs.

See the "Exceptions" section here: https://medium.com/@wx.london.cun/spark-on-yarn-f74e82ab6070
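Following this advice, one way to dig the real cause out of a long log is to list the suspicious lines that appear before the first "Could not find CoarseGrainedScheduler". This is a minimal, stdlib-only Scala sketch; the patterns in `CausePatterns` are assumptions (for example, YARN's "Container killed by YARN for exceeding memory limits" message), so adjust them to whatever your logs actually contain.

```scala
// Illustrative sketch: list log lines that may hold the real cause and occur
// before the first "Could not find CoarseGrainedScheduler" line.
object FindRootCause {
  // The secondary symptom from the question; only lines before it are searched.
  val Symptom = "Could not find CoarseGrainedScheduler"
  // Assumed heuristics for lines that usually carry the real cause.
  val CausePatterns = Seq("ERROR", "Exception", "Container killed", "OutOfMemoryError")

  /** (1-based line number, line) for every matching line before the first symptom line.
    * If the symptom never appears, the whole log is searched. */
  def candidates(lines: Seq[String]): Seq[(Int, String)] = {
    val firstSymptom = lines.indexWhere(_.contains(Symptom))
    val searchUpTo = if (firstSymptom < 0) lines.length else firstSymptom
    lines.take(searchUpTo).zipWithIndex.collect {
      case (line, idx) if CausePatterns.exists(p => line.contains(p)) => (idx + 1, line)
    }
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to a driver or executor log file
    val source = scala.io.Source.fromFile(args(0))
    try candidates(source.getLines().toVector).foreach { case (n, l) => println(s"$n: $l") }
    finally source.close()
  }
}
```

Run it against a driver or executor log (for instance one collected with `yarn logs -applicationId <appId>`). The earliest hits are usually the most informative, since later errors tend to be consequences of the first failure; the ERROR header line of the CoarseGrainedScheduler trace itself will typically show up as the last hit.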


Ben*_*zzo 5

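It may be a resource problem. Try increasing the number of cores and executors and allocating more RAM to the application; then you should increase the number of partitions of your RDD by calling repartition. The ideal number of partitions depends on those settings. Hope this helps.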
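As a rough way to pick that partition count once the cluster has been resized, the Spark tuning guide recommends about 2-3 tasks per CPU core. The sketch below simply multiplies executors × cores per executor × tasks per core; the object name, function name and the default of 3 are illustrative assumptions, not a Spark API.

```scala
// Illustrative sketch: rule-of-thumb partition count of ~2-3 tasks per CPU core.
object PartitionSizing {
  def suggestedPartitions(numExecutors: Int, coresPerExecutor: Int, tasksPerCore: Int = 3): Int = {
    require(numExecutors > 0 && coresPerExecutor > 0 && tasksPerCore > 0,
      "all inputs must be positive")
    numExecutors * coresPerExecutor * tasksPerCore
  }

  def main(args: Array[String]): Unit = {
    // e.g. 10 executors with 4 cores each, 3 tasks per core
    println(suggestedPartitions(10, 4)) // 120
  }
}
```

The result would then be passed to `rdd.repartition(n)`, with the executor count, cores and memory set through `spark.executor.instances`, `spark.executor.cores` and `spark.executor.memory` (or the `spark-submit` flags `--num-executors`, `--executor-cores` and `--executor-memory`). More partitions mean smaller tasks, which lowers per-task memory pressure at the cost of more scheduling overhead.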