Flink job fails with "Checkpoint Coordinator is suspending"

Dam*_*mon 5 apache-flink flink-streaming

I ran a Flink job and it failed after 18 hours. The failure message was: Checkpoint Coordinator is suspending.

[Screenshot: checkpoints]

[Screenshot: job overview]

Here is the JobManager log:

2020-10-10 13:53:10,636 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: kafkaSource -> Timestamps/Watermarks -> Process (1/1) (c38419ece8208c1ef2948087f2b84dd0) switched from RUNNING to FAILED.
java.lang.Exception: Could not perform checkpoint 3265 for operator Source: kafkaSource -> Timestamps/Watermarks -> Process (1/1).
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:785)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$3(StreamTask.java:760)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:261)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1394)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:776)
    ... 11 more

I don't know what is causing this problem.

小智 0

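This can happen when your application tries to take a checkpoint while the checkpoint coordinator (which runs in the JobManager) is shutting down for some reason, so the checkpoint cannot complete. The shutdown can have several causes: for example, you started a new deployment, you cancelled the job, or the job had to exit because of a runtime exception.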
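In the JobManager log from the question, the source task switched to FAILED with a nested `java.lang.NullPointerException` during checkpoint 3265, so the "suspending" message may be a consequence of that failure rather than its cause. As a rough diagnostic aid, the sketch below pulls the deepest `Caused by:` entry out of a log excerpt. The class name `RootCauseFinder` and the method `findRootCause` are hypothetical helpers made up for illustration; they are not part of Flink.

```java
import java.util.Optional;

// Hypothetical helper (not a Flink API): scans a log excerpt and returns the
// text of the last "Caused by:" line, which is the innermost cause in a
// standard Java stack trace.
final class RootCauseFinder {
    private static final String PREFIX = "Caused by: ";

    static Optional<String> findRootCause(String log) {
        String rootCause = null;
        // "\\R" matches any line break sequence (Java 8 and later).
        for (String line : log.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(PREFIX)) {
                // Keep overwriting so the last (deepest) cause wins.
                rootCause = trimmed.substring(PREFIX.length());
            }
        }
        return Optional.ofNullable(rootCause);
    }

    public static void main(String[] args) {
        // Shortened excerpt of the stack trace shown in the question.
        String log = String.join("\n",
            "java.lang.Exception: Could not perform checkpoint 3265 for operator Source: kafkaSource -> Timestamps/Watermarks -> Process (1/1).",
            "    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:785)",
            "Caused by: java.lang.NullPointerException",
            "    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1394)");
        System.out.println(findRootCause(log).orElse("no nested cause found"));
    }
}
```

The helper only identifies which exception to investigate; it does not explain why the exception occurred. If a log contains several failures, it reports the cause of the last one, so it is best applied to a single stack trace at a time.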