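I am building a simple gen_server module that monitors the liveness of several remote nodes.
When a remote node registers, the module starts monitoring it with erlang:monitor_node(Node, true). Each node registers only once (confirmed with logging).
In the gen_server's handle_info/2 callback, it catches the {nodedown, Node} message and demonitors that node with erlang:monitor_node(Node, false). I expect to receive this message only once: when the remote node goes down.
When I test the module, I find that when a remote node fails, hundreds of {nodedown, Node} messages are sent to the gen_server (the count varies from a few hundred to a few thousand).
Why does monitor_node send multiple messages? How can I prevent this behavior?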
Edit: here is (part of) the source code:
register_node(#node_info{node = NodeName} = NodeInfo) ->
    case mnesia:read(node_info, NodeName) of
        [] ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered", [NodeName]);
        [_OldInfo] ->
            error_logger:trace_msg("info of node ~p updated", [NodeName])
    end,
    mnesia:write(NodeInfo).

handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeStatus]) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
handle_cast({shutdown_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:dirty_delete_object(NodeStatus) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction shutdown_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
handle_cast(Message, Timer) ->
    error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
    {noreply, Timer}.

handle_info({nodedown, Node}, Timer) ->
    monitor_node(Node, false),
    error_logger:info_msg("~p: node ~p down", [?MODULE, Node]),
    mnesia:transaction(fun mnesia:delete/3, [node_info, Node, write]),
    {noreply, Timer};
handle_info(Message, Timer) ->
    error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
    {noreply, Timer}.
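You call monitor_node(NodeName, true) **INSIDE** the Mnesia transaction.
monitor_node is a side effect: internally it involves message communication (an I/O operation) with the node being registered, so that line does not belong in a transaction. Mnesia may evaluate a transaction fun more than once (for example, when the transaction is restarted after a lock conflict), and every evaluation that reaches this line adds another monitor on the node. When the node goes down, one nodedown message is then delivered for each of those monitors. The documentation for erlang:monitor_node/2 says: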
If a process has made two calls to monitor_node(Node, true) and Node terminates,
**two nodedown messages are delivered to the process.** If there is no connection
to Node, there will be an attempt to create one. If this fails, a nodedown
message is delivered.
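If duplicate messages may already be queued, one possible stopgap is to drain them when the first one is handled. The following is only a sketch of that idea, not a fix for the underlying cause; the module name `nodedown_flush` and the function `flush_nodedown/1` are hypothetical names introduced here for illustration.

```erlang
-module(nodedown_flush).
-export([flush_nodedown/1]).

%% Hypothetical helper (not part of the original code).
%% Removes every {nodedown, Node} message for Node that is already in the
%% calling process's mailbox and returns how many were discarded.
flush_nodedown(Node) ->
    flush_nodedown(Node, 0).

flush_nodedown(Node, Count) ->
    receive
        %% Node is already bound, so only messages for this node match.
        {nodedown, Node} ->
            flush_nodedown(Node, Count + 1)
    after 0 ->
        %% Nothing more queued right now; messages arriving later are kept.
        Count
    end.
```

Called from handle_info({nodedown, Node}, Timer), this would discard the duplicates that have already arrived, but messages still in flight would show up afterwards, so removing the extra monitor_node calls remains the real remedy.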
Move that line out of the transaction, or use only the case expression, and try again.
register_node(#node_info{node = NodeName} = NodeInfo) ->
    case mnesia:read(node_info, NodeName) of
        [] ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered", [NodeName]);
        [_OldInfo] ->
            error_logger:trace_msg("info of node ~p updated", [NodeName])
    end,
    mnesia:write(NodeInfo).

handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeStatus]) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
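One way to move the monitor_node call out of the transaction might look like the sketch below. It is an illustration under assumptions, not the original author's code: the module name `node_registry_sketch`, the function `register_node_tx/1`, the `new`/`updated` return atoms, and the `info` field of the record are all introduced here (only the `node` field appears in the question). The transaction fun touches the database only and reports whether the node is new; the monitor is set up after the transaction has committed.

```erlang
-module(node_registry_sketch).
-export([register_node_tx/1, register_node/1]).

%% Assumed record shape: only the `node` field is known from the question,
%% the `info` field is a placeholder.
-record(node_info, {node, info}).

%% Runs inside the transaction: database operations only, no side effects,
%% so it is harmless if Mnesia evaluates it more than once.
%% Returns `new` for a first registration and `updated` otherwise.
register_node_tx(#node_info{node = NodeName} = NodeInfo) ->
    Result = case mnesia:read(node_info, NodeName) of
                 [] -> new;
                 [_OldInfo] -> updated
             end,
    ok = mnesia:write(NodeInfo),
    Result.

%% Runs outside the transaction, after it has returned, so the monitor
%% is set up once per successful first registration.
register_node(#node_info{node = NodeName} = NodeInfo) ->
    case mnesia:transaction(fun register_node_tx/1, [NodeInfo]) of
        {atomic, new} ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered~n", [NodeName]);
        {atomic, updated} ->
            error_logger:info_msg("info of node ~p updated~n", [NodeName]);
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p~n",
                                     [Reason])
    end.
```

With this shape, the handle_cast({register_node, NodeStatus}, Timer) clause would only need to call register_node(NodeStatus) and return {noreply, Timer}. Note that monitor_node/2 requires the local node to be alive (started as a distributed node).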
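Mnesia dynamically sets and releases locks as transactions execute, so it is very dangerous to execute code with side effects inside a transaction. In particular, a receive statement inside a transaction can lead to a situation where the transaction hangs and never returns, which in turn can cause locks not to be released. This situation can bring the whole system to a standstill, because other transactions that execute in other processes, or on other nodes, are forced to wait for the defective transaction.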
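To see why side effects and re-evaluated funs do not mix, here is a small self-contained illustration. It does not use Mnesia at all: `run_with_restarts/2` is a hypothetical stand-in for a transaction manager that evaluates the fun a few extra times before the attempt that "commits", and the side effect is a plain message send.

```erlang
-module(retry_demo).
-export([run_with_restarts/2, demo/0]).

%% Hypothetical stand-in for a transaction manager (NOT Mnesia itself):
%% evaluates Fun Restarts times as discarded attempts, then once more as
%% the attempt whose result is returned.
run_with_restarts(Fun, Restarts) ->
    lists:foreach(fun(_) -> Fun() end, lists:seq(1, Restarts)),
    {atomic, Fun()}.

%% Runs a fun with a side effect through two restarts and returns how many
%% times the side effect actually happened.
demo() ->
    Self = self(),
    Fun = fun() -> Self ! side_effect, ok end,
    {atomic, ok} = run_with_restarts(Fun, 2),
    count_side_effects(0).

%% Counts the side_effect messages currently in the mailbox.
count_side_effects(N) ->
    receive
        side_effect -> count_side_effects(N + 1)
    after 0 ->
        N
    end.
```

The caller sees a single {atomic, ok}, yet retry_demo:demo() returns 3 because the send ran on every attempt. A monitor_node(Node, true) call placed in a restarted transaction fun would be repeated in the same way.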