Selector.select(timeout) returns 0 before the timeout expires

aun*_*low 5 java nio

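According to the Javadoc:

It returns only after at least one channel is selected, this selector's wakeup method is invoked, the current thread is interrupted, or the given timeout period expires, whichever comes first.

But sometimes it returns even though none of these four conditions holds: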

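  1. At least one channel is selected: no, it returns 0
  2. The wakeup method is invoked: no, wakeup is never called
  3. The current thread is interrupted: no, Thread.interrupted() returns false
  4. The given timeout period expires: no, according to the log it has not expired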
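
The four conditions above can be checked mechanically. The following is a minimal sketch, not the code from the question: the class name `SelectReturnCheck`, the method `classify`, and the 5 ms tolerance are assumptions made for illustration. `Selector` does not expose whether `wakeup()` was called, so the caller is assumed to track that itself and pass it in.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the JDK or of the project in the question.
final class SelectReturnCheck {
    // Assumed slack for timer granularity; tune for your platform.
    static final long TOLERANCE_MS = 5;

    /** Names the first documented reason that explains the return, or "premature" if none does. */
    static String classify(int selected, boolean wakeupCalled, boolean interrupted,
                           long elapsedMs, long timeoutMs) {
        if (selected > 0) return "selected";
        if (wakeupCalled) return "wakeup";
        if (interrupted) return "interrupted";
        if (elapsedMs + TOLERANCE_MS >= timeoutMs) return "timeout";
        return "premature";
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            long timeoutMs = 200;
            long start = System.nanoTime();
            int selected = selector.select(timeoutMs); // no channels are registered
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            // isInterrupted() reads the flag without clearing it, unlike Thread.interrupted().
            boolean interrupted = Thread.currentThread().isInterrupted();
            // wakeup() is never called in this demo, so false is passed for wakeupCalled.
            System.out.println(classify(selected, false, interrupted, elapsedMs, timeoutMs)
                    + " after " + elapsedMs + " ms");
        }
    }
}
```

For a call that returns 0 after 1 ms of an 8120 ms timeout, with no wakeup and no interrupt, `classify` returns "premature".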

Update 2016-03-15

I added some logging at lines 392 and 402 of the source: https://github.com/xqbase/tuna/blob/debug/core/src/main/java/com/xqbase/tuna/ConnectorImpl.java

public boolean doEvents(long timeout) {
    Log.v("Before Select: " + timeout);
    int keySize;
    try {
        keySize = timeout == 0 ? selector.selectNow() :
                timeout < 0 ? selector.select() : selector.select(timeout);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    Set<SelectionKey> selectedKeys = selector.selectedKeys();
    if (keySize == 0) {
        Log.v("After Select(0): selectedKeys=" + selectedKeys.size() + ", " +
                "interrupt=" + Thread.interrupted());
        invokeQueue();
        return false;
    }

    for (SelectionKey key : selectedKeys) {
        ...

Here is the log:

...
2016-03-15 23:07:49.695 com.xqbase.tuna.ConnectorImpl doEvents
FINE: Before Select: 8120
2016-03-15 23:07:49.696 com.xqbase.tuna.ConnectorImpl doEvents
FINE: After Select(0): selectedKeys=0, interrupt=false
2016-03-15 23:07:49.696 com.xqbase.tuna.ConnectorImpl doEvents
FINE: Before Select: 8119
2016-03-15 23:07:49.696 com.xqbase.tuna.ConnectorImpl doEvents
FINE: After Select(0): selectedKeys=0, interrupt=false
2016-03-15 23:07:49.700 com.xqbase.tuna.ConnectorImpl doEvents
FINE: Before Select: 8115
2016-03-15 23:07:49.701 com.xqbase.tuna.ConnectorImpl doEvents
FINE: After Select(0): selectedKeys=0, interrupt=false
2016-03-15 23:07:49.701 com.xqbase.tuna.ConnectorImpl doEvents
FINE: Before Select: 8114
2016-03-15 23:07:49.702 com.xqbase.tuna.ConnectorImpl doEvents
FINE: After Select(0): selectedKeys=0, interrupt=false
...

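This is strange: no selected keys, no interrupt, no timeout and no wakeup, yet it returned.

Is this a bug in Java? My Java version is 1.8.0_51-b16 (64-Bit Server VM), running on a CentOS 6.5 x64 Linode.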

aun*_*low 3

This may indeed be a bug in the JDK. Netty and Mina appear to have run into the same problem, and both rebuild the selector as a workaround.

See the latest Netty code, https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/nio/NioEventLoop.java L641-681:

            if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
                // - Selected something,
                // - waken up by user, or
                // - the task queue has a pending task.
                // - a scheduled task is ready for processing
                break;
            }
            ...
            } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                    selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
                // The selector returned prematurely many times in a row.
                // Rebuild the selector to work around the problem.
                logger.warn(
                        "Selector.select() returned prematurely {} times in a row; rebuilding selector.",
                        selectCnt);

                rebuildSelector();
                selector = this.selector;

                // Select again to populate selectedKeys.
                selector.selectNow();
                selectCnt = 1;
                break;
            }
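
The counting logic in the excerpt above can be isolated into a small helper. This is a simplified sketch of the idea, not Netty's implementation: the class `PrematureReturnCounter`, its method names, and whatever threshold the caller passes in are hypothetical.

```java
// Hypothetical helper illustrating "rebuild after N premature returns in a row".
final class PrematureReturnCounter {
    private final int threshold;
    private int count;

    PrematureReturnCounter(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /**
     * Records one select() result and returns true when the caller should
     * rebuild the selector. A return counts as premature when nothing was
     * selected, no wakeup was requested, and the timeout had not elapsed.
     */
    boolean record(int selected, boolean wakeupRequested, boolean timedOut) {
        if (selected != 0 || wakeupRequested || timedOut) {
            count = 0; // a legitimate return breaks the streak
            return false;
        }
        if (++count >= threshold) {
            count = 0; // start over once a rebuild has been signalled
            return true;
        }
        return false;
    }

    /** Current length of the streak of premature returns. */
    int count() {
        return count;
    }
}
```

Counting consecutive occurrences, rather than reacting to the first one, avoids rebuilding the selector for an isolated early return.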

See also the Mina 2.0 code, https://github.com/apache/mina/blob/2.0/mina-core/src/main/java/org/apache/mina/core/polling/AbstractPollingIoProcessor.java L1070-1092:

                if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
                    // Last chance : the select() may have been
                    // interrupted because we have had an closed channel.
                    if (isBrokenConnection()) {
                        LOG.warn("Broken connection");
                    } else {
                        LOG.warn("Create a new selector. Selected is 0, delta = " + (t1 - t0));
                        // Ok, we are hit by the nasty epoll
                        // spinning.
                        // Basically, there is a race condition
                        // which causes a closing file descriptor not to be
                        // considered as available as a selected channel,
                        // but
                        // it stopped the select. The next time we will
                        // call select(), it will exit immediately for the
                        // same
                        // reason, and do so forever, consuming 100%
                        // CPU.
                        // We have to destroy the selector, and
                        // register all the socket on a new one.
                        registerNewSelector();
                    }
                }

So if select() returns an unexpected zero, registering the channels with a new selector is probably the best available workaround.
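
A rebuild of that kind could look roughly like the sketch below. It is not the Netty or Mina code: `SelectorRebuilder` and `rebuild` are hypothetical names, and the sketch assumes it is called from the thread that runs the select loop, so no other thread is touching the key set at the same time.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: moves every valid registration to a fresh Selector.
final class SelectorRebuilder {
    /** Assumed to be called from the thread that runs the select loop. */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        // Work on a copy so that cancelling keys does not interfere with iteration.
        List<SelectionKey> keys = new ArrayList<>(oldSelector.keys());
        for (SelectionKey key : keys) {
            if (!key.isValid()) {
                continue; // already cancelled or its channel is closed
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            // Carry the interest set and the attachment over to the new selector.
            channel.register(newSelector, ops, attachment);
        }
        oldSelector.close();
        return newSelector;
    }
}
```

In a loop like `doEvents`, the caller would replace its `selector` field with the returned instance and then call `selectNow()` once to repopulate the selected keys, as the Netty excerpt does.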