Detecting socket hangup without sending or receiving?

Mat*_*ner 25 c python sockets linux tcp

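I'm writing a TCP server that can take 15 seconds or more to begin generating the body of a response to certain requests. Some clients like to close the connection at their end if the response takes more than a few seconds to complete.

Since generating the response is very CPU-intensive, I'd prefer to halt the task the instant the client closes the connection. At present, I don't find this out until I send the first payload and receive various hangup errors.

How can I detect that the peer has closed the connection without sending or receiving any data? For recv, that means all pending data stays in the kernel; for send, that no data is actually transmitted.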

Bla*_*air 26

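The select module contains what you need. If you only need Linux support and have a sufficiently recent kernel, select.epoll() should give you the information you need. Most Unix systems support select.poll().

If you need cross-platform support, the standard way is to use select.select() to check whether the socket is marked as having data available to read. If it is, but recv() returns zero bytes, the other end has hung up.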
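A minimal sketch of that check, with MSG_PEEK added so that any pending data stays in the kernel buffer, as the question asks. The name peer_has_hung_up is made up for illustration. One known limit: if unread data is queued ahead of the close, the peek returns that data and the hangup stays hidden until the data has been consumed.

```python
import select
import socket


def peer_has_hung_up(sock):
    """Return True if the peer has closed the connection (hypothetical helper).

    Nothing is consumed from the socket: MSG_PEEK leaves any pending
    bytes in the kernel receive buffer.
    """
    # Zero timeout: ask about readiness without blocking.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending, so no end-of-stream yet

    try:
        data = sock.recv(1, socket.MSG_PEEK)
    except ConnectionError:
        return True  # a reset or aborted connection also counts as a hangup

    # Readable, yet zero bytes available: the peer sent end-of-stream.
    return data == b""
```

The idea would be to call this between chunks of the CPU-heavy work and abandon the task as soon as it returns True.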

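I've always found Beej's Guide to Network Programming good (note that it is written for C, but it generally applies to standard socket operations), while the Socket Programming How-To has a decent Python overview.

Edit: The following is an example of how a simple server could be written to queue incoming commands but quit processing as soon as it finds that the connection has been closed on the remote end.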

import select
import socket
import time

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), 7557))
serversocket.listen(1)

# Wait for an incoming connection.
clientsocket, address = serversocket.accept()
print('Connection from', address[0])

# Control variables.
queue = []
cancelled = False

while True:
    # If nothing queued, wait for incoming request.
    if not queue:
        queue.append(clientsocket.recv(1024))

    # Receive data of length zero ==> connection closed.
    if len(queue[0]) == 0:
        break

    # Get the next request and strip the trailing line ending.
    request = queue.pop(0).decode(errors='replace').rstrip('\r\n')
    print('Starting request', request)

    # Main processing loop.
    for i in range(15):
        # Do some of the processing.
        time.sleep(1.0)

        # See if the socket is marked as having data ready.
        r, w, e = select.select((clientsocket,), (), (), 0)
        if r:
            data = clientsocket.recv(1024)

            # Length of zero ==> connection closed.
            if len(data) == 0:
                cancelled = True
                break

            # Add this request to the queue.
            queue.append(data)
            print('Queueing request',
                  data.decode(errors='replace').rstrip('\r\n'))

    # Request was cancelled.
    if cancelled:
        print('Request cancelled.')
        break

    # Done with this request.
    print('Request finished.')

# If we got here, the connection was closed.
print('Connection closed.')
clientsocket.close()
serversocket.close()

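To use it, run the script and, in another terminal, telnet to localhost, port 7557. Here is the output from an example run in which I queued three requests but closed the connection while the third was being processed: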

Connection from 127.0.0.1
Starting request 1
Queueing request 2
Queueing request 3
Request finished.
Starting request 2
Request finished.
Starting request 3
Request cancelled.
Connection closed.

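epoll alternative

Another edit: I've worked up another example that uses select.epoll to monitor events. I don't think it offers much over the original example, as I cannot see a way to receive an event when the remote end hangs up. You still have to monitor the data-received event and check for zero-length messages (again, I'd love to be proved wrong on this statement).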

import select
import socket
import time

port = 7557

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), port))
serversocket.listen(1)
serverfd = serversocket.fileno()
print("Listening on", socket.gethostname(), "port", port)

# Make the socket non-blocking.
serversocket.setblocking(False)

# Initialise the list of clients.
clients = {}

# Create an epoll object and register our interest in read events on the server
# socket.
ep = select.epoll()
ep.register(serverfd, select.EPOLLIN)

while True:
    # Check for events.
    events = ep.poll(0)
    for fd, event in events:
        # New connection to server.
        if fd == serverfd and event & select.EPOLLIN:
            # Accept the connection.
            connection, address = serversocket.accept()
            connection.setblocking(False)

            # We want input notifications.
            ep.register(connection.fileno(), select.EPOLLIN)

            # Store some information about this client.
            clients[connection.fileno()] = {
                'delay': 0.0,
                'input': b"",
                'response': b"",
                'connection': connection,
                'address': address,
            }

            # Done.
            print("Accepted connection from", address)

        # A socket was closed on our end.
        elif event & select.EPOLLHUP:
            print("Closed connection to", clients[fd]['address'])
            ep.unregister(fd)
            clients[fd]['connection'].close()
            del clients[fd]

        # Error on a connection.
        elif event & select.EPOLLERR:
            print("Error on connection to", clients[fd]['address'])
            ep.modify(fd, 0)
            clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

        # Incoming data.
        elif event & select.EPOLLIN:
            print("Incoming data from", clients[fd]['address'])
            data = clients[fd]['connection'].recv(1024)

            # Zero length = remote closure.
            if not data:
                print("Remote close on", clients[fd]['address'])
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

            # Store the input.
            else:
                print(data)
                clients[fd]['input'] += data

        # Run when the client is ready to accept some output. The processing
        # loop registers for this event when the response is complete.
        elif event & select.EPOLLOUT:
            print("Sending output to", clients[fd]['address'])

            # Write as much as we can.
            written = clients[fd]['connection'].send(clients[fd]['response'])

            # Delete what we have already written from the complete response.
            clients[fd]['response'] = clients[fd]['response'][written:]

            # When all of the response is written, shut the connection.
            if not clients[fd]['response']:
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

    # Processing loop.
    for client in list(clients):
        clients[client]['delay'] += 0.1

        # When the 'processing' has finished. The response is only built while
        # none is pending, so a partially sent response is not restarted.
        if clients[client]['delay'] >= 15.0 and not clients[client]['response']:
            # Reverse the input to form the response.
            clients[client]['response'] = clients[client]['input'][::-1]

            # Register for the ready-to-send event. The network loop uses this
            # as the signal to send the response.
            ep.modify(client, select.EPOLLOUT)

    # Processing delay. Kept outside the client loop so the server sleeps once
    # per pass, even when no clients are connected.
    time.sleep(0.1)

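Note: This only detects proper shutdowns. If the remote end just stops listening without sending the proper messages, you won't know until you try to write and get an error. Checking for that is left as an exercise for the reader. Also, you probably want to add some error checking around the overall loop so that the server itself shuts down gracefully if something breaks inside it.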
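On the question of getting an event when the remote end hangs up: Linux kernels from 2.6.17 onward define EPOLLRDHUP, which fires when the peer closes the connection or shuts down its writing half, even if unread data is still queued. The sketch below assumes a Linux system and a Python 3 build that exposes select.EPOLLRDHUP; the function name peer_closed is made up for illustration. Like the examples above, it only sees orderly closes and resets, not a peer that silently disappears.

```python
import select
import socket


def peer_closed(sock, timeout=0.0):
    """Return True if the peer has closed or reset the connection.

    Hypothetical helper. No data is read: only hangup-style events are
    requested, so pending input stays in the kernel buffer.
    """
    ep = select.epoll()
    try:
        # EPOLLHUP and EPOLLERR are always reported; EPOLLRDHUP is opt-in.
        ep.register(sock.fileno(), select.EPOLLRDHUP)
        events = ep.poll(timeout)  # timeout is in seconds
    finally:
        ep.close()

    hangup = select.EPOLLRDHUP | select.EPOLLHUP | select.EPOLLERR
    return any(event & hangup for _, event in events)
```

A long-running server would more likely keep a single epoll object and register each client with select.EPOLLIN | select.EPOLLRDHUP; creating one per call just keeps the sketch short.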


asc*_*99c 17

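I've had a recurring problem communicating with equipment that used separate TCP links for sending and receiving. The basic problem is that the TCP stack doesn't generally tell you that a socket is closed when you're only trying to read; you have to try to write to be told that the other end of the link was dropped. Partly, that is just how TCP was designed (reading is passive).

I'm guessing Blair's answer works in the cases where the socket has been shut down nicely at the other end (i.e. they have sent the proper disconnection messages), but not in the case where the other end has impolitely just stopped listening.

Is there a fairly fixed-format header at the start of your message that you could begin sending before the whole response is ready, e.g. an XML doctype? Also, can you get away with sending some extra spaces at certain points in the message, just some null data that you can output to be sure the socket is still open?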
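A rough sketch of the padding idea, assuming the protocol tolerates extra whitespace between parts of the message. The name send_probe is made up for illustration. Note that on TCP the first write after the peer has gone away often still succeeds, because the error only comes back with the peer's reset, so it can take a second probe to see the failure.

```python
import socket


def send_probe(sock, padding=b" "):
    """Write harmless padding; return False if the write reveals a dead peer.

    Hypothetical helper: it assumes the receiver ignores the padding.
    """
    try:
        sock.sendall(padding)
    except ConnectionError:
        # Covers BrokenPipeError, ConnectionResetError and
        # ConnectionAbortedError.
        return False
    return True
```

Calling this every second or so during the long computation would let the server give up once it returns False.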


nin*_*alj 12

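The socket KEEPALIVE option lets you detect this kind of "drop the connection without telling the other end" scenario.

You should set the SO_KEEPALIVE option at the SOL_SOCKET level. On Linux, you can modify the timeouts per socket using TCP_KEEPIDLE (seconds before sending keepalive probes), TCP_KEEPCNT (failed keepalive probes before declaring the other end dead) and TCP_KEEPINTVL (interval in seconds between keepalive probes).

In Python: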

import socket
...
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 5)
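The same settings wrapped in a small helper, as a sketch only. The name enable_keepalive and its default values are made up for illustration (the defaults mirror the numbers above), and the TCP_KEEP* constants are looked up defensively because not every platform's socket module defines them.

```python
import socket


def enable_keepalive(sock, idle=1, interval=1, count=5):
    """Turn on TCP keepalive probes for sock (hypothetical helper).

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # These option names are the Linux ones; skip any that are missing here.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Once the probes go unanswered the kernel marks the connection as failed, and the next recv() or send() on the socket should then raise an error instead of waiting forever.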

netstat -tanop will show that the socket is in keepalive mode:

tcp        0      0 127.0.0.1:6666          127.0.0.1:43746         ESTABLISHED 15242/python2.6     keepalive (0.76/0/0)

while tcpdump will show the keepalive probes:

01:07:08.143052 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683438 848683188>
01:07:08.143084 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683438 848682438>
01:07:09.143050 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683688 848683438>
01:07:09.143083 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683688 848682438>