Python socket buffering

Bas*_*ard 21 python sockets buffering

Let's say I want to read a line from a socket, using the standard socket module:

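def read_line(s):
    # In Python 3, recv() returns bytes, so build and compare bytes.
    ret = b''

    while True:
        c = s.recv(1)  # ask for at most one byte per call

        if c == b'\n' or c == b'':  # end of line, or peer closed (EOF)
            break
        else:
            ret += c

    return ret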
Run Code Online (Sandbox Code Playgroud)

What exactly happens in s.recv(1)? Does it issue a system call every time? I guess I should add some buffering anyway:

For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096.

http://docs.python.org/library/socket.html#socket.socket.recv

But it doesn't seem easy to write efficient and thread-safe buffering. What if I use file.readline()?

# does this work well, is it efficiently buffered?
s.makefile().readline()

Aar*_*ers 27

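If you are concerned with performance and control the socket completely (you are not passing it into a library, for example), then try implementing your own buffering in Python; Python's string.find, string.split and the like can be amazingly fast.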

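def linesplit(sock):
    # Works on bytes (Python 3): read 4096 bytes at a time and split out lines.
    buffer = sock.recv(4096)
    buffering = True
    while buffering:
        if b"\n" in buffer:
            (line, buffer) = buffer.split(b"\n", 1)
            yield line + b"\n"
        else:
            more = sock.recv(4096)
            if not more:  # peer closed the connection
                buffering = False
            else:
                buffer += more
    if buffer:  # trailing data without a final newline
        yield buffer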

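If you expect the payload to consist of lines that are not too huge, this should run pretty fast and avoid unnecessarily jumping through too many layers of function calls. I'd be interested to know how it compares to file.readline() or to using socket.recv(1).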
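To try that comparison, here is a rough benchmark sketch. It assumes Python 3 (so sockets return bytes), uses socket.socketpair() as a local stand-in for a real network connection, and the helper names (read_all_recv1, read_all_makefile, read_all_linesplit, time_reader) and the payload size are made up for illustration. Timings over a local socket pair will not match a real network, so treat them only as a relative comparison.

```python
import socket
import threading
import time

N_LINES = 2000
PAYLOAD = b"".join(b"line %d\n" % i for i in range(N_LINES))


def read_line(s):
    # The question's reader: one recv() call per byte.
    ret = b''
    while True:
        c = s.recv(1)
        if c == b'\n' or c == b'':
            return ret
        ret += c


def linesplit(sock):
    # The chunked reader from the answer above, ported to bytes.
    buffer = sock.recv(4096)
    while True:
        if b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line + b"\n"
        else:
            more = sock.recv(4096)
            if not more:
                break
            buffer += more
    if buffer:
        yield buffer


def read_all_recv1(sock):
    # read_line() returns b'' at EOF; stopping on b'' is only safe here
    # because the test payload contains no empty lines.
    lines = []
    while True:
        line = read_line(sock)
        if not line:
            return lines
        lines.append(line + b"\n")


def read_all_makefile(sock):
    with sock.makefile('rb') as f:
        return f.readlines()


def read_all_linesplit(sock):
    return list(linesplit(sock))


def time_reader(read_all):
    # Fresh connected pair per run; a writer thread sends the payload so
    # sendall() cannot deadlock on a full socket buffer.
    a, b = socket.socketpair()

    def writer():
        a.sendall(PAYLOAD)
        a.close()  # EOF tells the reader it has everything

    t = threading.Thread(target=writer)
    t.start()
    start = time.perf_counter()
    lines = read_all(b)
    elapsed = time.perf_counter() - start
    t.join()
    b.close()
    return lines, elapsed


results = {}
for name, fn in [("recv(1)", read_all_recv1),
                 ("makefile().readlines()", read_all_makefile),
                 ("linesplit", read_all_linesplit)]:
    lines, elapsed = time_reader(fn)
    results[name] = lines
    print(f"{name:>24}: {len(lines)} lines in {elapsed:.4f} s")
```

All three readers should return the same list of lines; only the timings differ.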


Joe*_*erg 20

The recv() call is handled directly by calling the C library function.

It will block waiting for the socket to have data. In reality it just lets the recv() system call block.

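file.readline() is an efficient buffered implementation. It is not thread-safe, because it presumes it is the only one reading the file (for example, it buffers upcoming input).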

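If you are using the file object, every time read() is called with a positive argument, the underlying code will recv() only the amount of data requested, unless it is already buffered.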

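It will be buffered if

  • you had called readline(), which read a full buffer

  • the end of the line was before the end of the buffer

thus leaving data in the buffer. Otherwise the buffer is generally not overfilled.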
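A small sketch of that leftover-data effect, assuming Python 3, where s.makefile('rb') returns an io.BufferedReader rather than the Python 2 file object described here, so exactly how much it pulls per recv() may differ between versions and platforms. A local socket.socketpair() stands in for a network connection: both lines arrive together, the first readline() drains the kernel buffer, and the second line is served from the file object's own buffer.

```python
import select
import socket

a, b = socket.socketpair()
a.sendall(b"one\ntwo\n")  # both lines are queued on b before we read

f = b.makefile('rb')
first = f.readline()  # one underlying recv() pulls in both lines

# Nothing is left in the kernel's receive buffer for b...
readable, _, _ = select.select([b], [], [], 0)

# ...but the second line is still sitting in f's user-space buffer.
second = f.readline()

f.close()
a.close()
b.close()
print(first, second, readable)
```

This is also why mixing direct s.recv() calls with a file object on the same socket is risky: data already buffered by the file object is invisible to recv() and select().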

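The goal of the question is not clear. If you need to see whether data is available before reading, you can use select() or set the socket to non-blocking mode with s.setblocking(False). Then a read will not block when no data is waiting; in Python 3 it raises BlockingIOError instead (an empty result means the peer closed the connection).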
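Both options can be sketched as follows, assuming Python 3 and using a local socket.socketpair() in place of a real connection. In Python 3, recv() on a non-blocking socket with nothing to read raises BlockingIOError; an empty b'' result is reserved for a closed connection.

```python
import select
import socket

a, b = socket.socketpair()

# select() with a zero timeout just polls: b has no data yet.
ready_before, _, _ = select.select([b], [], [], 0)

# Non-blocking mode: recv() raises instead of waiting for data.
b.setblocking(False)
try:
    b.recv(4096)
    got_error = False
except BlockingIOError:
    got_error = True

a.sendall(b"hello\n")
# Wait up to 1 second for b to become readable, then read what arrived.
ready_after, _, _ = select.select([b], [], [], 1.0)
data = b.recv(4096) if ready_after else b""

a.close()
b.close()
print(ready_before, got_error, data)
```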

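Are you reading the file or socket from multiple threads? I would put a single worker on reading the socket and feeding received items into a queue for handling by other threads.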
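A minimal sketch of that single-reader design, assuming Python 3. The reader and consumer function names, the None sentinel and the choice of three consumer threads are illustrative, not taken from the answer; a socket.socketpair() stands in for the real connection. Only the reader touches the socket, so the buffered file object is never shared, and queue.Queue handles the locking between threads.

```python
import queue
import socket
import threading


def reader(sock, out):
    # The only code that touches the socket: split the stream into lines
    # and hand each one to the queue.
    with sock.makefile('rb') as f:
        for line in f:
            out.put(line)
    out.put(None)  # sentinel: the connection was closed


def consumer(inp, results, lock):
    # Hypothetical worker: process lines until the sentinel shows up.
    while True:
        line = inp.get()
        if line is None:
            inp.put(None)  # pass the sentinel on so other consumers stop too
            return
        with lock:
            results.append(line.rstrip(b"\n"))


a, b = socket.socketpair()
lines_q = queue.Queue()
results = []
results_lock = threading.Lock()

threads = [threading.Thread(target=reader, args=(b, lines_q))]
threads += [threading.Thread(target=consumer, args=(lines_q, results, results_lock))
            for _ in range(3)]
for t in threads:
    t.start()

a.sendall(b"alpha\nbeta\ngamma\n")
a.close()  # EOF ends the reader's loop

for t in threads:
    t.join()
b.close()
print(sorted(results))
```

The consumers may process lines in any order, which is why the result is sorted before printing.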

I suggest consulting the Python socket module source and the C source that makes the system calls.


ale*_*lex 6

def buffered_readlines(pull_next_chunk, buf_size=4096):
    """
    pull_next_chunk is a callable that accepts one positional argument, max_len
    (e.g. sock.recv, or the read method of a file opened in 'rb' mode), and
    returns a bytes object of up to max_len bytes, or an empty one when
    nothing is left to read. Yields lines without the trailing newline.

    >>> for line in buffered_readlines(sock.recv, 16384):
    ...     print(line)

    >>> # The following code won't read the whole file into memory
    >>> # before splitting it into lines, as the file's .readlines()
    >>> # method does. It also won't block until a FIFO file is closed.
    >>> for line in buffered_readlines(open('huge_file', 'rb').read):
    ...     pass  # process it on a per-line basis
    """
    chunks = []  # pieces of the current, not yet terminated line
    while True:
        chunk = pull_next_chunk(buf_size)
        if not chunk:
            # EOF: flush whatever partial line is left over.
            if chunks:
                yield b''.join(chunks)
            break
        if b'\n' not in chunk:
            chunks.append(chunk)
            continue
        parts = chunk.split(b'\n')
        # The first part completes the line started in earlier chunks.
        yield b''.join(chunks + [parts[0]])
        # Middle parts are complete lines on their own.
        for line in parts[1:-1]:
            yield line
        # The last part starts the next line (empty if the chunk ended in a newline).
        chunks = [parts[-1]] if parts[-1] else []