What is the best way to determine packet size using recv()?

Question


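I'm very new to socket programming, and to C in general. I'm trying to write a basic program that sends and receives data between two machines. I understand that recv won't get all of the data at once: you essentially have to loop on it until it has read the whole message.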

Instead of just setting a limit on both machines, I created a simple Message struct on the client side:

struct Message {
    size_t length;
    char contents[1024 - sizeof(size_t)];
} message; 
message.length = sizeof(struct Message);
message.contents = information_i_want_to_send;


When it reaches the server, I recv into a buffer: received = recv(ioSock, &buffer, 1024, 0). (Coincidentally, that is the same size as my Message struct, but assume it weren't...)

I then extract Message.length from the buffer like this:

size_t messagelength;
messagelength = *((size_t *) &buffer);


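I then loop recv into the buffer while received < messagelength. This works, but I can't help feeling that it's really ugly and hacky (especially if the first recv call reads fewer than sizeof(size_t) bytes, or if the machines have different bit architectures, in which case the size_t cast won't work). Is there a better way?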

Answer 1

ike*_*ami 5

You have a fixed-size message, so you can use something like this:

#include <errno.h>
#include <limits.h>
#include <sys/socket.h>
#include <sys/types.h>

// Returns the number of bytes read.
// EOF was reached if the number of bytes read is less than requested.
// On error, returns -1 and sets errno.
ssize_t recv_fixed_amount(int sockfd, void *buf, size_t size) {
   if (size > SSIZE_MAX) {
      errno = EINVAL;
      return -1;
   }

   // Byte pointer used to advance through the caller's buffer.
   // Taking void * lets callers pass the address of any field without a cast.
   char *p = buf;
   ssize_t bytes_read = 0;
   while (size > 0) {
      ssize_t rv = recv(sockfd, p, size, 0);
      if (rv < 0)
         return -1;
      if (rv == 0)
         return bytes_read;

      size -= rv;
      bytes_read += rv;
      p += rv;
   }

   return bytes_read;
}


It would be used something like this:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
   uint32_t length;
   char contents[1020];
} Message;

Message message;

// Read the length field first.
ssize_t bytes_read = recv_fixed_amount(sockfd, &(message.length), sizeof(message.length));
if (bytes_read == 0) {
   printf("EOF reached\n");
   exit(EXIT_SUCCESS);
}

if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != sizeof(message.length)) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}

// Then read the fixed-size contents field.
bytes_read = recv_fixed_amount(sockfd, message.contents, sizeof(message.contents));
if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != sizeof(message.contents)) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}


Notes:

size_t is not going to be the same everywhere, so I switched to a uint32_t.
I read the fields independently because the padding within the struct can vary between implementations. They would need to be sent that way as well.
The receiver is populating message.length with the information from the stream, but doesn't actually use it.
A malicious or buggy sender could provide a value for message.length that's too large and crash the receiver (or worse) if it doesn't validate it. Same goes for contents. It might not be NUL-terminated if that's expected.
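
The second note implies that the sender has to write the fields one at a time as well. A minimal sketch of that side, assuming a hypothetical helper send_all (not part of the answer above) that loops over send() the same way recv_fixed_amount loops over recv():

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

typedef struct {
   uint32_t length;
   char contents[1020];
} Message;

// Hypothetical helper: keeps calling send() until all `size` bytes are written.
// Returns 0 on success, -1 on error (errno is set by send()).
int send_all(int sockfd, const void *buf, size_t size) {
   const char *p = buf;
   while (size > 0) {
      ssize_t rv = send(sockfd, p, size, 0);
      if (rv < 0)
         return -1;
      p += rv;
      size -= (size_t)rv;
   }
   return 0;
}

// Sends each field separately so that struct padding never reaches the wire.
int send_message(int sockfd, const Message *message) {
   if (send_all(sockfd, &message->length, sizeof(message->length)) < 0)
      return -1;
   return send_all(sockfd, message->contents, sizeof(message->contents));
}
```

Note that this sketch still sends length in the sender's native byte order, so it only suits two machines that share the same endianness.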

But what if the length wasn't fixed? Then the sender would need to somehow communicate how much the reader needs to read. A common approach is a length prefix.

typedef struct {
   uint32_t length;
   char contents[];
} Message;

uint32_t contents_size;
ssize_t bytes_read = recv_fixed_amount(sockfd, &contents_size, sizeof(contents_size));
if (bytes_read == 0) {
   printf("EOF reached\n");
   exit(EXIT_SUCCESS);
}

if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != sizeof(contents_size)) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}

Message *message = malloc(sizeof(Message)+contents_size);
if (!message) {
   perror("malloc");
   exit(EXIT_FAILURE);
}

message->length = contents_size;

bytes_read = recv_fixed_amount(sockfd, message->contents, contents_size);
if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != contents_size) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}


Notes:

message->length contains the size of message->contents instead of the size of the structure. This is far more useful.
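
Two concerns raised earlier are not handled by the length-prefix code as written: the byte order of the length field (the question mentions machines with different architectures) and validation of the length before allocating. The following is only a sketch under stated assumptions: the sender is assumed to transmit the length with htonl(), and MAX_CONTENTS, recv_exact and recv_message are hypothetical names introduced for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_CONTENTS 65536u   // hypothetical limit; pick one that suits the protocol

// Hypothetical helper: reads exactly `size` bytes. Returns 1 on success,
// 0 if the peer closed the connection first, -1 on error.
static int recv_exact(int sockfd, void *buf, size_t size) {
   char *p = buf;
   while (size > 0) {
      ssize_t rv = recv(sockfd, p, size, 0);
      if (rv < 0)
         return -1;
      if (rv == 0)
         return 0;
      p += rv;
      size -= (size_t)rv;
   }
   return 1;
}

// Reads one length-prefixed message. On success returns a malloc'd buffer
// (the caller frees it) and stores its size in *size_out. Returns NULL on
// EOF, on error, or if the announced length exceeds MAX_CONTENTS.
char *recv_message(int sockfd, uint32_t *size_out) {
   uint32_t wire_length;
   if (recv_exact(sockfd, &wire_length, sizeof(wire_length)) != 1)
      return NULL;

   uint32_t length = ntohl(wire_length);   // network to host byte order
   if (length > MAX_CONTENTS) {
      errno = EMSGSIZE;
      return NULL;
   }

   char *contents = malloc(length ? length : 1);   // avoid malloc(0)
   if (!contents)
      return NULL;

   if (recv_exact(sockfd, contents, length) != 1) {
      free(contents);
      return NULL;
   }

   *size_out = length;
   return contents;
}
```

Rejecting an oversized length before calling malloc means a hostile or buggy sender cannot make the receiver allocate an arbitrary amount of memory.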

Another approach is to use a sentinel value. This is a value that tells the reader the message is over. This is what the NUL that terminates C strings is. This is more complicated because you don't know how much to read in advance. Reading byte-by-byte is too expensive, so one normally uses a buffer.

 while (1) {
     extend_buffer_if_necessary();
     recv_into_buffer();
     while (buffer_contains_a_sentinel()) {
        // This also shifts the remainder of the buffer's contents.
        extract_contents_of_buffer_up_to_sentinel();
        process_extracted_message();      
     }
 }


The advantage of using a sentinel value is that one doesn't need to know the length of the message in advance (so the sender can start sending it before it's fully created.)

The disadvantage is the same as for C strings: The message can't contain the sentinel value unless some form of escaping mechanism is used. Between this and the complexity of the reader, you can see why a length prefix is usually preferred over a sentinel value. :)
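
To make the sentinel loop more concrete, here is one possible sketch. It assumes a newline as the sentinel and that message contents never include one; Reader and next_message are hypothetical names, not part of any library.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define SENTINEL '\n'   // assumption: message contents never include a newline

// Hypothetical reader state: bytes received but not yet handed out.
typedef struct {
   int sockfd;
   char *data;
   size_t used;       // bytes currently buffered
   size_t capacity;   // bytes allocated
} Reader;

// Returns the next message as a malloc'd, NUL-terminated string (the caller
// frees it), or NULL on EOF or error. The sentinel itself is dropped.
char *next_message(Reader *r) {
   for (;;) {
      // Hand out a complete message if the buffer already holds one.
      char *end = r->used ? memchr(r->data, SENTINEL, r->used) : NULL;
      if (end) {
         size_t len = (size_t)(end - r->data);
         char *msg = malloc(len + 1);
         if (!msg)
            return NULL;
         memcpy(msg, r->data, len);
         msg[len] = '\0';
         // Shift whatever follows the sentinel to the front of the buffer.
         r->used -= len + 1;
         memmove(r->data, end + 1, r->used);
         return msg;
      }

      // Extend the buffer if it is full.
      if (r->used == r->capacity) {
         size_t new_capacity = r->capacity ? r->capacity * 2 : 4096;
         char *grown = realloc(r->data, new_capacity);
         if (!grown)
            return NULL;
         r->data = grown;
         r->capacity = new_capacity;
      }

      ssize_t rv = recv(r->sockfd, r->data + r->used, r->capacity - r->used, 0);
      if (rv <= 0)
         return NULL;   // EOF (0) or error (-1); an unterminated tail is not returned
      r->used += (size_t)rv;
   }
}
```

A caller would initialise the state with something like Reader r = { sockfd, NULL, 0, 0 }; call next_message(&r) until it returns NULL, and free r.data afterwards. A real reader would probably also cap the buffer size so that a sender that never emits the sentinel cannot exhaust memory.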

Finally, there's a better solution than sentinel values for large messages that you want to start sending before they are fully created: A sequence of length-prefixed chunks. One keeps reading chunks until a chunk of size 0 is encountered, signaling the end.

HTTP supports both length-prefixed messages (in the form of Content-Length: <length> header) and this approach (in the form of the Transfer-Encoding: chunked header).
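
A receiver for a sequence of length-prefixed chunks could look roughly like the sketch below. It is not HTTP's wire format (HTTP writes chunk sizes as hexadecimal text followed by CRLF); it assumes each chunk is a 4-byte length in network byte order followed by that many bytes. MAX_CHUNK, recv_exact and recv_chunked are hypothetical names.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_CHUNK 65536u   // hypothetical per-chunk limit

// Hypothetical helper: reads exactly `size` bytes; 0 on success, -1 on EOF or error.
static int recv_exact(int sockfd, void *buf, size_t size) {
   char *p = buf;
   while (size > 0) {
      ssize_t rv = recv(sockfd, p, size, 0);
      if (rv <= 0)
         return -1;
      p += rv;
      size -= (size_t)rv;
   }
   return 0;
}

// Reads chunks until a zero-length chunk arrives and returns the concatenated
// payload as a malloc'd buffer (the caller frees it); its size is stored in
// *total_out. Returns NULL on EOF, error, or an oversized chunk.
char *recv_chunked(int sockfd, size_t *total_out) {
   char *message = NULL;
   size_t total = 0;

   for (;;) {
      uint32_t wire_length;
      if (recv_exact(sockfd, &wire_length, sizeof(wire_length)) < 0)
         break;

      uint32_t length = ntohl(wire_length);
      if (length == 0) {           // terminating chunk
         if (!message)
            message = malloc(1);   // empty message: still return a non-NULL buffer
         if (message)
            *total_out = total;
         return message;
      }
      if (length > MAX_CHUNK)
         break;

      // Grow the result and read the chunk payload directly into its tail.
      char *grown = realloc(message, total + length);
      if (!grown)
         break;
      message = grown;

      if (recv_exact(sockfd, message + total, length) < 0)
         break;
      total += length;
   }

   free(message);
   return NULL;
}
```

This version accumulates the whole message before returning. A receiver that wants to start processing early could hand each chunk to the application as soon as it arrives instead.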

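Yes, when using a stream protocol (e.g. TCP, or reading a series of records from a file), everything must have some agreed-upon size and byte order unless that information is carried in the message itself. (2 upvotes)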

Archived:	6 years ago
Views:	753
Last activity:	6 years ago