How do I know when the HTTP header section ends?

Jac*_*ack 4 c sockets http-headers

The server returns HTTP headers followed by a binary file, something like this:

HTTP/1.1 200 OK
Date: Thu, 28 Jun 2012 22:11:14 GMT
Server: Apache/2.2.3 (Red Hat)
Set-Cookie: JSESSIONID=blabla; Path=/
Pragma: no-cache
Cache-Control: must-revalidate, no-store
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-disposition: inline; filename="foo.pdf"
Content-Length: 6231119
Connection: close
Content-Type: application/pdf

%PDF-1.6
%âãÏÓ
5989 0 obj
<</Linearized 1/L 6231119/O 5992/E 371504/N 1498/T 6111290/H [ 55176 6052]>>
endobj

xref
5989 2744
0000000016 00000 n
0000061228 00000 n
0000061378 00000 n

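I only want to copy the binary file. But how do I know when the header section ends? I tried checking whether a line contains \r\n\r\n, but it looks like that convention applies only to client requests, not to server responses. This is what I get: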

Content-disposition: inline; filename="foo.pdf"
Content-Length: 6231119
Connection: close
Content-Type: application/pdf

%PDF-1.6
%âãÏÓ
5989 0 obj
<</Linearized 1/L 6231119/O 5992/E 371504/N 1498/T 6111290/H [ 55176 6052]>>
endobj

xref
5989 2744
0000000016 00000 n

Here is the C code:

while((readed = recv(sock, buffer, 128, 0)) > 0) {

    if(isnheader == 0 && strstr(buffer, "\r\n\r\n") != NULL)
        isnheader = 1;

        if(isnheader) 
          fwrite(buffer, 1, readed, fp);
}

Update:

I added a continue to my if statement:

if(isnheader == 0 && strstr(buffer, "\r\n\r\n") != NULL) {
    isnheader = 1;
    continue;
}

Well, it works as expected. But as @Alnitak mentioned, it isn't safe.

Aln*_*tak 17

The headers and the body should be separated by \r\n\r\n (section 4.1 of RFC 2616).

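However, some servers may omit the \r and send only \n as the line ending, particularly if they fail to sanitize CGI-supplied headers to ensure they include the \r.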
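To make the lenient case concrete, here is a minimal sketch of a search that accepts both "\r\n" and bare "\n" line endings. The name find_body_offset is hypothetical, and the sketch assumes the whole header section is already in one buffer:

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first body byte in
 * buf[0..len), or 0 if no blank line (end of headers) is present yet.
 * Accepts "\r\n" and bare "\n" line endings, in any mix. */
size_t find_body_offset(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        /* buf[i] ends a line; is the line that follows empty? */
        if (i + 1 < len && buf[i + 1] == '\n')
            return i + 2;
        if (i + 2 < len && buf[i + 1] == '\r' && buf[i + 2] == '\n')
            return i + 3;
    }
    return 0;
}
```

A return value of 0 can safely mean "not found", because at least one line ending always precedes the body.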

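You also need to consider how your reads are chunked: the delimiter may well straddle two of your 128-byte blocks, which would prevent the strstr call from finding it.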
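One way to cope with a delimiter that straddles chunks is to keep a small piece of state between reads instead of searching each chunk on its own. The following is only a sketch under that assumption; struct hdr_scan and hdr_scan_feed are hypothetical names, and the scanner is deliberately lenient in that it ignores every '\r':

```c
#include <stddef.h>

/* Hypothetical scanner state: number of line endings seen in a row. */
struct hdr_scan {
    int newlines;
};

/* Feeds one chunk to the scanner. Returns the index within chunk of the
 * first body byte once the blank line has been seen (this may equal n if
 * the body starts in the next chunk), or -1 if the headers continue. */
long hdr_scan_feed(struct hdr_scan *st, const char *chunk, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (chunk[i] == '\n') {
            if (++st->newlines == 2)
                return (long)(i + 1);
        } else if (chunk[i] != '\r') {
            st->newlines = 0;   /* ordinary byte: the line is not empty */
        }
        /* '\r' leaves the state unchanged, so "\r\n" and "\n" both count */
    }
    return -1;
}
```

Once the function returns an index k, the caller would write chunk + k through the end of the chunk as body data and stop calling the scanner for later chunks.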


Dav*_*dek 2

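You are not parsing your input correctly. Here are a few things you are doing wrong:

  • Your code seems to assume that the buffer holds at most one line of header data. However, recv() does not operate on "lines" of data but on blocks of binary data. So if you tell it your buffer is 128 bytes long, it will try to fill the buffer with up to 128 bytes of data if available, even if those 128 bytes contain multiple "lines".
  • Your code does not account for the possibility that the "\r\n" of the header break may be pulled into the buffer by two different recv() calls, which would prevent your code from recognizing the header break.
  • If you do detect the header break (which can happen if the headers happen to be just the right size), you end up pushing the last header, with its terminating "\r\n", and the header break ("\r\n") into your copy of the binary data.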
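As a rough illustration of the first and third points: recv() does not NUL-terminate the buffer and the PDF body may contain NUL bytes, so a length-bounded search is safer than strstr(), and only the bytes after the break should be written. The helpers body_start and write_body_part below are hypothetical, and the sketch assumes the break is "\r\n\r\n" and lies entirely within one chunk:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: bounded search for the 4-byte sequence "\r\n\r\n"
 * in buf[0..len). Unlike strstr(), it never reads past len and does not
 * stop at a NUL byte. Returns a pointer to the first body byte, or NULL. */
const char *body_start(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return buf + i + 4;
    }
    return NULL;
}

/* Writes only the body part of a chunk that contains the header break.
 * Returns the number of body bytes written, or 0 if there was no break. */
size_t write_body_part(const char *buf, size_t len, FILE *fp)
{
    const char *body = body_start(buf, len);

    if (body == NULL)
        return 0;
    return fwrite(body, 1, len - (size_t)(body - buf), fp);
}
```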

I wrote a quick function that should find the end of the HTTP headers and write the rest of the server's response to a file stream:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

void parse_http_headers(int s, FILE * fp)
{
   int       isnheader;   // nonzero once the header break has been seen
   ssize_t   readed;
   size_t    len;         // bytes of unprocessed header data in buffer
   char      buffer[1024];
   char    * eol; // end of line
   char    * bol; // beginning of line

   isnheader = 0;
   len       = 0;

   // read next chunk from socket, appending to any unprocessed data
   while((readed = read(s, &buffer[len], (sizeof(buffer) - len))) > 0)
   {
      // headers already consumed: everything read is body data
      if (isnheader != 0)
      {
         fwrite(buffer, 1, (size_t)readed, fp);
         continue;
      }

      // calculate combined length of unprocessed data and new data
      len += (size_t)readed;

      // process each complete line in buffer looking for header break;
      // memchr() is bounded by length, so no NUL terminator is needed
      bol = buffer;
      while((eol = memchr(bol, '\n', len - (size_t)(bol - buffer))) != NULL)
      {
         // an empty line ("\n" or "\r\n") marks the end of the headers
         if ((eol == bol) || ((eol == bol + 1) && (bol[0] == '\r')))
            isnheader = 1;

         // advance bol to the beginning of the next line
         bol = eol + 1;

         // stop scanning so body data is not mistaken for header lines
         if (isnheader != 0)
            break;
      }

      // calculate the amount of data remaining in the buffer
      len -= (size_t)(bol - buffer);

      if (isnheader != 0)
      {
         // write body data that arrived with the headers to FILE stream
         if (len > 0)
            fwrite(bol, 1, len, fp);

         // the buffer is reused from the start for non-header data
         len = 0;
      }
      else if (len == sizeof(buffer))
      {
         // a single header line is larger than the buffer; give up
         return;
      }
      else
      {
         // shift the incomplete line to the beginning of the buffer
         memmove(buffer, bol, len);
      }
   }

   return;
}

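I have not tested the code, so it may contain simple errors, but it should point you in the right direction for parsing string data out of a buffer in C.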