Exact copy of an HTTP gzipped response into a string

Sig*_*oli 2 string gzip d http

I need help.

I'm trying to fetch the content of a website that is served with Content-Encoding: gzip, using dmd v2.066.1 on Windows. This is my test URL: "http://diaboli.pl/test2.html".

My HTTP request is:

GET /test2.html HTTP/1.1
Host: diaboli.pl
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
User-Agent: My Browser
Referer: http://google.pl
DNT: 1

The server response is:

HTTP/1.1 200 OK
Date: Sat, 24 Jan 2015 23:02:00 GMT
Server: Apache
Last-Modified: Sat, 24 Jan 2015 22:48:44 GMT
ETag: "5c468ad-83f-50d6db511eb00"
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 942
Content-Type: text/html

[942 bytes of gzip-compressed body, printed to the console as unreadable binary characters]

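As you can see, it is gzip-encoded content. The server response is printed to the cmd console character by character with the write() function. The problem is that I can't make an exact copy of the response string. If I try, I get this result: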

HTTP/1.1 200 OK
Date: Sat, 24 Jan 2015 23:02:00 GMT
Server: Apache
Last-Modified: Sat, 24 Jan 2015 22:48:44 GMT
ETag: "5c468ad-83f-50d6db511eb00"
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 942
Content-Type: text/html

??

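I can determine the length of the content, and it equals the value of the HTTP Content-Length header, but as you can see, the content is not the same as the original that was printed character by character.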

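Interestingly, I can decompress that bad content string with the zlib uncompress() function, and it does not return a zlib data error; instead it returns truncated decompressed content. Of course, browsers such as FF or IE display the full decompressed content without any problem.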

I'm connecting to the server like this:

import std.stdio, std.string, std.conv, std.socket, std.stream, std.socketstream, std.zlib;

void main()
{
    ushort port=80; string domain="diaboli.pl"; 
    string request_uri="/test2.html"; int[] pos; string request; string buffer; string znak; string line; 
    int contentlength=-1; int[] postab; string bodybuffer; string headerbuffer; int readingbody=0; 
    std.zlib.UnCompress u; const(void)[] udata;

    Socket sock = new TcpSocket(new InternetAddress(domain, port));
    Stream ss   = new SocketStream(sock);

    request="GET " ~ request_uri ~ " HTTP/1.1\r\n";
    request~="Host: " ~ domain ~ "\r\n";
    request~="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n";
    request~="Accept-Language: pl,en-US;q=0.7,en;q=0.3\r\n";
    request~="Accept-Encoding: gzip, deflate\r\n";
    request~="User-Agent: My Browser\r\n";
    request~="Referer: http://google.pl\r\n";
    request~="DNT: 1\r\n";
    request~="\r\n";

    writeln("HTTP request:\n---");
    writeln(request);
    writeln("---");

    ss.writeString(request);

    writeln("\nAll response from the server character by character:\n---");
    line="";
    while (1)
    {
        if (readingbody==1) readingbody=2; //the way to separate headers and the content - first part.

        znak = to!string(ss.getc());
        if (ss.eof()) break;
        line~=znak;
        //if (readingbody==2) 
        write(znak);

        if (znak=="\n")
        {
            if (strpos(line,"Content-Length: ")>-1) 
            {
                postab ~= strpos(line,"\r");
                postab ~= strpos(line,"\n");
                contentlength=to!int(substr(line,16,postab.sort[0]-16));
            }

            if (readingbody==0 && line=="\r\n") readingbody=1;
            line="";
        }

        buffer ~= znak;

        //the way to separate headers and the content - second part.
        if (readingbody==0 && line=="\r\n") readingbody=1;
        if (readingbody==2) bodybuffer ~= znak;
        else headerbuffer ~= znak;
    }

    sock.close();

    writeln("\n---");

    write("Content-Length="); writeln(contentlength); //This is the Content-Length determined from the HTTP Content-Length header.
    write("bodybuffer.length="); writeln(bodybuffer.length); //This is the length of the content string

    writeln("\nAll response copied into the string:\n---");
    writeln(buffer);

    writeln("---\nOnly content:\n---");
    writeln(bodybuffer);

    writeln("---\nUncompressed:\n---");
    u = new UnCompress(HeaderFormat.determineFromData);
    udata = u.uncompress(bodybuffer);
    writeln(cast(string)udata);
}

//These are my simple text processing functions similar to php.
int strpos(string str,string tofind,int caseinsensitive=0)
{
    int pos=-1;
    if (caseinsensitive==1)
    {
        str=toUpper(str);
        tofind=toUpper(tofind);
    }
    if (str.length>=tofind.length)
    {
        for(int i=0;i<str.length;i++)
        {
            if (i+tofind.length>str.length) break;
            if (str[i..i+tofind.length]==tofind) 
            {
                pos=i;
                break;
            }
        }
    }
    return pos;
}

string substr(string str,int pos, int offset)
{
    string substring="";
    if (str.length>0 && pos>-1 && offset>0)
    {
        substring=str[pos..pos+offset];
    }
    return substring;
}
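Incidentally, the PHP-style strpos/substr helpers are not needed just to read the Content-Length header, because Phobos already provides indexOf, strip and to. The sketch below is an illustration only: contentLengthOf is a hypothetical name, and it assumes it is called with one complete header line at a time, like the line variable built in the loop above. It also matches the header name case-insensitively, since HTTP header names are case-insensitive.

```d
import std.conv : to;
import std.string : indexOf, strip;
import std.uni : toLower;

/// Hypothetical helper: returns the value of a "Content-Length" header line
/// such as "Content-Length: 942\r\n", or -1 if the line is another header.
/// to!long throws a ConvException if the value is not a number.
long contentLengthOf(string line)
{
    auto colon = line.indexOf(':');
    if (colon < 0)
        return -1;                          // not a "Name: value" line
    if (line[0 .. colon].strip.toLower != "content-length")
        return -1;                          // some other header
    // strip() removes the space after the colon and the trailing "\r\n".
    return line[colon + 1 .. $].strip.to!long;
}

unittest
{
    assert(contentLengthOf("Content-Length: 942\r\n") == 942);
    assert(contentLengthOf("content-length:10") == 10);
    assert(contentLengthOf("Content-Type: text/html\r\n") == -1);
}
```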

Vla*_*eev 5

There are three problems with your code:

  1. You are using Stream.getc, which performs newline translation. This will corrupt binary data. You can fix this by replacing:

    znak = to!string(ss.getc());
    

    with:

    char c; ss.readBlock(&c, 1); znak = to!string(c);
    

    That said, it would be better to avoid std.stream entirely; it is ancient code waiting to be replaced.

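  2. You specify HTTP version 1.1, so the server sends the content back with Transfer-Encoding: chunked. Your program does not handle this transfer encoding. You can change the protocol version in the request line to 1.0.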

  3. When using the std.zlib classes, you must call flush after all data has been processed. Add this line:

    udata ~= u.flush();
    
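The effect of the missing flush() call can be checked without a network connection. The following minimal sketch assumes the complete compressed body is already in memory; gunzip is a hypothetical helper name, not a std.zlib function. It uses UnCompress with HeaderFormat.determineFromData, as the question does, and appends whatever flush() returns. The unittest builds a gzip stream with std.zlib.Compress to stand in for the server's response.

```d
import std.zlib : Compress, HeaderFormat, UnCompress;

/// Hypothetical helper: decompresses a complete gzip- or zlib-wrapped buffer.
/// uncompress() may return only part of the decompressed data, so the result
/// of flush() has to be appended once all input has been supplied.
ubyte[] gunzip(const(void)[] compressed)
{
    auto u = new UnCompress(HeaderFormat.determineFromData);
    ubyte[] result;
    result ~= cast(const(ubyte)[]) u.uncompress(compressed);
    result ~= cast(const(ubyte)[]) u.flush(); // without this, output may be cut short
    return result;
}

unittest
{
    auto original = "<html><body>Hello, gzip!</body></html>";

    // Produce a gzip stream similar to what the server sends.
    auto c = new Compress(HeaderFormat.gzip);
    ubyte[] packed;
    packed ~= cast(const(ubyte)[]) c.compress(original);
    packed ~= cast(const(ubyte)[]) c.flush();
    assert(packed[0] == 0x1f && packed[1] == 0x8b); // gzip magic bytes

    assert(cast(const(char)[]) gunzip(packed) == original);
}
```

Compiling with dmd -unittest -main runs the round-trip check.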

With these changes, your program works fine for me.
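If you would rather keep HTTP/1.1 than downgrade to 1.0, the body has to be de-chunked before it is handed to zlib. The following is a minimal sketch, assuming the whole response has already been read into a ubyte[] (for example with Socket.receive instead of std.stream), which also keeps binary data away from the string/getc path. splitResponse and decodeChunked are hypothetical helper names; the chunk format follows the chunked transfer coding of RFC 7230, section 4.1, and any trailer fields are ignored.

```d
import std.algorithm : countUntil;
import std.conv : to;
import std.string : indexOf, representation, strip;

/// Hypothetical helper: splits a raw HTTP response at the first empty line
/// ("\r\n\r\n"). Returns false if the header block is not complete yet.
bool splitResponse(const(ubyte)[] raw,
                   out const(ubyte)[] head, out const(ubyte)[] payload)
{
    auto sep = "\r\n\r\n".representation;
    auto i = raw.countUntil(sep);
    if (i < 0)
        return false;
    head = raw[0 .. i];
    payload = raw[i + sep.length .. $];
    return true;
}

/// Hypothetical helper: decodes a body sent with "Transfer-Encoding: chunked".
/// Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a chunk of size 0
/// ends the body. Trailer fields after it are ignored.
ubyte[] decodeChunked(const(ubyte)[] data)
{
    auto crlf = "\r\n".representation;
    ubyte[] result;
    while (true)
    {
        auto lineEnd = data.countUntil(crlf);
        if (lineEnd < 0)
            throw new Exception("truncated chunk-size line");
        auto sizeLine = cast(const(char)[]) data[0 .. lineEnd];
        auto semi = sizeLine.indexOf(';');
        if (semi >= 0)
            sizeLine = sizeLine[0 .. semi];        // drop chunk extensions
        auto size = to!size_t(sizeLine.strip, 16); // chunk size is hexadecimal
        data = data[lineEnd + crlf.length .. $];
        if (size == 0)
            break;                                 // last chunk reached
        if (data.length < size + crlf.length)
            throw new Exception("truncated chunk data");
        result ~= data[0 .. size];
        data = data[size + crlf.length .. $];      // skip chunk data and its CRLF
    }
    return result;
}

unittest
{
    auto raw = ("HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
              ~ "4\r\nWiki\r\n5;ext=1\r\npedia\r\n0\r\n\r\n").representation;
    const(ubyte)[] head, payload;
    immutable ok = splitResponse(raw, head, payload);
    assert(ok);
    assert(cast(const(char)[]) head == "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked");
    assert(cast(const(char)[]) decodeChunked(payload) == "Wikipedia");
}
```

The de-chunked bytes are then the gzip stream itself and can go through UnCompress followed by flush(), exactly as in the HTTP/1.0 case.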