ily*_*oli 5 java gzipinputstream
我请求一个发送Content-Encoding:gzip标头的网页,但却被卡住了怎么读它..
我的代码:
try {
URLConnection connection = new URL("http://jquery.org").openConnection();
String html = "";
BufferedReader in = null;
connection.setReadTimeout(10000);
in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null){
html+=inputLine+"\n";
}
in.close();
System.out.println(html);
System.exit(0);
} catch (IOException ex) {
Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
}
Run Code Online (Sandbox Code Playgroud)
输出看起来很乱.(我无法在这里粘贴它,一种符号......)
我相信这是一个压缩内容,如何解析呢?
注意:
如果我将jquery.org更改为jquery.com(不发送该标头,我的代码运行良好)
ily*_*oli 14
实际上,这是pb2q的答案,但我发布了未来读者的完整代码
try {
URLConnection connection = new URL("http://jquery.org").openConnection();
String html = "";
BufferedReader in = null;
connection.setReadTimeout(10000);
//The changed part
if (connection.getHeaderField("Content-Encoding")!=null && connection.getHeaderField("Content-Encoding").equals("gzip")){
in = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream())));
} else {
in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
}
//End
String inputLine;
while ((inputLine = in.readLine()) != null){
html+=inputLine+"\n";
}
in.close();
System.out.println(html);
System.exit(0);
} catch (IOException ex) {
Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
8141 次 |
| 最近记录: |