Post by Ahm*_*Ali

Reading the source code of a web page with Java

I am trying to read the source code of a web page. My Java code is:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

class Testing {

    // Fetches the page and prints the response body line by line.
    public static void Connect() throws Exception {
        URL url = new URL("http://excite.com/education");
        URLConnection spoof = url.openConnection();

        // Send a browser-like User-Agent header with the request.
        spoof.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");
        BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream()));
        String strLine = "";

        while ((strLine = in.readLine()) != null) {
            System.out.println(strLine);
        }

        System.out.println("End of page.");
    }

    public static void main(String[] args) {
        try {
            Connect();
        } catch (Exception e) {
            // Report failures instead of silently swallowing them.
            e.printStackTrace();
        }
    }
}

When I compile and run this code, it produces the following output:

? I?%&/m?{J?J??t?$?@??????iG#)?*??eVe]f@????{???{???;?N'????\fdl??J??!?? ??~|?"~?$}?>???????4?????7N?????+??M?N???J?tZfM??G?j?? ??R??!?9??>JgE??Ge[????????W???????8?????? ?|8? ??????? ??ho????0????|?:--?|?L?U?????m?zt?n3??l\?w??O^f?G[?CG< ?y6K??gM?rg???y?E?y????h~????X???l=??Z?/????(?^O?UU6???? …
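Output like this often means the response body is compressed and is being decoded as if it were plain text. The post does not confirm that, so the following is only a sketch under that assumption: it checks the Content-Encoding value and wraps the stream in a `GZIPInputStream` when the value is `gzip`. The class name `GzipAwareReader`, the helpers `decode`, `readAll` and `gzip`, and the choice of UTF-8 are all hypothetical and not from the original post. The `main` method uses an in-memory gzip payload instead of the real site so the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipAwareReader {

    // Hypothetical helper: decompress only when the Content-Encoding
    // header value says the body is gzip-compressed.
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the (possibly compressed) body as text, line by line.
    // UTF-8 is an assumption; the real page may use another charset.
    public static String readAll(InputStream raw, String contentEncoding) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(decode(raw, contentEncoding), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    // Test helper: gzip-compresses a string in memory to stand in for a server response.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html>hello</html>\n");
        System.out.print(readAll(new ByteArrayInputStream(compressed), "gzip"));
    }
}
```

With the connection from the question, the equivalent call would be `readAll(spoof.getInputStream(), spoof.getContentEncoding())`. If `getContentEncoding()` returns something other than `gzip`, this sketch passes the bytes through unchanged, and the cause of the garbled output would have to be looked for elsewhere, for example in the character set.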

java html-content-extraction
