brt*_*rtb 10 java web-crawler web-content web
我只想从Java检索任何网页的源代码.到目前为止我找到了很多解决方案,但我找不到适用于以下所有链接的代码:
对我来说,主要问题是某些代码检索网页源代码,但缺少代码.例如,下面的代码不适用于第一个链接.
InputStream is = fURL.openStream(); //fURL can be one of the links above
BufferedReader buffer = null;
buffer = new BufferedReader(new InputStreamReader(is, "iso-8859-9"));
int byteRead;
while ((byteRead = buffer.read()) != -1) {
builder.append((char) byteRead);
}
buffer.close();
System.out.println(builder.toString());
Run Code Online (Sandbox Code Playgroud)
nar*_*yan 25
使用添加的请求属性尝试以下代码:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
public class SocketConnection
{
public static String getURLSource(String url) throws IOException
{
URL urlObject = new URL(url);
URLConnection urlConnection = urlObject.openConnection();
urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
return toString(urlConnection.getInputStream());
}
private static String toString(InputStream inputStream) throws IOException
{
try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8")))
{
String inputLine;
StringBuilder stringBuilder = new StringBuilder();
while ((inputLine = bufferedReader.readLine()) != null)
{
stringBuilder.append(inputLine);
}
return stringBuilder.toString();
}
}
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
96175 次 |
| 最近记录: |