我正在尝试使用以下代码获取 HTML 页面的内容:
String malSearch = "http://myanimelist.net/anime.php?letter=" + firstLetter;
URL url = new URL(malSearch);
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
int len = 0;
while ((len = in.read(buf)) != -1) {
baos.write(buf, 0, len);
}
String body = new String(baos.toByteArray(), encoding);
Run Code Online (Sandbox Code Playgroud)
它工作正常,但它没有给我我真正想要的东西。它给了我这个:
<html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta name="format-detection" content="telephone=no">
<meta name="viewport" content="initial-scale=1.0">
<meta …Run Code Online (Sandbox Code Playgroud)