java.io.IOException: Mark has been invalidated when parsing a website with jsoup

Question

Neo*_*Far 1 html java parsing ioexception jsoup

When I try to parse a website's HTML page, it crashes with this error:

java.io.IOException: Mark has been invalidated.

Part of my code:

String xml = xxxxxx;
try {
    Document document = Jsoup.connect(xml).maxBodySize(1024*1024*10)
            .timeout(0).ignoreContentType(true)
            .parser(Parser.xmlParser()).get();

    Elements elements = document.body().select("td.hotv_text:eq(0)");

    for (Element element : elements) {
        Element element1 = element.select("a[href].hotv_text").first();
        hashMap.put(element.text(), element1.attr("abs:href"));
    }
} catch (HttpStatusException ex) {
    Log.i("GyWueInetSvc", "Exception while JSoup connect:" + xml +" cause:"+ ex.getMessage());
} catch (IOException e) {
    e.printStackTrace();
    throw new RuntimeException("Socket timeout: " + e.getMessage(), e);
}

The page I want to parse is about 2 MB. When I debug the code, I end up in ConstrainableInputStream.java, in this method:

public void reset() throws IOException {
    super.reset();
    remaining = maxSize - markpos;
}

At that point markpos is -1, and the exception is thrown.

How can I solve this problem?

Answer 1

小智 5

This helped me: call bufferUp() on the response before parsing.

GET: .execute().bufferUp().parse();
POST: .method(Connection.Method.POST).execute().bufferUp().parse();
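
For background on why buffering can help: the reset() method quoted in the question calls super.reset() and reads markpos, which suggests ConstrainableInputStream builds on java.io.BufferedInputStream. The sketch below uses only the plain JDK class, not jsoup's code, to show how a mark becomes invalid once more bytes are read than the mark's read limit and the buffer allow. The names MarkInvalidationDemo and resetFails, and the tiny sizes, are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkInvalidationDemo {

    /**
     * Marks the start of a small buffered stream, reads bytesToRead bytes
     * one at a time, then calls reset().
     * Returns true if reset() threw an IOException (the mark was invalidated).
     */
    static boolean resetFails(int bufferSize, int readLimit, int bytesToRead) {
        byte[] data = new byte[64]; // hypothetical stand-in for a response body
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit); // promise: reset() works for up to readLimit bytes
            for (int i = 0; i < bytesToRead; i++) {
                in.read(); // single-byte reads keep the buffering easy to follow
            }
            in.reset(); // throws IOException if the mark is no longer valid
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Reading fewer bytes than the read limit: reset() succeeds.
        System.out.println("3 bytes read, reset failed: " + resetFails(4, 4, 3));
        // Reading past the read limit and buffer size: the mark is dropped.
        System.out.println("5 bytes read, reset failed: " + resetFails(4, 4, 5));
    }
}
```

On OpenJDK the first call should report false and the second true. The exact exception message differs between runtimes, so the sketch only checks whether an IOException is thrown.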

Archived:	8 years, 1 month ago
Views:	2,845
Last activity:	5 years, 7 months ago