
I'm getting a SocketTimeoutException: Read timed out in Jsoup


When I try to parse a large number of HTML documents with Jsoup, I get a SocketTimeoutException.
For example, I have a list of links:

<a href="www.domain.com/url1.html">link1</a>
<a href="www.domain.com/url2.html">link2</a>
<a href="www.domain.com/url3.html">link3</a>
<a href="www.domain.com/url4.html">link4</a>

For each link, I parse the document at the URL it points to (taken from the href attribute) to extract additional information from those pages.
So I can imagine it takes a lot of time, but how do I get rid of this exception?
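For context, here is a minimal sketch of what such a crawl loop might look like (the list URL and the extraction step are placeholders, not from my actual code):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CrawlSketch {
    public static void main(String[] args) throws Exception {
        // Fetch the page that contains the list of links (placeholder URL).
        Document listPage = Jsoup.connect("http://www.domain.com/list.html").get();

        // Follow each href and parse the linked document in turn.
        for (Element link : listPage.select("a[href]")) {
            Document page = Jsoup.connect(link.absUrl("href")).get();
            // ... extract whatever information is needed from `page` ...
            System.out.println(page.title());
        }
    }
}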
Here is the full stack trace:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
    at app.ForumCrawler.crawl(ForumCrawler.java:50)
    at Main.main(Main.java:15)

Thanks in advance, guys!

Edit: Hmm... sorry, I just found the solution:

Jsoup.connect(url).timeout(0).get();

Hope it's useful to someone else... :)
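A note on what this does: per the Jsoup documentation for Connection.timeout(int millis), a value of 0 is treated as an infinite timeout, so the call above simply waits forever. If waiting indefinitely is undesirable, an alternative is to raise the timeout to a finite value instead (the default is fairly short, 3 seconds in older Jsoup releases):

// Wait up to 10 seconds per request instead of disabling the limit;
// timeout(0) removes the limit altogether.
Document doc = Jsoup.connect(url).timeout(10_000).get();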
