jfi*_*isk 23 java https android web-scraping jsoup
It works fine over HTTP, but when I try an HTTPS source it throws the following exception:
10-12 13:22:11.169: WARN/System.err(332): javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:477)
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:328)
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpConnection.setupSecureSocket(HttpConnection.java:185)
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeSslConnection(HttpsURLConnectionImpl.java:433)
10-12 13:22:11.189: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeConnection(HttpsURLConnectionImpl.java:378)
10-12 13:22:11.189: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:205)
10-12 13:22:11.189: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:152)
10-12 13:22:11.189: WARN/System.err(332): at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:377)
10-12 13:22:11.189: WARN/System.err(332): at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
10-12 13:22:11.189: WARN/System.err(332): at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
Here is the relevant code:
try {
    doc = Jsoup.connect("https url here").get();
} catch (IOException e) {
    Log.e("sys", "couldn't get the html");
    e.printStackTrace();
}
Bal*_*usC 55
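If you want to do it the right way, and/or you only need to deal with one site, then you basically need to grab the SSL certificate of the website in question and import it into a Java key store. This produces a JKS file, which you then set as the SSL trust store before using Jsoup (or java.net.URLConnection).
You can grab the certificate from your web browser's certificate store. Let's assume you're using Firefox.
Now you have a web2.uconn.edu.crt file.
Next, open a command prompt and import it into a Java key store using the keytool command (it's part of the JRE):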
keytool -import -v -file /path/to/web2.uconn.edu.crt -keystore /path/to/web2.uconn.edu.jks -storepass drowssap
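The -file argument must point to the location of the .crt file you just downloaded. The -keystore argument must point to the location of the generated .jks file (which you in turn set as the SSL trust store). The -storepass argument is required; you can enter whatever password you want, as long as it is at least 6 characters long.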
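If running keytool is inconvenient, the same import can be approximated from Java with the standard java.security classes. This is only a minimal sketch: the class name CertImporter, the method name and the alias parameter are made up for illustration. Unlike keytool, it always starts from an empty key store and overwrites the target file instead of adding to an existing one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

/** Hypothetical helper that mirrors "keytool -import" for a single certificate. */
final class CertImporter {

    private CertImporter() {
    }

    /** Reads one X.509 certificate from crtPath and writes a new JKS file containing only that certificate. */
    static void importCertificate(String crtPath, String jksPath, String alias, char[] storePassword) {
        try {
            Certificate certificate;
            try (InputStream in = Files.newInputStream(Paths.get(crtPath))) {
                // CertificateFactory accepts both DER and PEM (Base64) encoded certificates.
                certificate = CertificateFactory.getInstance("X.509").generateCertificate(in);
            }

            KeyStore keyStore = KeyStore.getInstance("JKS");
            keyStore.load(null, storePassword); // a null stream creates an empty key store
            keyStore.setCertificateEntry(alias, certificate);

            try (OutputStream out = Files.newOutputStream(Paths.get(jksPath))) {
                keyStore.store(out, storePassword);
            }
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Could not import " + crtPath, e);
        }
    }
}
```

With the file names from the keytool command, a call could look like CertImporter.importCertificate("/path/to/web2.uconn.edu.crt", "/path/to/web2.uconn.edu.jks", "web2.uconn.edu", "drowssap".toCharArray()).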
Now you have a web2.uconn.edu.jks file. You can finally set it as the SSL trust store before connecting, as follows:
System.setProperty("javax.net.ssl.trustStore", "/path/to/web2.uconn.edu.jks");
Document document = Jsoup.connect("https://web2.uconn.edu/driver/old/timepoints.php?stopid=10").get();
// ...
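Setting javax.net.ssl.trustStore changes the trust store for the whole JVM. As a rough sketch of a narrower alternative, the same .jks file can be loaded into its own SSLContext, so that only the connections given the resulting socket factory trust it. The class TrustStoreSocketFactory and its method names are hypothetical. The "JKS" store type is an assumption that matches the file produced by keytool above; on Android, where JKS is typically not available, the store would need to be in a supported format such as BKS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: builds a socket factory that trusts only the certificates in one key store file. */
final class TrustStoreSocketFactory {

    private TrustStoreSocketFactory() {
    }

    /** Loads the JKS file at the given path. */
    static KeyStore loadKeyStore(String path, char[] password) {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            KeyStore keyStore = KeyStore.getInstance("JKS"); // assumption: the file is a JKS store
            keyStore.load(in, password);
            return keyStore;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Could not load trust store " + path, e);
        }
    }

    /** Creates a socket factory whose trust decisions are based solely on the given key store. */
    static SSLSocketFactory fromKeyStore(KeyStore trustStore) {
        try {
            TrustManagerFactory factory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            factory.init(trustStore);

            SSLContext context = SSLContext.getInstance("TLS");
            // No client key managers; a null SecureRandom selects the default implementation.
            context.init(null, factory.getTrustManagers(), null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create SSL context", e);
        }
    }

    /** Convenience wrapper combining the two steps above. */
    static SSLSocketFactory fromFile(String path, char[] password) {
        return fromKeyStore(loadKeyStore(path, password));
    }
}
```

On jsoup versions that provide Connection.sslSocketFactory(SSLSocketFactory), the result could be passed per connection, for example Jsoup.connect(url).sslSocketFactory(TrustStoreSocketFactory.fromFile("/path/to/web2.uconn.edu.jks", "drowssap".toCharArray())).get(). Certificate validation and host name checks stay enabled; only the set of trusted certificates changes.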
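As a completely different alternative, particularly when you need to deal with multiple sites (i.e. you're creating a world wide web crawler), you can also instruct Jsoup (basically, java.net.URLConnection) to blindly trust all SSL certificates. See also the section "Dealing with untrusted or misconfigured HTTPS sites" at the very bottom of this answer: Using java.net.URLConnection to fire and handle HTTP requests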
小智 12
In my case, all I needed to do was add .validateTLSCertificates(false) to my connection:
Document doc = Jsoup.connect(httpsURLAsString)
.timeout(60000).validateTLSCertificates(false).get();
I also had to increase the read timeout, but I think that is unrelated.
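I stumbled over the answers here and in the linked question during my search and want to add two pieces of information, as the accepted answer doesn't fit my quite similar scenario, but there is an additional solution that fits even in that case (the certificate and host name don't match on test systems).
I call the disableSSLCertCheck() method before the very first Jsoup.connect(). Before you use this method, you should be really sure that you understand what you are doing there: not checking SSL certificates is a really stupid thing to do. Always use correct SSL certificates for your servers, signed by a commonly accepted CA. If you can't afford a commonly accepted CA, use correct SSL certificates nevertheless, together with @BalusC's accepted answer above. If you can't configure correct SSL certificates (which should never be the case in production environments), the following method could work: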
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

private void disableSSLCertCheck() throws NoSuchAlgorithmException, KeyManagementException {
    // Create a trust manager that does not validate certificate chains
    TrustManager[] trustAllCerts = new TrustManager[] {new X509TrustManager() {
        public X509Certificate[] getAcceptedIssuers() {
            // The interface contract asks for a non-null (possibly empty) array
            return new X509Certificate[0];
        }
        public void checkClientTrusted(X509Certificate[] certs, String authType) {
        }
        public void checkServerTrusted(X509Certificate[] certs, String authType) {
        }
    }};

    // Install the all-trusting trust manager as the JVM-wide default
    SSLContext sc = SSLContext.getInstance("TLS");
    sc.init(null, trustAllCerts, new SecureRandom());
    HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());

    // Create all-trusting host name verifier
    HostnameVerifier allHostsValid = new HostnameVerifier() {
        public boolean verify(String hostname, SSLSession session) {
            return true;
        }
    };

    // Install the all-trusting host verifier as the JVM-wide default
    HttpsURLConnection.setDefaultHostnameVerifier(allHostsValid);
}
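Because the method above changes JVM-wide defaults, every later HTTPS connection in the process stays unchecked. One possible way to limit that on a test system is to remember the previous defaults and put them back when done. The following is only a sketch under that assumption; the class name ScopedTrustAll is made up for illustration. Other threads that open connections while the scope is active are still affected.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper: disables certificate and host name checks until close() is called. */
final class ScopedTrustAll implements AutoCloseable {

    private final SSLSocketFactory previousFactory;
    private final HostnameVerifier previousVerifier;

    private ScopedTrustAll(SSLSocketFactory previousFactory, HostnameVerifier previousVerifier) {
        this.previousFactory = previousFactory;
        this.previousVerifier = previousVerifier;
    }

    static ScopedTrustAll install() {
        // Remember the current JVM-wide defaults so that close() can restore them.
        SSLSocketFactory previousFactory = HttpsURLConnection.getDefaultSSLSocketFactory();
        HostnameVerifier previousVerifier = HttpsURLConnection.getDefaultHostnameVerifier();

        TrustManager[] trustAll = {new X509TrustManager() {
            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }

            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
            }
        }};

        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, trustAll, new SecureRandom());
            HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not install the trust-all SSL context", e);
        }

        HttpsURLConnection.setDefaultHostnameVerifier(new HostnameVerifier() {
            @Override
            public boolean verify(String hostname, SSLSession session) {
                return true;
            }
        });
        return new ScopedTrustAll(previousFactory, previousVerifier);
    }

    /** Restores the socket factory and host name verifier that were active before install(). */
    @Override
    public void close() {
        HttpsURLConnection.setDefaultSSLSocketFactory(previousFactory);
        HttpsURLConnection.setDefaultHostnameVerifier(previousVerifier);
    }
}
```

Used with try-with-resources, the unchecked window ends with the block: try (ScopedTrustAll scope = ScopedTrustAll.install()) { /* connect to the test system here */ }.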
To suppress certificate warnings for a specific Jsoup connection, you can use the following approach:
Kotlin
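import java.security.KeyManagementException
import java.security.NoSuchAlgorithmException
import java.security.SecureRandom
import java.security.cert.CertificateException
import java.security.cert.X509Certificate
import javax.net.ssl.SSLContext
import javax.net.ssl.SSLSocketFactory
import javax.net.ssl.TrustManager
import javax.net.ssl.X509TrustManager
import org.jsoup.Jsoup

val document = Jsoup.connect("url")
    .sslSocketFactory(socketFactory())
    .get()

private fun socketFactory(): SSLSocketFactory {
    // Trust manager that accepts every certificate chain without validation
    val trustAllCerts = arrayOf<TrustManager>(object : X509TrustManager {
        @Throws(CertificateException::class)
        override fun checkClientTrusted(chain: Array<X509Certificate>, authType: String) {
        }

        @Throws(CertificateException::class)
        override fun checkServerTrusted(chain: Array<X509Certificate>, authType: String) {
        }

        override fun getAcceptedIssuers(): Array<X509Certificate> {
            return arrayOf()
        }
    })

    try {
        val sslContext = SSLContext.getInstance("TLS")
        sslContext.init(null, trustAllCerts, SecureRandom())
        return sslContext.socketFactory
    } catch (e: Exception) {
        when (e) {
            // Wrap the two failures SSLContext setup can raise, as in the Java version
            is NoSuchAlgorithmException, is KeyManagementException -> {
                throw RuntimeException("Failed to create a SSL socket factory", e)
            }
            else -> throw e
        }
    }
}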
Java
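import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document document = Jsoup.connect("url")
        .sslSocketFactory(socketFactory())
        .get();

private SSLSocketFactory socketFactory() {
    // Trust manager that accepts every certificate chain without validation
    TrustManager[] trustAllCerts = new TrustManager[]{new X509TrustManager() {
        public X509Certificate[] getAcceptedIssuers() {
            // The interface contract asks for a non-null (possibly empty) array
            return new X509Certificate[0];
        }
        public void checkClientTrusted(X509Certificate[] certs, String authType) {
        }
        public void checkServerTrusted(X509Certificate[] certs, String authType) {
        }
    }};

    try {
        SSLContext sslContext = SSLContext.getInstance("TLS");
        sslContext.init(null, trustAllCerts, new SecureRandom());
        return sslContext.getSocketFactory();
    } catch (NoSuchAlgorithmException | KeyManagementException e) {
        throw new RuntimeException("Failed to create a SSL socket factory", e);
    }
}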
Note: as mentioned before, ignoring certificates is not a good idea.