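When webmagic crawls a site served over HTTPS, it can throw javax.net.ssl.SSLHandshakeException.
Initial fix:
1. In your own project, create an httpclient folder, add a class named HttpClientGenerator to it, and copy in the contents of HttpClientGenerator from the webmagic source.
2. Edit the copied HttpClientGenerator. The method that needs to change is buildSSLConnectionSocketFactory.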
private SSLConnectionSocketFactory buildSSLConnectionSocketFactory() {
    try {
        // Prefer a context that bypasses certificate checks
        return new SSLConnectionSocketFactory(
                createIgnoreVerifySSL(),
                new String[]{"SSLv2Hello", "SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"},
                null,
                new DefaultHostnameVerifier());
    } catch (KeyManagementException | NoSuchAlgorithmException e) {
        logger.error("ssl connection fail", e);
    }
    // Fall back to the default socket factory if the custom context cannot be built
    return SSLConnectionSocketFactory.getSocketFactory();
}
3. Change HttpClientDownloader so that it references your modified HttpClientGenerator class.
4. Set the Downloader of your Spider to the modified HttpClientDownloader.
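Step 2 relies on createIgnoreVerifySSL(), which is not shown in this article. As a rough sketch that uses only JDK classes, such a helper could look like the code below. The class name TrustAllContextSketch and the "TLS" protocol string are assumptions made for illustration and are not taken from the webmagic source. The main method also prints the protocol names the JVM supports, which can be compared with the list passed to SSLConnectionSocketFactory.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.security.cert.X509Certificate;
import java.util.Arrays;

class TrustAllContextSketch {

    // Hypothetical stand-in for createIgnoreVerifySSL(): builds an SSLContext
    // whose trust manager accepts every certificate chain without checking it.
    static SSLContext createIgnoreVerifySSL()
            throws NoSuchAlgorithmException, KeyManagementException {
        X509TrustManager trustAll = new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: no client certificate is rejected.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: no server certificate is rejected.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        };
        // "TLS" is an assumption; the JVM picks the concrete versions it supports.
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[]{trustAll}, null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = createIgnoreVerifySSL();
        // Protocol names this JVM can negotiate with the context above.
        String[] supported = context.getSupportedSSLParameters().getProtocols();
        System.out.println(Arrays.toString(supported));
    }
}
```

Because the trust manager never throws, a context like this turns off certificate chain validation entirely, so it is best kept inside the crawler's own downloader and not installed as the JVM-wide default.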
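If the problem persists after these changes and the following error is reported:
SSLException: Certificate for *** doesn't match any of the subject alternative
This error means that the host name did not match the certificate, that is, hostname verification failed. Bypassing that check resolves it.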
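To make the error message more concrete, here is a simplified sketch of what matching a host name against a certificate's subject alternative names involves. It is not the logic HttpClient actually runs; the class SanMatchSketch, its methods, and the example.com host names are all hypothetical, and only exact matches and a single left-most wildcard label are handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SanMatchSketch {

    // Returns true if host matches one DNS name pattern from the certificate.
    // A leading "*." stands for exactly one left-most label.
    static boolean matchesPattern(String host, String pattern) {
        String h = host.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        if (p.startsWith("*.")) {
            int firstDot = h.indexOf('.');
            // Compare everything from the first dot onward with the pattern minus "*".
            return firstDot > 0 && h.substring(firstDot).equals(p.substring(1));
        }
        return h.equals(p);
    }

    // Returns true if host matches at least one of the subject alternative names.
    static boolean matchesAny(String host, List<String> subjectAltNames) {
        for (String san : subjectAltNames) {
            if (matchesPattern(host, san)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sans = Arrays.asList("example.com", "*.example.com");
        System.out.println(matchesAny("www.example.com", sans)); // true
        System.out.println(matchesAny("a.b.example.com", sans)); // false
        System.out.println(matchesAny("example.org", sans));     // false
    }
}
```

When every comparison of this kind fails for the host being crawled, the client reports that the certificate does not match any subject alternative name.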
Normally, an SSL connection verifies all of the certificate information:
.register("https", new SSLConnectionSocketFactory(sslcontext)).build();
Modify the HttpClientGenerator constructor to skip this verification. The commented-out lines are the original source:
public HttpClientGenerator() {
    // Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
    //         .register("http", PlainConnectionSocketFactory.INSTANCE)
    //         .register("https", buildSSLConnectionSocketFactory())
    //         .build();
    // SSLContext sslcontext = sslContext(keyStorePath, keyStorePassword);
    SSLContext sslcontext;
    try {
        sslcontext = createIgnoreVerifySSL();
    } catch (NoSuchAlgorithmException | KeyManagementException e) {
        throw new RuntimeException(e);
    }
    Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
            .register("http", PlainConnectionSocketFactory.INSTANCE)
            // Skip hostname verification only
            .register("https", new SSLConnectionSocketFactory(sslcontext, NoopHostnameVerifier.INSTANCE))
            .build();
    connectionManager = new PoolingHttpClientConnectionManager(reg);
    connectionManager.setDefaultMaxPerRoute(100);
}
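The effect of NoopHostnameVerifier.INSTANCE can be illustrated with the JDK's own javax.net.ssl.HostnameVerifier interface. The sketch below is not part of webmagic or HttpClient: the class HostnameBypassSketch, the helper bypassOnlyFor, and the host names are hypothetical. It shows an accept-everything verifier next to a narrower variant that skips the check only for listed hosts and otherwise delegates to a fallback verifier.

```java
import javax.net.ssl.HostnameVerifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HostnameBypassSketch {

    // Behaves like a no-op verifier: every host name is accepted.
    static final HostnameVerifier ACCEPT_ALL = (hostname, session) -> true;

    // Hypothetical narrower variant: skip verification only for the listed
    // hosts and hand every other host to the fallback verifier.
    static HostnameVerifier bypassOnlyFor(Set<String> hosts, HostnameVerifier fallback) {
        return (hostname, session) ->
                hosts.contains(hostname) || fallback.verify(hostname, session);
    }

    public static void main(String[] args) {
        // Stand-in for a real verifier; it rejects everything so the demo is deterministic.
        HostnameVerifier strict = (hostname, session) -> false;
        Set<String> allowed = new HashSet<>(Arrays.asList("crawl.example.com"));
        HostnameVerifier scoped = bypassOnlyFor(allowed, strict);

        System.out.println(ACCEPT_ALL.verify("anything.example.org", null)); // true
        System.out.println(scoped.verify("crawl.example.com", null));        // true
        System.out.println(scoped.verify("other.example.com", null));        // false
    }
}
```

Assuming an HttpClient version whose SSLConnectionSocketFactory constructor takes a javax.net.ssl.HostnameVerifier, as the constructor call above suggests, a scoped verifier of this kind could be passed in place of NoopHostnameVerifier.INSTANCE to limit the bypass to the sites that actually need it.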