HttpClient + Jsoup: Simulated Login, HTML Parsing, and Information Filtering (GDUT Library)


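I have recently been building an all-in-one campus Android client. The main idea is to bring together the information from the school's various websites on a single platform for students to browse.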

 

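The approach is as follows:

I use the Guangdong University of Technology library website as the example.

Goal: log in to the library with a personal account and retrieve that account's current loans.

Login address: http://222.200.98.171:81/login.aspx

Chrome's developer tools are used along the way (press F12 in the browser to open them).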



 
 

Open the source of the login page. Below is the form tag from that source:

 

<form name="aspnetForm" method="post" action="login.aspx?ReturnUrl=%2fuser%2fuserinfo.aspx" onsubmit="javascript:return WebForm_OnSubmit();" id="aspnetForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE0MjY3MDAxNzcPZBYCZg9kFgoCAQ8PFgIeCEltYWdlVXJsBRt+XGltYWdlc1xoZWFkZXJvcGFjNGdpZi5naWZkZAICDw8WAh4EVGV4dAUt5bm/5Lic5bel5Lia5aSn5a2m5Zu+5Lmm6aaG5Lmm55uu5qOA57Si57O757ufZGQCAw8PFgIfAQUcMjAxM+W5tDAz5pyIMDXml6UgIOaYn+acn+S6jGRkAgQPZBYEZg9kFgQCAQ8WAh4LXyFJdGVtQ291bnQCCBYSAgEPZBYCZg8VAwtzZWFyY2guYXNweAAM55uu5b2V5qOA57SiZAICD2QWAmYPFQMTcGVyaV9uYXZfY2xhc3MuYXNweAAM5YiG57G75a+86IiqZAIDD2QWAmYPFQMOYm9va19yYW5rLmFzcHgADOivu+S5puaMh+W8lWQCBA9kFgJmDxUDCXhzdGIuYXNweAAM5paw5Lmm6YCa5oqlZAIFD2QWAmYPFQMUcmVhZGVycmVjb21tZW5kLmFzcHgADOivu+iAheiNkOi0rWQCBg9kFgJmDxUDE292ZXJkdWVib29rc19mLmFzcHgADOaPkOmGkuacjeWKoWQCBw9kFgJmDxUDEnVzZXIvdXNlcmluZm8uYXNweAAP5oiR55qE5Zu+5Lmm6aaGZAIID2QWAmYPFQMbaHR0cDovL2xpYnJhcnkuZ2R1dC5lZHUuY24vAA/lm77kuabppobpppbpobVkAgkPZBYCAgEPFgIeB1Zpc2libGVoZAIDDxYCHwJmZAIBD2QWBAIDD2QWBAIBDw9kFgIeDGF1dG9jb21wbGV0ZQUDb2ZmZAIHDw8WAh8BZWRkAgUPZBYGAgEPEGRkFgFmZAIDDxBkZBYBZmQCBQ8PZBYCHwQFA29mZmQCBQ8PFgIfAQWlAUNvcHlyaWdodCAmY29weTsyMDA4LTIwMDkuIFNVTENNSVMgT1BBQyA0LjAxIG9mIFNoZW56aGVuIFVuaXZlcnNpdHkgTGlicmFyeS4gIEFsbCByaWdodHMgcmVzZXJ2ZWQuPGJyIC8+54mI5p2D5omA5pyJ77ya5rex5Zyz5aSn5a2m5Zu+5Lmm6aaGIEUtbWFpbDpzenVsaWJAc3p1LmVkdS5jbmRkZL5QuJMrEZz+0UxuTVpXZ/EaY5A4" />
</div><script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
</script><script src="/WebResource.axd?d=kbLQnwjf5uNQN4GcWRC5kD1rIySOzkR3uLyKE5xUO0j4Fa2lQPZwQlk_qYaspRXtlojncSBfRJNkA00qXOMQqsKd8WY1&amp;t=634751988274393221" type="text/javascript"></script><script src="/WebResource.axd?d=nsbO6ZJty6_6fuRufFNYnRiJ-xEoD0xQr70NX6g0v64gngATPLSnyyt7jyZkELLW6THXmh92_m0Y5TyvhES_-JroQeU1&amp;t=634751988274393221" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
function WebForm_OnSubmit() {
if (typeof(ValidatorOnSubmit) == "function" && ValidatorOnSubmit() == false) return false;
return true;
}
//]]>
</script><div><input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBQKa7ezdCwKOmK5RApX9wcYGAsP9wL8JAqW86pcIaBhXmFYzd5pGDTk/afln2TfArPw=" />
</div>
<input name="ctl00$ContentPlaceHolder1$txtlogintype" type="hidden" id="ctl00_ContentPlaceHolder1_txtlogintype" value="0" />
<div id="Login" class="clearFix"><div class="LoginTitle">登录我的图书馆</div><div class="LeftLogin"><div class="LoginDiv"><div class="loginContent">
<div class="loginInfo"><span class="leftInfo">图书证号:</span><span class="rightInfo"><input name="ctl00$ContentPlaceHolder1$txtUsername_Lib" type="text" id="ctl00_ContentPlaceHolder1_txtUsername_Lib" class="txtInput" autocomplete="off" /><span id="ctl00_ContentPlaceHolder1_rfv_UserName_Lib" style="color:Red;display:none;">请输入证号</span></span></div>
<div class="loginInfo"><span class="leftInfo">密&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;码:</span><span class="rightInfo"><input name="ctl00$ContentPlaceHolder1$txtPas_Lib" type="password" id="ctl00_ContentPlaceHolder1_txtPas_Lib" class="txtInput" /><span id="ctl00_ContentPlaceHolder1_rfv_Password_Lib" style="color:Red;display:none;">请输入密码</span></span></div>
<div><span id="ctl00_ContentPlaceHolder1_lblErr_Lib"></span></div>
<div class="loginInfo"><input type="submit" name="ctl00$ContentPlaceHolder1$btnLogin_Lib" value="登录" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$btnLogin_Lib&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, false))" id="ctl00_ContentPlaceHolder1_btnLogin_Lib" class="btn" /><input type="button" value="清空" onclick="rset()" class="btn"/></div>
</div></div></div>
<div class="RightDescription"><img src="images/pin.gif" />  <br/>1.  如果您使用的是公共电脑,请在使用完毕后,务必退出登录,以保安全。<br />2.  首次登录,请先<a href="changepas.aspx">修改初始密码</a>。</div>
</div>
<script type="text/javascript">
//<![CDATA[
var Page_Validators =  new Array(document.getElementById("ctl00_ContentPlaceHolder1_rfv_UserName_Lib"), document.getElementById("ctl00_ContentPlaceHolder1_rfv_Password_Lib"));
//]]>
</script><script type="text/javascript">
//<![CDATA[
var ctl00_ContentPlaceHolder1_rfv_UserName_Lib = document.all ? document.all["ctl00_ContentPlaceHolder1_rfv_UserName_Lib"] : document.getElementById("ctl00_ContentPlaceHolder1_rfv_UserName_Lib");
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.controltovalidate = "ctl00_ContentPlaceHolder1_txtUsername_Lib";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.focusOnError = "t";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.errormessage = "请输入证号";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.display = "Dynamic";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.evaluationfunction = "RequiredFieldValidatorEvaluateIsValid";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.initialvalue = "";
var ctl00_ContentPlaceHolder1_rfv_Password_Lib = document.all ? document.all["ctl00_ContentPlaceHolder1_rfv_Password_Lib"] : document.getElementById("ctl00_ContentPlaceHolder1_rfv_Password_Lib");
ctl00_ContentPlaceHolder1_rfv_Password_Lib.controltovalidate = "ctl00_ContentPlaceHolder1_txtPas_Lib";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.focusOnError = "t";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.errormessage = "请输入密码";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.display = "Dynamic";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.evaluationfunction = "RequiredFieldValidatorEvaluateIsValid";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.initialvalue = "";
//]]>
</script><script type="text/javascript">
//<![CDATA[
var Page_ValidationActive = false;
if (typeof(ValidatorOnLoad) == "function") {
    ValidatorOnLoad();
}
function ValidatorOnSubmit() {
    if (Page_ValidationActive) {
        return ValidatorCommonOnSubmit();
    } else {
        return true;
    }
}
//]]>
</script>
</form>

 

There is a lot of code in there. We need to pull out the form data required for login. Tags such as input and select make up the content of a login form; this form contains only input tags, so extracting those is enough. The code is as follows:

The two methods are initLoginParmas(String userName, String passWord) and getLoginFormData(String url):

/**
 * Initializes the login parameters.
 *
 * @param userName library card number
 * @param passWord password
 * @return the form fields as name/value pairs, ready to be posted
 * @throws ParseException
 * @throws IOException
 */
public static List<NameValuePair> initLoginParmas(String userName,
        String passWord) throws ParseException, IOException {
    List<NameValuePair> parmasList = new ArrayList<NameValuePair>();
    HashMap<String, String> parmasMap = getLoginFormData(LoginUrl);
    Set<String> keySet = parmasMap.keySet();
    // Fill in the user name and password fields
    for (String temp : keySet) {
        if (temp.contains("Username")) {
            parmasMap.put(temp, userName);
        } else if (temp.contains("txtPas")) {
            parmasMap.put(temp, passWord);
        }
    }
    Set<String> keySet2 = parmasMap.keySet();
    System.out.println("表单内容:");
    for (String temp : keySet2) {
        System.out.println(temp + " = " + parmasMap.get(temp));
    }
    for (String temp : keySet2) {
        parmasList.add(new BasicNameValuePair(temp, parmasMap.get(temp)));
    }
    // System.out.println("initParams \n" + parmasMap);
    return parmasList;
}

/**
 * Collects the input fields of the login form.
 *
 * @param url address of the login page
 * @return map from each input's name to its value
 * @throws IOException
 * @throws ParseException
 */
public static HashMap<String, String> getLoginFormData(String url)
        throws ParseException, IOException {
    Document document = Jsoup.parse(getHtml(url));
    // Find all form elements
    Elements element1 = document.getElementsByTag("form");
    // Keep the first form that is submitted with POST
    Element element = element1.select("[method=post]").first();
    // Take every input inside the form that has a name attribute
    Elements elements = element.select("input[name]");
    HashMap<String, String> parmas = new HashMap<String, String>();
    for (Element temp : elements) {
        // Store each input's name and value in the map
        parmas.put(temp.attr("name"), temp.attr("value"));
    }
    return parmas;
}

 

The resulting form data:

表单内容:

ctl00$ContentPlaceHolder1$txtlogintype = 0
__VIEWSTATE = /wEPDwULLTE0MjY3MDAxNzcPZBYCZg9kFgoCAQ8PFgIeCEltYWdlVXJsBRt+XGltYWdlc1xoZWFkZXJvcGFjNGdpZi5naWZkZAICDw8WAh4EVGV4dAUt5bm/5Lic5bel5Lia5aSn5a2m5Zu+5Lmm6aaG5Lmm55uu5qOA57Si57O757ufZGQCAw8PFgIfAQUcMjAxM+W5tDAz5pyIMDXml6UgIOaYn+acn+S6jGRkAgQPZBYEZg9kFgQCAQ8WAh4LXyFJdGVtQ291bnQCCBYSAgEPZBYCZg8VAwtzZWFyY2guYXNweAAM55uu5b2V5qOA57SiZAICD2QWAmYPFQMTcGVyaV9uYXZfY2xhc3MuYXNweAAM5YiG57G75a+86IiqZAIDD2QWAmYPFQMOYm9va19yYW5rLmFzcHgADOivu+S5puaMh+W8lWQCBA9kFgJmDxUDCXhzdGIuYXNweAAM5paw5Lmm6YCa5oqlZAIFD2QWAmYPFQMUcmVhZGVycmVjb21tZW5kLmFzcHgADOivu+iAheiNkOi0rWQCBg9kFgJmDxUDE292ZXJkdWVib29rc19mLmFzcHgADOaPkOmGkuacjeWKoWQCBw9kFgJmDxUDEnVzZXIvdXNlcmluZm8uYXNweAAP5oiR55qE5Zu+5Lmm6aaGZAIID2QWAmYPFQMbaHR0cDovL2xpYnJhcnkuZ2R1dC5lZHUuY24vAA/lm77kuabppobpppbpobVkAgkPZBYCAgEPFgIeB1Zpc2libGVoZAIDDxYCHwJmZAIBD2QWBAIDD2QWBAIBDw9kFgIeDGF1dG9jb21wbGV0ZQUDb2ZmZAIHDw8WAh8BZWRkAgUPZBYGAgEPEGRkFgFmZAIDDxBkZBYBZmQCBQ8PZBYCHwQFA29mZmQCBQ8PFgIfAQWlAUNvcHlyaWdodCAmY29weTsyMDA4LTIwMDkuIFNVTENNSVMgT1BBQyA0LjAxIG9mIFNoZW56aGVuIFVuaXZlcnNpdHkgTGlicmFyeS4gIEFsbCByaWdodHMgcmVzZXJ2ZWQuPGJyIC8+54mI5p2D5omA5pyJ77ya5rex5Zyz5aSn5a2m5Zu+5Lmm6aaGIEUtbWFpbDpzenVsaWJAc3p1LmVkdS5jbmRkZL5QuJMrEZz+0UxuTVpXZ/EaY5A4
ctl00$ContentPlaceHolder1$txtPas_Lib = (password withheld)
__EVENTVALIDATION = /wEWBQKa7ezdCwKOmK5RApX9wcYGAsP9wL8JAqW86pcIaBhXmFYzd5pGDTk/afln2TfArPw=
ctl00$ContentPlaceHolder1$txtUsername_Lib = 3110006527
ctl00$ContentPlaceHolder1$btnLogin_Lib = 登录
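
These name/value pairs end up in the POST body as application/x-www-form-urlencoded text. As a rough illustration of what that body looks like, here is a minimal sketch that uses only the JDK's URLEncoder rather than HttpClient's UrlEncodedFormEntity. The class name FormBodySketch and its encode method are made up for this example, and the charset is passed in as a parameter on the assumption that the server expects GBK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original article's code.
class FormBodySketch {

    /** Joins the fields as name=value pairs separated by '&', percent-encoding both sides. */
    static String encode(Map<String, String> fields, String charset) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            try {
                body.append(URLEncoder.encode(field.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), charset));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unknown charset: " + charset, e);
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ctl00$ContentPlaceHolder1$txtlogintype", "0");
        // \u767b\u5f55 is the value of the login button in the form above
        fields.put("ctl00$ContentPlaceHolder1$btnLogin_Lib", "\u767b\u5f55");
        System.out.println(encode(fields, "GBK"));
    }
}
```

The `$` characters in the ASP.NET field names are percent-encoded as `%24`, and the bytes of non-ASCII values depend on the chosen charset, which is why the charset has to match what the server expects.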

 

 

The next step is to log in and obtain access, that is, to obtain the Cookie.

The code is as follows:

/**
 * Logs in to the library.
 *
 * @return the HTML of the page shown after login
 * @throws ClientProtocolException
 * @throws IOException
 */
public static String login() throws ClientProtocolException, IOException {
    // Replace with your own card number and password
    List<NameValuePair> parmasList = initLoginParmas("3110006527",
            "your-password");
    HttpPost post = new HttpPost(LoginUrl);
    // Disable automatic redirects so that the Cookie and Location
    // headers of the first response can be read
    post.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS, false);
    // Declare GBK as the form encoding
    post.setHeader("Content-Type",
            "application/x-www-form-urlencoded;charset=gbk");
    post.setEntity(new UrlEncodedFormEntity(parmasList, "GBK"));
    HttpResponse response = new DefaultHttpClient().execute(post);
    // Save the cookie for later requests
    cookie = response.getFirstHeader("Set-Cookie").getValue();
    // System.out.println("cookie= " + cookie);
    // Redirect target, used to reach the home page
    location = response.getFirstHeader("Location").getValue();
    // Build the home page address
    mainUrl = Host + location;
    String html = getHtml(mainUrl);
    return html;
}

 

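The login request returns status code 302. With a POST, the client then follows the redirect to the Location address automatically, so the response headers you see are no longer the ones returned by the login itself but the ones returned by the redirect target. The cookie is carried in the headers of the login response, so take particular care to add this statement:

post.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS, false);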

 

 

Setting this parameter on the POST prevents the redirect, so both the Cookie and the Location (needed to reach the home page) can be read:

cookie = response.getFirstHeader("Set-Cookie").getValue();
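
To see the effect of not following the redirect in isolation, here is a small self-contained sketch. It is not the article's HttpClient code: it swaps in the JDK's HttpURLConnection (where setInstanceFollowRedirects(false) plays a similar role to HANDLE_REDIRECTS) and starts a throwaway local server that answers a POST with 302. The class name RedirectSketch, the cookie value SessionId=demo123, and the local server are all made up for illustration; the Location path is taken from the ReturnUrl in the form above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article's code.
class RedirectSketch {

    /** Status and headers of a response whose redirect was not followed. */
    static final class LoginResult {
        final int status;
        final String setCookie;
        final String location;

        LoginResult(int status, String setCookie, String location) {
            this.status = status;
            this.setCookie = setCookie;
            this.location = location;
        }
    }

    /** Posts a form body and returns the first response as-is, without following redirects. */
    static LoginResult postWithoutRedirect(String url, String formBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // keep the 302 response visible
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formBody.getBytes(StandardCharsets.US_ASCII));
        }
        try {
            return new LoginResult(conn.getResponseCode(),
                    conn.getHeaderField("Set-Cookie"),
                    conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }

    /** Starts a stand-in login endpoint on a free local port and posts to it once. */
    static LoginResult demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/login.aspx", exchange -> {
                try (InputStream in = exchange.getRequestBody()) {
                    byte[] buffer = new byte[1024];
                    while (in.read(buffer) != -1) {
                        // discard the posted form
                    }
                }
                exchange.getResponseHeaders().add("Set-Cookie", "SessionId=demo123; path=/");
                exchange.getResponseHeaders().add("Location", "/user/userinfo.aspx");
                exchange.sendResponseHeaders(302, -1); // -1 means no response body
                exchange.close();
            });
            server.start();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/login.aspx";
                return postWithoutRedirect(url, "user=demo");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LoginResult result = demo();
        System.out.println(result.status + " | " + result.setCookie + " | " + result.location);
    }
}
```

Because the redirect is not followed, the status seen by the caller is the 302 itself, and both the Set-Cookie and Location headers are still available on that response.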

 

 

Next, build the home page address from Location, parse the home page with Jsoup, and work out the address of the page that lists my loans.

Requests to any other page from here on need the cookie, so whenever you use post or get, call addHeader() or setHeader() to set the Cookie:

/**
 * Fetches the HTML source of a page.
 *
 * @param url address of the page
 * @return the page's HTML
 * @throws ParseException
 * @throws IOException
 */
private static String getHtml(String url) throws ParseException,
        IOException {
    HttpGet get = new HttpGet(url);
    // Send the session cookie once it has been obtained
    if (!"".equals(cookie)) {
        get.addHeader("Cookie", cookie);
    }
    HttpResponse httpResponse = new DefaultHttpClient().execute(get);
    HttpEntity entity = httpResponse.getEntity();
    return EntityUtils.toString(entity);
}
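
One detail worth keeping in mind: a Set-Cookie value usually carries attributes such as path or HttpOnly after the name=value pair, while a Cookie request header is only meant to contain the pairs themselves. The code above sends the stored value unchanged, which evidently works for this site. The following is a minimal sketch of stripping the attributes first; the class CookieHeaderSketch and the cookie names in it are invented for illustration and are not taken from the library site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original article's code.
class CookieHeaderSketch {

    /** Keeps only the leading name=value pair of one Set-Cookie header value. */
    static String toCookieHeader(String setCookieValue) {
        int semicolon = setCookieValue.indexOf(';');
        String pair = semicolon >= 0
                ? setCookieValue.substring(0, semicolon)
                : setCookieValue;
        return pair.trim();
    }

    /** Combines several Set-Cookie values into a single Cookie header value. */
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            pairs.add(toCookieHeader(value));
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Made-up cookie values for demonstration
        System.out.println(toCookieHeader("ASP.NET_SessionId=abc123; path=/; HttpOnly"));
        System.out.println(toCookieHeader(Arrays.asList("a=1; path=/", "b=2")));
    }
}
```

The list variant matters when a server sends more than one Set-Cookie header, since getFirstHeader only returns the first of them.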

 

Inspecting the page source in Chrome shows this tag:

 <a href="bookborrowed.aspx" >当前借阅情况和续借</a>

The bookborrowed.aspx part is what we need (the link text means "Current loans and renewals").

 

The code to obtain it is as follows:

public static void getMyBorrowedBooks() {
    try {
        Document document = Jsoup.parse(login());
        // Find the <a> tag by its link text
        Elements elements1 = document
                .getElementsContainingOwnText("当前借阅情况和续借");
        String url = elements1.first().attr("href");
        // Combine the href with mainUrl to build the address of the loans page
        borrowedBooksUrl = mainUrl.substring(0,
                mainUrl.lastIndexOf("/") + 1) + url;
        getBookBorrowedData(getHtml(borrowedBooksUrl));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
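
The substring-based concatenation above works for a plain file name such as bookborrowed.aspx. As an alternative sketch, the JDK's java.net.URI can resolve a relative href against the page it came from, which also copes with hrefs that begin with "../". The class name LinkResolverSketch is made up, and the home page address used below assumes the Location header points to /user/userinfo.aspx, as the ReturnUrl in the login form suggests.

```java
import java.net.URI;

// Hypothetical helper, not part of the original article's code.
class LinkResolverSketch {

    /** Resolves an href found on a page against that page's own address. */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Assumed home page address (Host + Location)
        String home = "http://222.200.98.171:81/user/userinfo.aspx";
        String loans = resolve(home, "bookborrowed.aspx");
        System.out.println(loans);
        // A parent-relative href, like the book detail links on the loans page
        System.out.println(resolve(loans, "../bookinfo.aspx?ctrlno=571892"));
    }
}
```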

 

 

Once we have the address of the loans page, we request it and fetch its source.

The data we need is this part (only an excerpt is shown):

<tr>
<td width="5%">续满</td>
<td width="10%">2013-04-10</td>
<td width="35%"><a href="../bookinfo.aspx?ctrlno=571892" target="_blank">编写高质量代码 [专著]:改善Java程序的151个建议=Writing solw Java cove:151 suggestons to improve your Java program/秦小波著</a></td>
<td width="5%"> </td>
<td width="8%">中文图书</td>
<td width="7%">A2973844</td>
<td width="10%">2012-12-05</td>
</tr>
<tr>

 

 

The following code filters it with Jsoup:

/**
 * Extracts the loan records (List<BookEntity>).
 *
 * @param src HTML of the loans page
 * @return List<BookEntity>
 */
private static List<BookEntity> getBookBorrowedData(String src) {
    List<BookEntity> data = new ArrayList<BookEntity>();
    Document document = Jsoup.parse(src);
    Element element = document.select("[id=borrowedcontent]").first()
            .getElementsByTag("table").first();
    Elements elements2 = element.getElementsByTag("tr");
    for (Element temp2 : elements2) {
        Elements elements3 = temp2.getElementsByTag("td");
        // Cell 0: renewal status, 1: due date, 2: title, 6: borrow date
        BookEntity entity = new test().new BookEntity()
                .setIsFullData(elements3.get(0).text())
                .setData2Return(elements3.get(1).text())
                .setName(elements3.get(2).text())
                .setData2Borrowed(elements3.get(6).text());
        data.add(entity);
    }
    // Discard the first row, presumably the table header
    data.remove(0);
    System.out.println("借书情况\n");
    for (BookEntity temp : data) {
        System.out.println(temp.getName() + "\n" + temp.getData2Borrowed()
                + "\n" + temp.getData2Return() + "\n"
                + temp.getIsFullData());
    }
    return data;
}
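
The method above relies on fixed cell positions and on the first row being removed afterwards. As a possible variation, the sketch below maps one row's cell texts to a small record and skips any row whose date cells do not parse, so a header row drops out without data.remove(0). It works on plain strings, so it would sit after the Jsoup step that extracts each cell's text. The class BorrowedRowSketch and its field names are invented here, and the column positions are assumed from the row excerpt shown earlier.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical record type, not part of the original article's code.
class BorrowedRowSketch {
    final String renewStatus;
    final LocalDate dueDate;
    final String title;
    final LocalDate borrowDate;

    BorrowedRowSketch(String renewStatus, LocalDate dueDate, String title, LocalDate borrowDate) {
        this.renewStatus = renewStatus;
        this.dueDate = dueDate;
        this.title = title;
        this.borrowDate = borrowDate;
    }

    /** Builds a record from the cell texts of one row, or returns empty for non-data rows. */
    static Optional<BorrowedRowSketch> fromCells(List<String> cells) {
        if (cells.size() < 7) {
            return Optional.empty();
        }
        try {
            // Assumed layout: 0 = renewal status, 1 = due date, 2 = title, 6 = borrow date
            return Optional.of(new BorrowedRowSketch(
                    cells.get(0).trim(),
                    LocalDate.parse(cells.get(1).trim()),
                    cells.get(2).trim(),
                    LocalDate.parse(cells.get(6).trim())));
        } catch (DateTimeParseException e) {
            // Header rows and malformed rows have no ISO dates in these cells
            return Optional.empty();
        }
    }

    /** Number of days between the borrow date and the due date. */
    long loanDays() {
        return ChronoUnit.DAYS.between(borrowDate, dueDate);
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("renewed", "2013-04-10",
                "Refactoring:improving the design of existing code", "",
                "book", "(barcode)", "2012-11-22");
        fromCells(cells).ifPresent(row ->
                System.out.println(row.title + ": " + row.loanDays() + " days"));
    }
}
```

Parsing the dates into LocalDate also makes it straightforward for the client to show how many days remain before a book is due.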

 

 

The final printed result is:

借书情况

编写高质量代码 [专著]:改善Java程序的151个建议=Writing solw Java cove:151 suggestons to improve your Java program/秦小波著
2012-12-05
2013-04-10
续满
疯狂Java [专著]:突破程序员基本功的16课/李刚编著
2012-12-05
2013-04-10
续满
程序员修炼之道 [专著]:从小工到专家=The pragmatic programmer:From journeyman to master:评注版/(美)Andrew Hunt,(美)David Thomas著;周爱民,蔡学镛评注
2012-11-22
2013-04-10
续满
重构:改善既有代码的设计=Refactoring:improving the design of existing code/(美)Martin Fowler著;熊节译
2012-11-22
2013-04-10
续满
Android高薪之路 [专著]:Android程序员面试宝典/李宁编著
2012-11-29
2013-04-10
续满
Android技术内幕 [专著]·系统卷=Android internals·System/杨丰盛著
2012-12-04
2013-04-10
续满
我编程, 我快乐 [专著]:程序员职业规划之道=The passionate programmer:creating a remarkable career in software development/(美) Chad Fowler著;于梦瑄译
2013-01-17
2013-04-17
续满

Complete code:

package moniLogin;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Set;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.ParseException;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class test {
    private static String LoginUrl = "http://222.200.98.171:81/login.aspx";
    private static String Host = "http://222.200.98.171:81";
    private static String mainUrl = "";
    private static String borrowedBooksUrl = "";
    private static String cookie = "";
    private static String location = "";

    /**
     * @param args
     */
    public static void main(String[] args) {
        getMyBorrowedBooks();
    }

    public static void getMyBorrowedBooks() {
        try {
            Document document = Jsoup.parse(login());
            // Find the <a> tag by its link text
            Elements elements1 = document
                    .getElementsContainingOwnText("当前借阅情况和续借");
            String url = elements1.first().attr("href");
            // Combine the href with mainUrl to build the address of the loans page
            borrowedBooksUrl = mainUrl.substring(0,
                    mainUrl.lastIndexOf("/") + 1) + url;
            getBookBorrowedData(getHtml(borrowedBooksUrl));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Extracts the loan records (List<BookEntity>).
     *
     * @param src HTML of the loans page
     * @return List<BookEntity>
     */
    private static List<BookEntity> getBookBorrowedData(String src) {
        List<BookEntity> data = new ArrayList<BookEntity>();
        Document document = Jsoup.parse(src);
        Element element = document.select("[id=borrowedcontent]").first()
                .getElementsByTag("table").first();
        Elements elements2 = element.getElementsByTag("tr");
        for (Element temp2 : elements2) {
            Elements elements3 = temp2.getElementsByTag("td");
            // Cell 0: renewal status, 1: due date, 2: title, 6: borrow date
            BookEntity entity = new test().new BookEntity()
                    .setIsFullData(elements3.get(0).text())
                    .setData2Return(elements3.get(1).text())
                    .setName(elements3.get(2).text())
                    .setData2Borrowed(elements3.get(6).text());
            data.add(entity);
        }
        // Discard the first row, presumably the table header
        data.remove(0);
        System.out.println("借书情况\n");
        for (BookEntity temp : data) {
            System.out.println(temp.getName() + "\n" + temp.getData2Borrowed()
                    + "\n" + temp.getData2Return() + "\n"
                    + temp.getIsFullData());
        }
        return data;
    }

    /**
     * Logs in to the library.
     *
     * @return the HTML of the page shown after login
     * @throws ClientProtocolException
     * @throws IOException
     */
    public static String login() throws ClientProtocolException, IOException {
        // Replace with your own card number and password
        List<NameValuePair> parmasList = initLoginParmas("3110006527",
                "your-password");
        HttpPost post = new HttpPost(LoginUrl);
        // Disable automatic redirects so that the Cookie and Location
        // headers of the first response can be read
        post.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS, false);
        // Declare GBK as the form encoding
        post.setHeader("Content-Type",
                "application/x-www-form-urlencoded;charset=gbk");
        post.setEntity(new UrlEncodedFormEntity(parmasList, "GBK"));
        HttpResponse response = new DefaultHttpClient().execute(post);
        // Save the cookie for later requests
        cookie = response.getFirstHeader("Set-Cookie").getValue();
        // System.out.println("cookie= " + cookie);
        // Redirect target, used to reach the home page
        location = response.getFirstHeader("Location").getValue();
        // Build the home page address
        mainUrl = Host + location;
        String html = getHtml(mainUrl);
        return html;
    }

    /**
     * Fetches the HTML source of a page.
     *
     * @param url address of the page
     * @return the page's HTML
     * @throws ParseException
     * @throws IOException
     */
    private static String getHtml(String url) throws ParseException,
            IOException {
        HttpGet get = new HttpGet(url);
        // Send the session cookie once it has been obtained
        if (!"".equals(cookie)) {
            get.addHeader("Cookie", cookie);
        }
        HttpResponse httpResponse = new DefaultHttpClient().execute(get);
        HttpEntity entity = httpResponse.getEntity();
        return EntityUtils.toString(entity);
    }

    /**
     * Initializes the login parameters.
     *
     * @param userName library card number
     * @param passWord password
     * @return the form fields as name/value pairs, ready to be posted
     * @throws ParseException
     * @throws IOException
     */
    public static List<NameValuePair> initLoginParmas(String userName,
            String passWord) throws ParseException, IOException {
        List<NameValuePair> parmasList = new ArrayList<NameValuePair>();
        HashMap<String, String> parmasMap = getLoginFormData(LoginUrl);
        Set<String> keySet = parmasMap.keySet();
        // Fill in the user name and password fields
        for (String temp : keySet) {
            if (temp.contains("Username")) {
                parmasMap.put(temp, userName);
            } else if (temp.contains("txtPas")) {
                parmasMap.put(temp, passWord);
            }
        }
        Set<String> keySet2 = parmasMap.keySet();
        System.out.println("表单内容:");
        for (String temp : keySet2) {
            System.out.println(temp + " = " + parmasMap.get(temp));
        }
        for (String temp : keySet2) {
            parmasList.add(new BasicNameValuePair(temp, parmasMap.get(temp)));
        }
        // System.out.println("initParams \n" + parmasMap);
        return parmasList;
    }

    /**
     * Collects the input fields of the login form.
     *
     * @param url address of the login page
     * @return map from each input's name to its value
     * @throws IOException
     * @throws ParseException
     */
    public static HashMap<String, String> getLoginFormData(String url)
            throws ParseException, IOException {
        Document document = Jsoup.parse(getHtml(url));
        // Find all form elements
        Elements element1 = document.getElementsByTag("form");
        // Keep the first form that is submitted with POST
        Element element = element1.select("[method=post]").first();
        // Take every input inside the form that has a name attribute
        Elements elements = element.select("input[name]");
        HashMap<String, String> parmas = new HashMap<String, String>();
        for (Element temp : elements) {
            // Store each input's name and value in the map
            parmas.put(temp.attr("name"), temp.attr("value"));
        }
        return parmas;
    }

    class BookEntity {
        /** Title */
        private String name;
        /** Number of copies available for loan */
        private String leandableNum;
        /** Call number */
        private String callNumber;
        /** Author */
        private String writer;
        /** Publisher */
        private String publisher;
        /** Due date */
        private String data2Return;
        /** Borrow date */
        private String data2Borrowed;
        /** Whether the renewals are used up */
        private String isFullData;

        public BookEntity() {
        }

        public String getName() {
            return name;
        }

        public String getLeandableNum() {
            return leandableNum;
        }

        public String getCallNumber() {
            return callNumber;
        }

        public String getWriter() {
            return writer;
        }

        public String getPublisher() {
            return publisher;
        }

        public BookEntity setName(String name) {
            this.name = name;
            return this;
        }

        public BookEntity setLeandableNum(String leandableNum) {
            this.leandableNum = leandableNum;
            return this;
        }

        public BookEntity setCallNumber(String callNumber) {
            this.callNumber = callNumber;
            return this;
        }

        public BookEntity setWriter(String writer) {
            this.writer = writer;
            return this;
        }

        public BookEntity setPublisher(String publisher) {
            this.publisher = publisher;
            return this;
        }

        public String getData2Return() {
            return data2Return;
        }

        public String getData2Borrowed() {
            return data2Borrowed;
        }

        public String getIsFullData() {
            return isFullData;
        }

        public BookEntity setData2Return(String data2Return) {
            this.data2Return = data2Return;
            return this;
        }

        public BookEntity setData2Borrowed(String data2Borrowed) {
            this.data2Borrowed = data2Borrowed;
            return this;
        }

        public BookEntity setIsFullData(String isFullData) {
            this.isFullData = isFullData;
            return this;
        }
    }
}

 

 

How to use Jsoup is not covered in detail here.

For details, see this site: http://www.open-open.com/jsoup/

 

 
