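This article shows how to scrape job listings from 51job with Python.
Preface
The text and images in this article come from the internet and are intended for study and discussion only, not for any commercial use. If there is a problem, please contact us so we can deal with it.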
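Development tools
- Python 3.6.5
- PyCharm
- requests
- re
- json
Of these, only requests has to be installed with pip (pip install requests); re and json ship with the Python standard library.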
Page analysis
https://search.51job.com/list/010000%252c020000%252c030200%252c040000,000000,0000,00,9,99,python,2,1.html
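The search URL above can be taken apart with the standard library. The following is a minimal sketch, not the site's documented URL scheme: it assumes that the field before `,2,` is the search keyword and that the last number is the page (the page slot matches how the complete script fills the URL). `build_search_url` and `decode_area_codes` are hypothetical helper names introduced only for illustration.

```python
from urllib.parse import unquote

# Template copied from the article's URL; {keyword} and {page} mark the two
# slots this sketch assumes are variable.
BASE = ('https://search.51job.com/list/'
        '010000%252c020000%252c030200%252c040000,'
        '000000,0000,00,9,99,{keyword},2,{page}.html')


def build_search_url(keyword, page):
    """Fill the keyword and page number into the list URL (hypothetical helper)."""
    return BASE.format(keyword=keyword, page=page)


def decode_area_codes(url):
    """Return the codes held in the first comma-separated field after /list/."""
    first_field = url.split('/list/')[1].split(',')[0]
    # '%252c' is a comma that was percent-encoded twice: %252c -> %2c -> ','
    return unquote(unquote(first_field)).split(',')


url = build_search_url('python', 1)
print(url)
print(decode_area_codes(url))
```

Decoding the first field twice shows that it is simply a comma-separated list of four codes, which is why the raw URL looks noisier than it really is.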
Requesting the page
import requests

url = 'https://search.51job.com/list/010000%252c020000%252c030200%252c040000,000000,0000,00,9,99,python,2,1.html'
# Query-string parameters sent along with the request
params = {
    'lang': 'c',
    'postchannel': '0000',
    'workyear': '99',
    'cotype': '99',
    'degreefrom': '99',
    'jobterm': '99',
    'companysize': '99',
    'ord_field': '0',
    'dibiaoid': '0',
    'line': '',
    'welfare': '',
}
cookies = {
    # 'cookie_name': 'cookie_value',  # paste your own cookies here
}
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Host': 'search.51job.com',
    'Referer': 'https://search.51job.com/list/190200,000000,0000,00,9,99,python,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}
response = requests.get(url=url, params=params, headers=headers, cookies=cookies)
# Let requests guess the encoding from the body so Chinese text decodes correctly
response.encoding = response.apparent_encoding
print(response.text)
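requests appends the `params` dictionary to the URL as a query string. The sketch below reproduces that step with only the standard library so the final address can be inspected without sending a request; it approximates what requests does and is not a copy of its internals.

```python
from urllib.parse import urlencode

url = ('https://search.51job.com/list/'
       '010000%252c020000%252c030200%252c040000,'
       '000000,0000,00,9,99,python,2,1.html')
params = {
    'lang': 'c',
    'postchannel': '0000',
    'workyear': '99',
    'cotype': '99',
    'degreefrom': '99',
    'jobterm': '99',
    'companysize': '99',
    'ord_field': '0',
    'dibiaoid': '0',
    'line': '',
    'welfare': '',
}

# urlencode joins the pairs as key=value separated by '&';
# empty values are kept as 'line=' and 'welfare='.
query = urlencode(params)
full_url = url + '?' + query
print(full_url)
```

The resulting query string has the same shape as the one in the Referer header, which is a quick way to check that no parameter was mistyped.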
The data we need sits inside a <script> tag:
<script type="text/javascript">
window.__SEARCH_RESULT__ =
'''
the content you want to extract
'''
</script>
<div class="clear"></div>
A regular expression is enough to pull it out.
Convert the matched text into JSON data, then read the values you want from the resulting dictionary.
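import json
import pprint
import re

# Capture everything between the assignment and the closing </script> tag
r = re.findall(r'window\.__SEARCH_RESULT__ = (.*?)</script>', response.text, re.S)
string = ''.join(r)
info_dict = json.loads(string)
pprint.pprint(info_dict)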
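The extraction step can be tried offline before touching the live site. The following is a minimal sketch: `sample_html` is a hand-written stand-in for `response.text` with made-up values, and `extract_search_result` is a hypothetical helper name. It also guards against the case where the pattern is not found, in which `''.join(r)` would be an empty string and `json.loads` would raise an error.

```python
import json
import re

# Hand-written stand-in for response.text; the field values are made up.
sample_html = '''
<script type="text/javascript">
window.__SEARCH_RESULT__ = {"engine_search_result": [{"job_name": "Python Developer", "company_name": "Example Co."}]}</script>
<div class="clear"></div>
'''

# re.S lets '.' match newlines; '.*?' stops at the first closing </script>.
PATTERN = re.compile(r'window\.__SEARCH_RESULT__ = (.*?)</script>', re.S)


def extract_search_result(html):
    """Return the embedded search result as a dict, or None if it is absent."""
    match = PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


info = extract_search_result(sample_html)
print(info['engine_search_result'][0]['job_name'])
print(extract_search_result('<html></html>'))
```

Returning None for a page without the variable lets the caller skip that page, for example when the site serves a verification page instead of results.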
Complete code
import requests
import re
import json

# Walk through result pages 1 to 10
for page in range(1, 11):
    url = 'https://search.51job.com/list/010000%252c020000%252c030200%252c040000,000000,0000,00,9,99,python,2,{}.html'.format(page)
    params = {
        'lang': 'c',
        'postchannel': '0000',
        'workyear': '99',
        'cotype': '99',
        'degreefrom': '99',
        'jobterm': '99',
        'companysize': '99',
        'ord_field': '0',
        'dibiaoid': '0',
        'line': '',
        'welfare': '',
    }
    cookies = {}
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Host': 'search.51job.com',
        'Referer': 'https://search.51job.com/list/190200,000000,0000,00,9,99,python,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    }
    response = requests.get(url=url, params=params, headers=headers, cookies=cookies)
    response.encoding = response.apparent_encoding
    # Pull the JSON assigned to window.__SEARCH_RESULT__ out of the page
    r = re.findall(r'window\.__SEARCH_RESULT__ = (.*?)</script>', response.text, re.S)
    string = ''.join(r)
    info_dict = json.loads(string)
    dit_py = info_dict['engine_search_result']
    dit = {}
    for i in dit_py:
        # Skip the first element of attribute_text and join the rest with spaces
        attribute_text = ' '.join(i['attribute_text'][1:])
        print(attribute_text)
        # dit['job_href'] = i['job_href']
        dit['job_name'] = i['job_name']
        dit['company_name'] = i['company_name']
        dit['money'] = i['providesalary_text']
        dit['workarea'] = i['workarea_text']
        dit['updatedate'] = i['updatedate']
        dit['companytype'] = i['companytype_text']
        dit['jobwelf'] = i['jobwelf']
        dit['attribute'] = attribute_text
        dit['companysize'] = i['companysize_text']
        print(dit)
        # Append one line per job to the CSV file
        with open('python招聘信息.csv', mode='a', encoding='utf-8') as f:
            f.write('{},{},{},{},{},{},{},{}\n'.format(dit['job_name'], dit['company_name'], dit['money'], dit['workarea'], dit['companytype'], dit['jobwelf'], dit['attribute'], dit['companysize']))
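The script writes each row by joining fields with bare commas, so a value that itself contains a comma would shift the columns. One possible alternative, sketched here and not taken from the original script, is the standard `csv` module, which quotes such values automatically. The field names are the ones the script reads; the sample entry is made up, and `to_row` is a hypothetical helper name.

```python
import csv
import io

# Keys read by the script, in the order it writes them.
FIELDS = ['job_name', 'company_name', 'providesalary_text', 'workarea_text',
          'companytype_text', 'jobwelf', 'attribute_text', 'companysize_text']


def to_row(item):
    """Flatten one search-result entry into a list of CSV cells."""
    row = []
    for key in FIELDS:
        value = item.get(key, '')
        if key == 'attribute_text':
            # Same as the full script: skip the first element, join the rest.
            value = ' '.join(value[1:])
        row.append(value)
    return row


# Made-up entry for illustration only.
item = {
    'job_name': 'Python Developer',
    'company_name': 'Example Co., Ltd.',
    'providesalary_text': '',
    'workarea_text': 'Shanghai',
    'companytype_text': 'Private',
    'jobwelf': 'A B',
    'attribute_text': ['Shanghai', '3 years', 'Bachelor'],
    'companysize_text': '50-150',
}

# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
csv.writer(buffer).writerow(to_row(item))
print(buffer.getvalue())
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=''` so that line endings are not translated a second time.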
Result