Part 5 of the Python web-scraping learning series: scraping the food and drug administration (NMPA) pages.
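Material:
The food and drug administration web page
As before, the page loads its data through a POST request. Once we have the ID of each entry in the list, we can request the record of the corresponding company.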
#!/usr/bin/python3
import json
import requests

# Spoof the User-Agent so the request looks like it comes from a browser
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}
postUrl = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsList'
data = {
    'on': 'true',
    'page': '1',  # first page
    'pageSize': '15',  # number of rows to fetch
    'productName': '',
    'conditionType': '1',
    'applyname': '',
    'applysn': '',
}
# Fetch the company IDs in bulk
response = requests.post(url=postUrl, data=data, headers=header)
dic_obj = response.json()
print('json get over!!!!')
idList = []  # list that stores the company IDs
for dic in dic_obj['list']:
    idList.append(dic['ID'])
postUrl = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsById'
allDataList = []  # detail record of each company
# Iterate over the IDs and request them one by one
for company_id in idList:
    data = {'id': company_id}
    dicComData = requests.post(url=postUrl, data=data, headers=header).json()
    allDataList.append(dicComData)
    # print(dicComData)
print('post all over')
# Passing indent pretty-prints the JSON content
with open('allComDataList.json', 'w', encoding='utf-8') as fp:
    json.dump(allDataList, fp=fp, ensure_ascii=False, indent=4)
print('save over!!!!')
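The script above only fetches page 1. Since the form already carries `page` and `pageSize` fields, the same two-step flow (list request, then one detail request per ID) could probably be extended to several pages. The following is a minimal sketch under that assumption, not a tested crawler for the live site: `list_form`, `crawl_pages`, `post_json` and `delay` are hypothetical names introduced here, and the only response keys relied on are `list` and `ID`, as in the script above.

```python
import time

LIST_URL = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsList'
DETAIL_URL = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsById'


def list_form(page, page_size=15):
    """Build the form data for one page of the list endpoint (hypothetical helper)."""
    return {
        'on': 'true',
        'page': str(page),
        'pageSize': str(page_size),
        'productName': '',
        'conditionType': '1',
        'applyname': '',
        'applysn': '',
    }


def crawl_pages(post_json, first_page, last_page, delay=0.0):
    """Collect the detail record of every company listed on the given pages.

    post_json(url, form) is supplied by the caller (an assumption of this
    sketch): it sends the POST request and returns the decoded JSON.
    """
    records = []
    for page in range(first_page, last_page + 1):
        listing = post_json(LIST_URL, list_form(page))
        # A page without a 'list' key is treated as empty
        for item in listing.get('list', []):
            records.append(post_json(DETAIL_URL, {'id': item['ID']}))
            time.sleep(delay)  # optional pause between detail requests
    return records
```

To run it against the real site, `post_json` could be a small wrapper that returns `requests.post(url, data=form, headers=header).json()`. Keeping the network call outside `crawl_pages` also makes it possible to try the loop with a fake function first.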