[酱浦菌 - Crawler Project] Four Ways to Scrape the Baidu Home Page

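How the project works:

The script defines four functions, each with a different purpose:
- func1(): uses the `requests` library to send a GET request for the Baidu home page through a local Splash service and prints the returned content. It demonstrates how to fetch page content with `requests`.
- func2(): sends a GET request for the Baidu home page through Splash and saves the response body as an image file named "baidu.png".
- func3(): has Splash execute a Lua script that loads the Baidu home page, waits 2 seconds, and returns the HTML. It demonstrates how to use Splash to render JavaScript and retrieve the rendered page.
- func4(): has Splash execute a Lua script that loads the Baidu home page, types the search keyword "SXT", clicks the search button, waits 2 seconds, and returns the HTML. It demonstrates how to use Splash to simulate user interaction with a page.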
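As a rough sketch of how these functions address Splash, the helper below builds a URL-encoded request URL for a given endpoint. The names `splash_url` and `SPLASH` are hypothetical, the host and port assume a Splash instance running locally on port 8050, and `render.png` is assumed here as the endpoint that returns a screenshot.

```python
from urllib.parse import urlencode

SPLASH = "http://localhost:8050"  # assumption: Splash is running locally on port 8050


def splash_url(endpoint, **params):
    """Return a Splash URL whose query parameters are URL-encoded."""
    return f"{SPLASH}/{endpoint}?{urlencode(params)}"


# render.html returns rendered HTML; render.png is assumed to return a screenshot.
html_url = splash_url("render.html", url="https://www.baidu.com/", wait=1)
png_url = splash_url("render.png", url="https://www.baidu.com/", wait=1)
print(html_url)
print(png_url)
```

Encoding the target address matters because it contains `:` and `/`, which would otherwise be ambiguous inside a query string.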
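Each function follows roughly these steps:
- Build the request URL, which contains the address of the Baidu site.
- Optionally set HTTP request headers that mimic a Chrome browser (the listing below leaves the default headers in place).
- Send a GET request to that URL and read the response.
- For func2, save the response content as an image file.
- Print the response content or other information.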
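The header step can be sketched as follows. This is only an illustration under stated assumptions: the `User-Agent` string is an example value, the names `HEADERS` and `fetch` are hypothetical, and the request goes straight to the target URL rather than through Splash. The `get` parameter exists so the function can be exercised without network access; by default it falls back to `requests.get`.

```python
# Example Chrome-like headers; the exact User-Agent value is an assumption.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
}


def fetch(url, get=None, timeout=10):
    """Send a GET request with browser-like headers and return the response."""
    if get is None:
        import requests  # imported lazily so the sketch loads without the library

        get = requests.get
    return get(url, headers=HEADERS, timeout=timeout)
```

Calling `fetch("https://www.baidu.com/")` would return a `requests` response whose `.text` attribute holds the page body.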
Finally, calling these four functions carries out the different operations, such as fetching page content and downloading an image.

Complete code:

import os

import requests

# Three Splash endpoints are used: render.html, render.png and execute


def func1():
    url = 'https://www.baidu.com/'
    # render.html returns the page HTML after JavaScript has run
    base_url = 'http://localhost:8050/render.html'
    # Passing params lets requests URL-encode the target address
    resp = requests.get(base_url, params={'url': url, 'wait': 1})
    print(resp.text)


def func2():
    url = 'https://www.baidu.com/'
    # render.png returns a PNG screenshot of the rendered page
    base_url = 'http://localhost:8050/render.png'
    resp = requests.get(base_url, params={'url': url, 'wait': 1})
    # Create the output directory if it does not exist yet
    os.makedirs('img', exist_ok=True)
    with open(os.path.join('img', 'baidu.png'), 'wb') as f:
        f.write(resp.content)
    print(f'saved {len(resp.content)} bytes')


def func3():
    url = 'https://www.baidu.com/'
    lua = f'''
function main(splash, args)
    splash:go("{url}")
    splash:wait(2)
    return splash:html()
end
'''
    # The Lua script must be URL-encoded, so it is passed through params
    base_url = 'http://localhost:8050/execute'
    resp = requests.get(base_url, params={'lua_source': lua})
    print(resp.text)


def func4():
    url = 'https://www.baidu.com/'
    lua = f'''
function main(splash, args)
    splash:go("{url}")
    local input = splash:select("#kw")
    input:send_text("SXT")
    local button = splash:select("#su")
    button:mouse_click()
    splash:wait(2)
    return splash:html()
end
'''
    base_url = 'http://localhost:8050/execute'
    resp = requests.get(base_url, params={'lua_source': lua})
    print(resp.text)


if __name__ == '__main__':
    func1()
    func2()
    func3()
    func4()
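One detail worth illustrating is how a Lua script travels to the `execute` endpoint: it contains spaces, newlines, and quotes, so it has to be URL-encoded. The sketch below is one possible approach, not necessarily the author's. It passes the target address as a separate `url` query parameter and reads it in Lua as `args.url` instead of interpolating it into the script. The names `LUA_SCRIPT` and `execute_url` are hypothetical, and the Splash address is assumed to be `localhost:8050`.

```python
from urllib.parse import urlencode

SPLASH = "http://localhost:8050"  # assumption: Splash is running locally on port 8050

# The page address is read from args.url, so the script itself stays constant.
LUA_SCRIPT = """
function main(splash, args)
    splash:go(args.url)
    splash:wait(2)
    return splash:html()
end
""".strip()


def execute_url(target):
    """Build an /execute URL with the Lua script and target address URL-encoded."""
    query = urlencode({"lua_source": LUA_SCRIPT, "url": target})
    return f"{SPLASH}/execute?{query}"


request_url = execute_url("https://www.baidu.com/")
print(request_url)
```

Keeping the address out of the script text avoids quoting problems when the address itself contains characters such as `"`.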
