Scraping Douban Movies, Prices, and Book Titles with a Crawler

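This article shows how to use a Python crawler, built on requests and BeautifulSoup, to scrape the Douban Movie Top 250 list and the prices and book titles on books.toscrape.com. It is meant as a practical reference for developers working on similar scraping tasks.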

1. Scraping the Douban Movie Top 250

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with every request.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

# Each page lists 25 movies, so start takes the values 0, 25, ..., 225.
for i in range(0, 250, 25):
    print(f"-------- Movies {i + 1} to {i + 25} --------")
    response = requests.get(f"https://movie.douban.com/top250?start={i}", headers=headers)
    if response.ok:
        html = response.text
        soup = BeautifulSoup(html, "html.parser")
        all_titles = soup.find_all("span", attrs={"class": "title"})
        j = i  # running rank, continued from the previous page
        for title in all_titles:
            title_string = title.string
            # Alternate titles contain "/"; skip them and keep the primary title.
            if "/" not in title_string:
                j += 1
                print(f"{j}. {title_string}")
    else:
        print("Request failed")
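
The title filter above can be tried without touching the network by moving it into a function that takes an HTML string. The following is a minimal sketch, not the author's code: the function name `extract_primary_titles` and the `SAMPLE_HTML` fragment are hypothetical, and the fragment is hand-written on the assumption that each movie has one `span.title` for its primary title and an optional second one that starts with a slash.

```python
from bs4 import BeautifulSoup


def extract_primary_titles(html):
    """Return the primary movie titles found in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for span in soup.find_all("span", class_="title"):
        text = span.get_text()
        # Alternate titles contain "/"; keep only the primary ones.
        if "/" not in text:
            titles.append(text.strip())
    return titles


# Hypothetical, hand-written fragment (an assumption, not a copy of the real page).
SAMPLE_HTML = """
<div class="hd"><a href="#">
  <span class="title">肖申克的救赎</span>
  <span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>
</a></div>
<div class="hd"><a href="#">
  <span class="title">霸王别姬</span>
</a></div>
"""

# The ten values of the start parameter used by the loop: 0, 25, ..., 225.
page_offsets = list(range(0, 250, 25))
titles = extract_primary_titles(SAMPLE_HTML)
print(page_offsets)
print(titles)
```

Keeping the parsing separate from the request makes it easier to check the filter against a saved page before running the full loop.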

2. Scraping prices

import requests
from bs4 import BeautifulSoup

content = requests.get("http://books.toscrape.com/").text
soup = BeautifulSoup(content, "html.parser")
# Each price sits inside a <p> tag with class="price_color", so search for those.
all_prices = soup.find_all("p", attrs={"class": "price_color"})
print(all_prices)
for price in all_prices:
    # Drop the first two characters (the currency prefix in the decoded text) to keep only the number.
    print(price.string[2:])
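
The slice `[2:]` above depends on exactly how many characters precede the digits, and that count can change with how the response text is decoded. The following is a minimal sketch of a less position-dependent alternative, assuming every price string contains one plain decimal number. The helper `parse_price` and the sample strings are hypothetical and are not part of the original code.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Extract the numeric part of a price string as a Decimal."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return Decimal(match.group())


# Hypothetical inputs: a clean prefix, a mis-decoded two-character prefix,
# and a value with surrounding whitespace.
samples = ["£51.77", "Â£53.74", "  £50.10 "]
prices = [parse_price(s) for s in samples]
print(prices)
print(sum(prices))
```

Using `Decimal` rather than `float` keeps the values exact, which matters if the prices are later summed or compared.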

3. Scraping book titles

import requests
from bs4 import BeautifulSoup

content = requests.get("http://books.toscrape.com/").text
soup = BeautifulSoup(content, "html.parser")
# Each book title sits in an <h3> that wraps an <a>, so find the <h3> first, then the <a> inside it.
all_titles = soup.find_all("h3")
for title in all_titles:
    all_links = title.find_all("a")
    for link in all_links:
        print(link.string)
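
The link text printed above may be shortened with an ellipsis for long titles. The following is a minimal sketch of reading the full title instead, assuming the `<a>` tag also carries the complete name in its `title` attribute. The function name `extract_full_titles` is hypothetical, and `SAMPLE_HTML` is a hand-written fragment modeled on that assumption rather than a copy of the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written fragment (an assumption about the markup).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
         title="A Light in the Attic">A Light in the ...</a></h3>
</article>
<article class="product_pod">
  <h3><a href="catalogue/soumission_998/index.html"
         title="Soumission">Soumission</a></h3>
</article>
"""


def extract_full_titles(html):
    """Return book titles, preferring the title attribute over the link text."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for h3 in soup.find_all("h3"):
        for link in h3.find_all("a"):
            # Fall back to the visible text when the attribute is missing.
            titles.append(link.get("title") or link.get_text(strip=True))
    return titles


full_titles = extract_full_titles(SAMPLE_HTML)
print(full_titles)
```

The fallback to the visible text keeps the function usable on pages where the attribute is absent.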