This article covers part 10 of the Java supermarket checkout system series: the web crawler.
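Introduction
This part implements the crawler feature. The requirement is to scrape at least 100 records from web pages. The code here uses Douban Music as the example, specifically the tag page 豆瓣音乐标签: 民谣 (douban.com).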
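Implementation
Apart from the new crawler feature, the code follows the same principles as the earlier posts in this series. To keep things separate, we create a new database named music. As before, the vo package holds the data class, that is, the fields together with the constructors and getters/setters that the IDE can generate automatically. The dao package holds the database access functions, mainly the four basic operations: insert, delete, update, and query. The util package holds the database connection helper that connects Java to the database. The ui package holds the main function, which calls the other functions. The service package holds the crawler functions, which scrape data from the specified pages.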
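The MusicService class defines several lists that hold the different fields of the music records being scraped:
- musicName: album names.
- musicURLaddress: album URLs.
- musicScore: album ratings (scores).
- musicPeople: the number of people who rated each album.
- musicSinger: singer or artist names.
- musicTime: album release dates.
- musicType: album types.
- musicMedium: album media (for example, CD or vinyl).
- musicSect: genres (optional).
- musicBarcode: barcodes (optional).
These lists collect the scraped data, which is then inserted into the database.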
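The lists are index-aligned: entry i of every list belongs to the same album, so each scraped album must add exactly one entry to every list, even when an optional value is missing. The following is a minimal sketch of that idea, not the article's code. The class ParallelListsSketch and its methods addAlbum and describe are hypothetical names used only for illustration, only three of the ten lists are shown, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only the list names mirror the article's MusicService.
final class ParallelListsSketch {
    static final List<String> musicName = new ArrayList<>();
    static final List<String> musicScore = new ArrayList<>();
    static final List<String> musicSect = new ArrayList<>();

    // Append exactly one entry to every list so the indexes stay aligned.
    // A missing optional value is stored as an empty string.
    static void addAlbum(String name, String score, String sect) {
        musicName.add(name);
        musicScore.add(score);
        musicSect.add(sect == null ? "" : sect);
    }

    // Rebuild the i-th record by reading the same index from each list.
    static String describe(int i) {
        return musicName.get(i) + " | score=" + musicScore.get(i) + " | sect=" + musicSect.get(i);
    }

    public static void main(String[] args) {
        addAlbum("Album A", "8.9", "Folk"); // made-up sample data
        addAlbum("Album B", "7.5", null);   // optional genre missing
        System.out.println(describe(0));
        System.out.println(describe(1));
    }
}
```

If one list were skipped for an album, every later index would point at mismatched data, which is why the article's code adds an empty string for missing optional fields.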
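getData() method
getData() is the main method that starts the scraping process:
- User agent: this string makes the request look as if it came from a real browser, which helps avoid being blocked by the site.
- Loop over pages: the method loops over 5 pages (100 items, assuming 20 items per page). On each iteration it builds the URL of the current page and calls getMusicInfo() to scrape that page.
- Sleep 1 second: Thread.sleep(1000) adds a delay between requests so that the site is not flooded, which also makes it less likely to trigger the site's anti-scraping measures.
- Insert into the database: after all pages have been scraped, the method calls insertMusicInfoToDB() to store the collected data in the database.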
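The paging logic can be illustrated on its own, without any network access. The sketch below builds the same page URLs that getData() requests. The URL pattern (start offset plus type=T) is taken from the article's code, while the class PageUrlSketch and the method buildPageUrls are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only builds the page URLs; it does not send requests.
final class PageUrlSketch {
    // Tag page URL as used in the article's MusicService.
    static final String BASE = "https://music.douban.com/tag/民谣";

    // Page i starts at offset i * pageSize, so 5 pages of 20 items cover offsets 0..80.
    static List<String> buildPageUrls(int pages, int pageSize) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            urls.add(BASE + "?start=" + (i * pageSize) + "&type=T");
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : buildPageUrls(5, 20)) {
            System.out.println(url);
        }
    }
}
```

With 5 pages and a page size of 20, the last URL uses start=80, which corresponds to items 81 to 100 and so meets the 100-record requirement as long as each page really returns 20 items.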
[Screenshot: the corresponding HTML of the tag page]
[Screenshot: clicking a link opens the detail page of each album]
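getMusicInfo() method
This method does the actual scraping of a given page:
- Document retrieval: the method uses Jsoup to connect to the URL and retrieve the HTML document.
- Element selection: it then selects all elements with the class .item, each of which represents a single music record.
- Data extraction: for each record it extracts the name, URL, score, and number of raters, along with the singer, release date, type, medium, and other details, and adds them to the corresponding lists.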
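The extraction step depends on splitting one text line of the form "singer / date / type / medium / ..." and on stripping the wrapper text around the rater count. The sketch below shows both on plain strings, without Jsoup. The class AlbumLineSketch, its methods, and the sample inputs are hypothetical. Note also that extractPeople keeps only the digits, which is a substitution for the chain of replace() calls in the article's code rather than the same technique.

```java
// Hypothetical, Jsoup-free sketch of the string handling inside getMusicInfo().
final class AlbumLineSketch {
    // Always returns six fields: singer, time, type, medium, sect, barcode.
    // Fields missing from the line are filled with empty strings.
    static String[] splitInfo(String line) {
        String[] parts = line.trim().split(" / ");
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = i < parts.length ? parts[i].trim() : "";
        }
        return fields;
    }

    // Keeps only the digits of the rating-count text (assumed replacement for
    // the article's replace() chain, which removes specific characters instead).
    static String extractPeople(String text) {
        return text.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        // Made-up sample line in the "a / b / c / d" shape the article splits on.
        String[] fields = splitInfo("Artist / 2020-01-01 / Album / CD");
        System.out.println(String.join(" | ", fields));
        System.out.println(extractPeople("(12345 ratings)"));
    }
}
```

Returning a fixed-length array means the caller can add one value to each list unconditionally, which keeps the lists aligned even for incomplete records.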
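insertMusicInfoToDB() method
This method inserts the collected data into the database:
- Looping over the data: the method iterates over all the collected data in the lists and creates one Information object per music record.
- Parsing the data: it tries to parse the score and the number of raters from strings into the appropriate types (float and int). If parsing fails, it falls back to default values (0.0f for score and 0 for people).
- Inserting into the database: it then calls InformationDAO.insert(info) to insert the record. The result of each insert is stored in a Map that maps the music name to a boolean indicating whether the insert succeeded.
- Logging the result: after each insert, it prints whether the insert succeeded.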
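The parse-with-default step can be sketched in isolation. The example below assumes the same fallback values the article describes (0.0f for the score and 0 for the number of raters). The class ParseDefaultsSketch and its method names are hypothetical, and the null checks are an extra precaution that the article's code does not include.

```java
// Hypothetical sketch of the "parse, else use a default" step in insertMusicInfoToDB().
final class ParseDefaultsSketch {
    // Returns 0.0f when the text is missing or is not a valid float.
    static float parseScore(String raw) {
        if (raw == null) {
            return 0.0f;
        }
        try {
            return Float.parseFloat(raw.trim());
        } catch (NumberFormatException e) {
            return 0.0f;
        }
    }

    // Returns 0 when the text is missing or is not a valid int.
    static int parsePeople(String raw) {
        if (raw == null) {
            return 0;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseScore("8.9"));   // valid score
        System.out.println(parseScore(""));      // empty text falls back to the default
        System.out.println(parsePeople("1234")); // valid count
        System.out.println(parsePeople("n/a"));  // invalid text falls back to the default
    }
}
```

One consequence of this design is that a stored score of 0.0 cannot be told apart from an album that has no rating yet, so a nullable column would be an alternative if that distinction matters.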
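Summary
- Web scraping: the getData() and getMusicInfo() methods scrape the data from the specific web pages.
- Data collection: the scraped data is collected into the lists.
- Database insertion: the insertMusicInfoToDB() method inserts the collected data into the database, parsing each value and falling back to a default when parsing fails.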
Results
Full code
ui—Driver
package ui;

import service.MusicService;

import java.io.IOException;

public class Driver {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Scrape the pages, then insert the collected records into the database
        MusicService.getData();
    }
}
vo—Information
package vo;

public class Information {
    private int id;
    private String musicName;
    private String singer;
    private String time;
    private String type;
    private String medium;
    private String sect;
    private String barCode;
    private float score;
    private int people;
    private String urlAddress;

    public Information() {
    }

    public Information(int id, String musicName, String singer, String time, String type,
                       String medium, String sect, String barCode, float score, int people,
                       String urlAddress) {
        this.id = id;
        this.musicName = musicName;
        this.singer = singer;
        this.time = time;
        this.type = type;
        this.medium = medium;
        this.sect = sect;
        this.barCode = barCode;
        this.score = score;
        this.people = people;
        this.urlAddress = urlAddress;
    }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getMusicName() { return musicName; }
    public void setMusicName(String musicName) { this.musicName = musicName; }

    public String getSinger() { return singer; }
    public void setSinger(String singer) { this.singer = singer; }

    public String getTime() { return time; }
    public void setTime(String time) { this.time = time; }

    public String getType() { return type; }
    public void setType(String type) { this.type = type; }

    public String getMedium() { return medium; }
    public void setMedium(String medium) { this.medium = medium; }

    public String getSect() { return sect; }
    public void setSect(String sect) { this.sect = sect; }

    public String getBarCode() { return barCode; }
    public void setBarCode(String barCode) { this.barCode = barCode; }

    public float getScore() { return score; }
    public void setScore(float score) { this.score = score; }

    public int getPeople() { return people; }
    public void setPeople(int people) { this.people = people; }

    public String getUrlAddress() { return urlAddress; }
    public void setUrlAddress(String urlAddress) { this.urlAddress = urlAddress; }

    @Override
    public String toString() {
        return "Information{" +
                "id=" + id +
                ", musicName='" + musicName + '\'' +
                ", singer='" + singer + '\'' +
                ", time='" + time + '\'' +
                ", type='" + type + '\'' +
                ", medium='" + medium + '\'' +
                ", sect='" + sect + '\'' +
                ", barCode='" + barCode + '\'' +
                ", score=" + score +
                ", people=" + people +
                ", urlAddress='" + urlAddress + '\'' +
                '}';
    }

    // Nested helper class; it is not used by the code shown in this article
    public static class Info {
        private String singer;
        private String time;
        private String type;
        private double medium;
    }
}
dao—InformationDAO
package dao;

import util.DBConnection;
import vo.Information;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class InformationDAO {

    // Map the current row of a result set to an Information object
    private static Information mapRow(ResultSet rs) throws SQLException {
        Information info = new Information();
        info.setId(rs.getInt("id"));
        info.setMusicName(rs.getString("musicName"));
        info.setSinger(rs.getString("singer"));
        info.setTime(rs.getString("time"));
        info.setType(rs.getString("type"));
        info.setMedium(rs.getString("medium"));
        info.setSect(rs.getString("sect"));
        info.setBarCode(rs.getString("barcode"));
        info.setScore(rs.getFloat("score"));
        info.setPeople(rs.getInt("people"));
        info.setUrlAddress(rs.getString("URLaddress"));
        return info;
    }

    // Query a single record by music name
    public static Information queryByName(String musicName) {
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;
        Information information = null;
        try {
            con = DBConnection.getConnection();
            String sql = "SELECT * FROM music_information WHERE musicName = ?";
            pst = con.prepareStatement(sql);
            pst.setString(1, musicName);
            rs = pst.executeQuery();
            if (rs.next()) {
                information = mapRow(rs);
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return information;
    }

    // Query all records of one singer
    public static List<Information> queryBySinger(String singer) {
        List<Information> infoList = new ArrayList<>();
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;
        try {
            con = DBConnection.getConnection();
            String sql = "SELECT * FROM music_information WHERE singer = ?";
            pst = con.prepareStatement(sql);
            pst.setString(1, singer);
            rs = pst.executeQuery();
            while (rs.next()) {
                infoList.add(mapRow(rs));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            DBConnection.close(con, pst);
        }
        return infoList;
    }

    // Sum of the rater counts over all records
    public static int getTotalPeople() {
        String query = "SELECT SUM(people) AS totalPeople FROM music_information";
        try (Connection conn = DBConnection.getConnection();
             PreparedStatement pst = conn.prepareStatement(query);
             ResultSet rs = pst.executeQuery()) {
            if (rs.next()) {
                return rs.getInt("totalPeople");
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return 0;
    }

    // Average score of one singer's records whose sect is '民谣'
    public static float getAverageScore(String singer) {
        String query = "SELECT AVG(score) AS averageScore FROM music_information WHERE singer = ? AND sect = '民谣'";
        float averageScore = -1; // default value, meaning no data was found
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;
        try {
            con = DBConnection.getConnection();
            pst = con.prepareStatement(query);
            pst.setString(1, singer);
            rs = pst.executeQuery();
            if (rs.next()) {
                float value = rs.getFloat("averageScore");
                // AVG returns NULL when no rows match; keep the default in that case
                if (!rs.wasNull()) {
                    averageScore = value;
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            // Release resources; each may still be null if an earlier step failed
            try {
                if (rs != null) {
                    rs.close();
                }
                if (pst != null) {
                    pst.close();
                }
                if (con != null) {
                    con.close();
                }
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }
        return averageScore;
    }

    // Query by any combination of conditions; unset fields are ignored
    public static ArrayList<Information> query(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;
        ArrayList<Information> informationArrayList = new ArrayList<>();
        try {
            con = DBConnection.getConnection();
            // Build the WHERE clause from the fields that are set
            StringBuilder sql = new StringBuilder("SELECT * FROM music_information WHERE 1 = 1");
            if (information.getId() != 0) {
                sql.append(" AND id = ?");
            }
            if (information.getMusicName() != null) {
                sql.append(" AND musicName = ?");
            }
            if (information.getSinger() != null) {
                sql.append(" AND singer = ?");
            }
            if (information.getTime() != null) {
                sql.append(" AND time = ?");
            }
            if (information.getType() != null) {
                sql.append(" AND type = ?");
            }
            if (information.getMedium() != null) {
                sql.append(" AND medium = ?");
            }
            if (information.getSect() != null) {
                sql.append(" AND sect = ?");
            }
            if (information.getBarCode() != null) {
                sql.append(" AND barcode = ?");
            }
            if (information.getScore() != 0) {
                sql.append(" AND score = ?");
            }
            if (information.getPeople() != 0) {
                sql.append(" AND people = ?");
            }
            if (information.getUrlAddress() != null) {
                sql.append(" AND URLaddress = ?");
            }
            pst = con.prepareStatement(sql.toString());

            // Bind the parameters in the same order as the conditions above
            int paramIndex = 1;
            if (information.getId() != 0) {
                pst.setInt(paramIndex++, information.getId());
            }
            if (information.getMusicName() != null) {
                pst.setString(paramIndex++, information.getMusicName());
            }
            if (information.getSinger() != null) {
                pst.setString(paramIndex++, information.getSinger());
            }
            if (information.getTime() != null) {
                pst.setString(paramIndex++, information.getTime());
            }
            if (information.getType() != null) {
                pst.setString(paramIndex++, information.getType());
            }
            if (information.getMedium() != null) {
                pst.setString(paramIndex++, information.getMedium());
            }
            if (information.getSect() != null) {
                pst.setString(paramIndex++, information.getSect());
            }
            if (information.getBarCode() != null) {
                pst.setString(paramIndex++, information.getBarCode());
            }
            if (information.getScore() != 0) {
                pst.setFloat(paramIndex++, information.getScore());
            }
            if (information.getPeople() != 0) {
                pst.setInt(paramIndex++, information.getPeople());
            }
            if (information.getUrlAddress() != null) {
                pst.setString(paramIndex++, information.getUrlAddress());
            }

            rs = pst.executeQuery();
            while (rs.next()) {
                informationArrayList.add(mapRow(rs));
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return informationArrayList;
    }

    // Insert one record
    public static boolean insert(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        boolean success = false;
        try {
            con = DBConnection.getConnection();
            String sql = "INSERT INTO music_information (id, musicName, singer, time, type, medium, sect, barcode, score, people, URLaddress) " +
                    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
            pst = con.prepareStatement(sql);
            pst.setInt(1, information.getId());
            pst.setString(2, information.getMusicName());
            pst.setString(3, information.getSinger());
            pst.setString(4, information.getTime());
            pst.setString(5, information.getType());
            pst.setString(6, information.getMedium());
            pst.setString(7, information.getSect());
            pst.setString(8, information.getBarCode());
            pst.setFloat(9, information.getScore());
            pst.setInt(10, information.getPeople());
            pst.setString(11, information.getUrlAddress());
            int rowsAffected = pst.executeUpdate();
            if (rowsAffected > 0) {
                success = true;
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return success;
    }

    // Update the record that matches the music name
    public static boolean update(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        boolean success = false;
        try {
            con = DBConnection.getConnection();
            String sql = "UPDATE music_information SET singer=?, time=?, type=?, medium=?, sect=?, barcode=?, score=?, people=?, URLaddress=? WHERE musicName=?";
            pst = con.prepareStatement(sql);
            pst.setString(1, information.getSinger());
            pst.setString(2, information.getTime());
            pst.setString(3, information.getType());
            pst.setString(4, information.getMedium());
            pst.setString(5, information.getSect());
            pst.setString(6, information.getBarCode());
            pst.setFloat(7, information.getScore());
            pst.setInt(8, information.getPeople());
            pst.setString(9, information.getUrlAddress());
            pst.setString(10, information.getMusicName()); // musicName is the last parameter
            System.out.println("Executing SQL: " + pst.toString()); // debug log
            int rowsAffected = pst.executeUpdate();
            if (rowsAffected > 0) {
                success = true;
            } else {
                System.out.println("Update failed: no matching record was updated");
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return success;
    }

    // Delete the record that matches the music name
    public static boolean delete(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        boolean success = false;
        try {
            con = DBConnection.getConnection();
            String sql = "DELETE FROM music_information WHERE musicName = ?";
            pst = con.prepareStatement(sql);
            pst.setString(1, information.getMusicName());
            int rowsAffected = pst.executeUpdate();
            if (rowsAffected > 0) {
                success = true;
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return success;
    }
}
util—DBConnection
package util;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DBConnection {
    private static String driverName;
    private static String url;
    private static String user;
    private static String password;

    // Load the driver; this only needs to run once
    static {
        driverName = "com.mysql.cj.jdbc.Driver";
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // Open a connection to the music database
    public static Connection getConnection() {
        url = "jdbc:mysql://localhost:3306/music?useUnicode=true&characterEncoding=utf-8";
        user = "root";
        password = "123456";
        Connection con = null;
        try {
            con = DriverManager.getConnection(url, user, password);
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
        return con;
    }

    // Release resources: close the statement first, then the connection
    public static void close(Connection con, PreparedStatement pst) {
        if (pst != null) {
            try {
                pst.close();
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
service—MusicService
package service;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import dao.InformationDAO;
import vo.Information;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MusicService {
    // Index-aligned lists: entry i of every list belongs to the same album
    private static List<String> musicName = new ArrayList<>();
    private static List<String> musicURLaddress = new ArrayList<>();
    private static List<String> musicScore = new ArrayList<>();
    private static List<String> musicPeople = new ArrayList<>();
    private static List<String> musicSinger = new ArrayList<>();
    private static List<String> musicTime = new ArrayList<>();
    private static List<String> musicType = new ArrayList<>();
    private static List<String> musicMedium = new ArrayList<>();
    private static List<String> musicSect = new ArrayList<>();
    private static List<String> musicBarcode = new ArrayList<>();

    public static void getData() throws IOException, InterruptedException {
        String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36";
        for (int i = 0; i < 5; i++) { // scrape 5 pages, 20 records per page
            String pageUrl = "https://music.douban.com/tag/民谣?start=" + (i * 20) + "&type=T";
            System.out.println("Scraping page " + (i + 1) + ", URL: " + pageUrl);
            getMusicInfo(pageUrl, userAgent);
            Thread.sleep(1000); // wait 1 second between requests
        }
        // Insert the collected data into the database
        insertMusicInfoToDB();
    }

    public static void getMusicInfo(String url, String userAgent) throws IOException {
        Document document = Jsoup.connect(url).userAgent(userAgent).get();
        // Select the elements with class "item"; each one is a single album record
        Elements musicElements = document.select(".item");
        for (Element music : musicElements) {
            // Album name. The original replaces a whitespace character with a normal
            // space; it is assumed here to be a non-breaking space (\u00A0).
            String name = music.select(".pl2 a").text().replace("\n", "").replace("\u00A0", " ").trim();
            musicName.add(name);

            // Album link
            String URLaddress = music.select(".pl2 a").attr("href");
            musicURLaddress.add(URLaddress);

            // Score
            String score;
            try {
                score = music.select(".rating_nums").text();
            } catch (Exception e) {
                score = "";
            }
            musicScore.add(score);

            // Number of raters: strip spaces, the "人评价" suffix, and the parentheses
            String people = music.select(".pl").get(1).text().replace(" ", "").replace("人评价", "").replace("(", "").replace(")", "");
            musicPeople.add(people);

            // Remaining details, separated by " / "
            String[] musicInfos = music.select(".pl").get(0).text().trim().split(" / ");
            if (musicInfos.length >= 4) {
                musicSinger.add(musicInfos[0]);
                musicTime.add(musicInfos[1]);
                musicType.add(musicInfos[2]);
                musicMedium.add(musicInfos[3]);
                musicSect.add(musicInfos.length > 4 ? musicInfos[4] : "");
                musicBarcode.add(musicInfos.length > 5 ? musicInfos[5] : "");
            } else {
                // Incomplete information: fill the missing fields with empty strings
                musicSinger.add(musicInfos[0]);
                musicTime.add(musicInfos.length > 1 ? musicInfos[1] : "");
                musicType.add(musicInfos.length > 2 ? musicInfos[2] : "");
                musicMedium.add(musicInfos.length > 3 ? musicInfos[3] : "");
                musicSect.add("");
                musicBarcode.add("");
            }
        }
    }

    public static Map<String, Object> insertMusicInfoToDB() {
        Map<String, Object> resultMap = new HashMap<>();
        for (int i = 0; i < musicName.size(); i++) {
            Information info = new Information();
            info.setMusicName(musicName.get(i));
            info.setSinger(musicSinger.get(i));
            info.setTime(musicTime.get(i));
            info.setType(musicType.get(i));
            info.setMedium(musicMedium.get(i));
            info.setSect(musicSect.get(i));
            info.setBarCode(musicBarcode.get(i));
            // Fall back to default values when the text cannot be parsed
            try {
                info.setScore(Float.parseFloat(musicScore.get(i)));
            } catch (NumberFormatException e) {
                info.setScore(0.0f);
            }
            try {
                info.setPeople(Integer.parseInt(musicPeople.get(i)));
            } catch (NumberFormatException e) {
                info.setPeople(0);
            }
            info.setUrlAddress(musicURLaddress.get(i));

            boolean success = InformationDAO.insert(info);
            resultMap.put(musicName.get(i), success); // record the result in the Map
            if (success) {
                System.out.println("Inserted: " + info.getMusicName());
            } else {
                System.out.println("Insert failed: " + info.getMusicName());
            }
        }
        return resultMap;
    }
}
mysql
create database music;
use music;

CREATE TABLE `music_information` (
    `id` INT,
    `musicName` VARCHAR(255) PRIMARY KEY,
    `singer` VARCHAR(255),
    `time` VARCHAR(50),      # release date
    `type` VARCHAR(255),     # album type
    `medium` VARCHAR(100),
    `sect` VARCHAR(50),      # genre
    `barcode` VARCHAR(50),
    `score` DECIMAL(3, 1),
    `people` INT,
    `URLaddress` VARCHAR(500)
);

INSERT INTO `music_information` (`id`, `musicName`, `singer`, `time`, `type`, `medium`, `sect`, `barcode`, `score`, `people`, `URLaddress`)
VALUES
(1, 'Song Title 1', 'Artist Name 1', '2023-01-01', 'Album Type 1', 'md1', '民谣', '123456789012', 4.5, 1000, 'https://example.com/song1'),
(3, 'st1', 'Artist Name 1', '2023-01-01', 'Album Type 1', 'md1', 'Pop', '123456789012', 4.5, 1000, 'https://example.com/song1'),
(4, 'st2', 'Artist Name 1', '2023-01-01', 'Album Type 1', 'md1', '民谣', '123456789012', 4.2, 1000, 'https://example.com/song1'),
(2, 'Song Title 2', 'Artist Name 2', '2022-05-15', 'Album Type 2', 'md2', 'Rock', '234567890123', 4.2, 500, 'https://example.com/song2');

select * from music_information;

# Run only when the table needs to be reset
drop table music_information;