Python爬虫使用Selenium爬取京东第一页商品的差评

本文主要是介绍Python爬虫使用Selenium爬取京东第一页商品的差评，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

使用Python中的Selenium库爬取京东口罩第一页的差评

实验目的
Selenium库
谷歌浏览器驱动
参考代码
实验结果
写在最后

实验目的

使用Python中的Selenium库爬取京东口罩第一页的差评，以商品名称保存为txt文件

Selenium库

需要使用Selenium库，如无这个库的话请使用pip命令自行安装

pip install selenium

谷歌浏览器驱动

还需要用到谷歌浏览器驱动，请自行下载对应版本驱动后填入下方代码需要处

http://npm.taobao.org/mirrors/chromedriver/

参考代码

from selenium import  webdriver
import  time
import csv
import re
from selenium.webdriver.support.wait import WebDriverWait
goodslinks=[]
def get_goodslink():#填入自己的浏览器驱动位置wd=webdriver.Chrome("")#打开京东口罩搜索页面wd.get("https://search.jd.com/Search?keyword=口罩") time.sleep(4)#商品链接获取links=wd.find_elements_by_css_selector(".gl-item .gl-i-wrap .p-img a")for link in links:href=link.get_attribute('href')goodslinks.append(href)wd.close()def get_goodscomments(urls):#填入你的浏览器驱动位置wd=webdriver.Chrome("")for url in urls:wd.get(url)time.sleep(3)#获取商品名称goodsName=wd.find_element_by_css_selector(".itemInfo-wrap .sku-name").text #去除商品名的非法字符rightName=re.sub(r"[\/\\\:\*\?\"\<\>\|]", "_", goodsName)  # 控制鼠标从上往下滑动到底部wd.execute_script("window.scrollTo(0,document.body.scrollHeight);") time.sleep(3)#设置显式等待，点击差评按钮WebDriverWait(wd,3,0.2).until(lambda x:x.find_element_by_css_selector("#comment ul li:nth-child(7) a")).click()time.sleep(3)#获取差评goodsComment=wd.find_elements_by_xpath('//div[@class = "tab-con"]/div[@id = "comment-6"]//p')#写入文件名with open("E:\\python文件\\badcomments\\"+ rightName +'.txt','a+', encoding='utf-8-sig') as f: #badcomments为文件夹名字for comment in goodsComment:f.writelines(comment.text +'\n')wd.close()if __name__=="__main__":get_goodslink()get_goodscomments(goodslinks)

最开始让程序自行爬取商品名称创建txt文件时报错，搜索之后发现创建文件不能有非法字符，此时需要用re.sub()处理非法字符

rightName=re.sub(r"[\/\\\:\*\?\"\<\>\|]", "_", goodsName)

实验结果

在这里插入图片描述

ShiYu Liu

写在最后

博主仅为python新手，本篇博文仅供交流分享，如有不足之处欢迎各位大佬指点！

这篇关于Python爬虫使用Selenium爬取京东第一页商品的差评的文章就介绍到这儿，希望我们推荐的文章对编程师们有所帮助！

Python爬虫使用Selenium爬取京东第一页商品的差评

使用Python中的Selenium库爬取京东口罩第一页的差评

实验目的

Selenium库

谷歌浏览器驱动

参考代码

实验结果

写在最后

相关文章

Python的Darts库实现时间序列预测

Python正则表达式匹配和替换的操作指南

Python使用FastAPI实现大文件分片上传与断点续传功能

通过Docker容器部署Python环境的全流程

Python一次性将指定版本所有包上传PyPI镜像解决方案

Spring Security简介、使用与最佳实践

springboot中使用okhttp3的小结

Python实现Excel批量样式修改器(附完整代码)

python获取指定名字的程序的文件路径的两种方法

Java使用Javassist动态生成HelloWorld类

Python爬虫 使用Selenium爬取京东第一页商品的差评

使用Python中的Selenium库爬取京东口罩第一页的差评

实验目的

Selenium库

谷歌浏览器驱动

参考代码

实验结果

写在最后

相关文章

Python爬虫使用Selenium爬取京东第一页商品的差评