Matrix Factorization Examples for Recommendation Algorithms

2024-06-23 22:08

This post walks through a worked example of matrix factorization for recommender systems. I hope it offers a useful reference for developers facing similar problems; follow along and try it yourself.

The matrix factorization example reuses the data from the previous article on collaborative filtering.

Tools and concepts used

the Python surprise library
k-fold cross-validation
SVD
SVDpp
NMF
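As a quick refresher before the surprise code, here is a minimal NumPy sketch of what an unbiased matrix factorization computes: each rating is approximated by the dot product of a user factor vector and an item factor vector. The array names below are illustrative, not part of the surprise API:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 5, 2

P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item latent factors

def predict(u, i):
    # Unbiased MF prediction: dot product of the two latent factor vectors.
    return float(P[u] @ Q[i])

R_hat = P @ Q.T  # the full predicted rating matrix
print(R_hat.shape)  # (4, 5)
```

The biased SVD variant used later adds a global mean plus per-user and per-item offsets on top of this dot product; SVD++ additionally folds in implicit feedback.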

Algorithms and result visualization

# Any of the recommendation algorithms mentioned above can be used.
from surprise import SVD, SVDpp, NMF
from surprise import Reader, Dataset
from surprise.model_selection import cross_validate
import os
import numpy as np
import pandas as pd
from pandas import DataFrame

# Specify the file path
file_path = os.path.expanduser('./python_data.txt')
# Specify the file format
reader = Reader(line_format='user item rating timestamp', sep=',')
# Load the data from the file
data = Dataset.load_from_file(file_path, reader=reader)
################## SVD (no bias terms)
# Run 5-fold cross-validation for each factor count and average the results.
rows = []
for k in range(1, 30, 2):  # n_factors = 1, 3, 5, ..., 29
    perf = cross_validate(SVD(n_factors=k, biased=False), data,
                          measures=['RMSE', 'MAE'], cv=5, verbose=True)
    # Index the result dict by key rather than by column position, so the
    # code does not depend on how pandas orders the columns.
    rows.append({'Factors': k,
                 'MAE': round(np.mean(perf['test_mae']), 4),
                 'RMSE': round(np.mean(perf['test_rmse']), 4),
                 'FIT_TIME': round(np.mean(perf['fit_time']), 4),
                 'TEST_TIME': round(np.mean(perf['test_time']), 4)})
SVD_noBaised_result = DataFrame(rows, columns=['Factors', 'MAE', 'RMSE', 'FIT_TIME', 'TEST_TIME'])
SVD_noBaised_result.to_csv('./result_data/SVD_noBaised_result.csv', header=True, encoding='utf-8')
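With a result table like the one just written, the factor count that minimizes mean RMSE can be read off with idxmin. A sketch on made-up numbers standing in for a sweep result:

```python
import pandas as pd

# Illustrative values standing in for a real sweep result table.
result = pd.DataFrame({
    'Factors': [1, 3, 5, 7],
    'MAE':     [0.82, 0.79, 0.80, 0.81],
    'RMSE':    [1.05, 1.01, 1.02, 1.04],
})

best = result.loc[result['RMSE'].idxmin()]  # row with the lowest mean RMSE
print(int(best['Factors']))  # prints 3
```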
################## SVD (with bias terms)
# Run 5-fold cross-validation for each factor count and average the results.
rows = []
for k in range(1, 30, 2):
    perf = cross_validate(SVD(n_factors=k, biased=True), data,
                          measures=['RMSE', 'MAE'], cv=5, verbose=True)
    rows.append({'Factors': k,
                 'MAE': round(np.mean(perf['test_mae']), 4),
                 'RMSE': round(np.mean(perf['test_rmse']), 4),
                 'FIT_TIME': round(np.mean(perf['fit_time']), 4),
                 'TEST_TIME': round(np.mean(perf['test_time']), 4)})
SVD_baised_result = DataFrame(rows, columns=['Factors', 'MAE', 'RMSE', 'FIT_TIME', 'TEST_TIME'])
SVD_baised_result.to_csv('./result_data/SVD_baised_result.csv', header=True, encoding='utf-8')

################## SVD++
# Run 5-fold cross-validation for each factor count and average the results.
rows = []
for k in range(1, 30, 2):
    perf = cross_validate(SVDpp(n_factors=k), data,
                          measures=['RMSE', 'MAE'], cv=5, verbose=True)
    rows.append({'Factors': k,
                 'MAE': round(np.mean(perf['test_mae']), 4),
                 'RMSE': round(np.mean(perf['test_rmse']), 4),
                 'FIT_TIME': round(np.mean(perf['fit_time']), 4),
                 'TEST_TIME': round(np.mean(perf['test_time']), 4)})
SVDpp_result = DataFrame(rows, columns=['Factors', 'MAE', 'RMSE', 'FIT_TIME', 'TEST_TIME'])
SVDpp_result.to_csv('./result_data/SVDpp_result.csv', header=True, encoding='utf-8')

######## NMF
# Run 5-fold cross-validation for each factor count and average the results.
rows = []
for k in range(1, 30, 2):
    perf = cross_validate(NMF(n_factors=k, biased=False), data,
                          measures=['RMSE', 'MAE'], cv=5, verbose=True)
    rows.append({'Factors': k,
                 'MAE': round(np.mean(perf['test_mae']), 4),
                 'RMSE': round(np.mean(perf['test_rmse']), 4),
                 'FIT_TIME': round(np.mean(perf['fit_time']), 4),
                 'TEST_TIME': round(np.mean(perf['test_time']), 4)})
NMF_result = DataFrame(rows, columns=['Factors', 'MAE', 'RMSE', 'FIT_TIME', 'TEST_TIME'])
NMF_result.to_csv('./result_data/NMF_result.csv', header=True, encoding='utf-8')
#########################################################################################################################
#################################### Visualizing the results (R)
library(data.table)
library(ggplot2)
library(ggthemr)
library(dplyr)
library(easyGgplot2)  # provides ggplot2.multiplot()

SVD_noBaised_result <- read.csv('SVD_noBaised_result1.csv', encoding = 'utf-8')
SVD_baised_result <- read.csv('SVD_baised_result1.csv', encoding = 'utf-8')
SVDpp_result <- read.csv('SVDpp_result1.csv', encoding = 'utf-8')
NMF_result <- read.csv('NMF_result1.csv', encoding = 'utf-8')

SVD_noBiased_result <- as.data.table(SVD_noBaised_result)
SVD_biased_result <- as.data.table(SVD_baised_result)
SVDpp_result <- as.data.table(SVDpp_result)
NMF_result <- as.data.table(NMF_result)

# Tag each table with its algorithm so the curves can share one plot.
SVD_noBiased_result <- SVD_noBiased_result[, SVD_class := 'SVD_noBiased']
SVD_biased_result <- SVD_biased_result[, SVD_class := 'SVD_biased']
SVDpp_result <- SVDpp_result[, SVD_class := 'SVD++']
NMF_result <- NMF_result[, SVD_class := 'NMF']

merge_SVD_result <- rbind(SVD_noBiased_result, SVD_biased_result, SVDpp_result, NMF_result)
merge_SVD_result <- merge_SVD_result[, -1]  # drop the index column written by to_csv

# Plot the results
colour <- c('#34495e', '#3498db', '#2ecc71', '#f1c40f', '#e74c3c', '#9b59b6', '#1abc9c')
mycol <- define_palette(swatch = colour, gradient = c(lower = colour[1L], upper = colour[2L]))
ggthemr(mycol)

# One line-plus-point panel per metric; the repeated theme settings are
# factored into a small helper.
plot_metric <- function(df, yvar, legend = "top") {
  ggplot(data = df, aes_string(x = "Factors", y = yvar,
                               group = "SVD_class", shape = "SVD_class", color = "SVD_class")) +
    geom_line() + geom_point() +
    theme(legend.position = legend,
          axis.text.x = element_text(angle = 50, hjust = 0.5, vjust = 0.5),
          text = element_text(color = "black", size = 12))
}

p01 <- plot_metric(merge_SVD_result, "MAE")
p02 <- plot_metric(merge_SVD_result, "RMSE")
p1 <- plot_metric(merge_SVD_result, "FIT_TIME")
p2 <- plot_metric(merge_SVD_result, "TEST_TIME")
# SVD++ is far slower, so replot the timings without it for readability.
p3 <- plot_metric(filter(merge_SVD_result, SVD_class != 'SVD++'), "FIT_TIME", legend = "none")
p4 <- plot_metric(filter(merge_SVD_result, SVD_class != 'SVD++'), "TEST_TIME", legend = "none")

x11()
ggplot2.multiplot(p01, p02, cols = 2)
x11()
ggplot2.multiplot(p1, p2, p3, p4, cols = 2)
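For readers who prefer to stay in Python, the same kind of comparison can be sketched with pandas and matplotlib. This is only an illustrative alternative to the R code above, with a toy frame standing in for the concatenated result CSVs:

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view interactively
import matplotlib.pyplot as plt

# Toy stand-in for the concatenation of the four result tables.
df = pd.DataFrame({
    'Factors':   [1, 3, 5, 1, 3, 5],
    'RMSE':      [1.05, 1.01, 1.02, 1.00, 0.98, 0.99],
    'SVD_class': ['SVD_noBiased'] * 3 + ['SVD_biased'] * 3,
})

fig, ax = plt.subplots()
for name, grp in df.groupby('SVD_class'):
    ax.plot(grp['Factors'], grp['RMSE'], marker='o', label=name)  # one curve per algorithm
ax.set_xlabel('Factors')
ax.set_ylabel('RMSE')
ax.legend(loc='upper center')
fig.savefig('rmse_by_factors.png')
```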

(Figure: MAE and RMSE versus the number of factors for each algorithm)

(Figure: fit time and test time versus the number of factors)

That wraps up this worked example of matrix factorization for recommendation. I hope the article is helpful to fellow programmers!



http://www.chinasem.cn/article/1088410
