This article walks through a worked example of association rules for recommendation.
Techniques used
Equal-depth (equal-frequency) binning
The Apriori algorithm
Data joins, aggregation, and related processing
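As a rough illustration of the first two techniques, the sketch below bins listening counts into equal-depth levels and then computes the support and confidence measures that Apriori thresholds on. The toy values, the `plays` data frame, and the `support()` and `confidence()` helpers are all hypothetical and use base R only. This is not Apriori itself, which also prunes candidate itemsets level by level; the `arules` package provides `apriori()` for that.

```r
# Toy listening data (hypothetical values, not taken from the last.fm files).
plays <- data.frame(
  user   = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  artist = c("A", "B", "C", "A", "B", "A", "B", "C", "C"),
  count  = c(10, 200, 35, 80, 500, 5, 150, 60, 20),
  stringsAsFactors = FALSE
)

# Equal-depth binning: cut at quantiles so that each bin holds
# roughly the same number of rows.
breaks <- quantile(plays$count, probs = seq(0, 1, length.out = 4))
plays$level <- cut(plays$count, breaks = breaks,
                   labels = c("low", "mid", "high"), include.lowest = TRUE)
print(table(plays$level))

# One "basket" per user: the set of artists that user listened to.
baskets <- split(plays$artist, plays$user)

# Support: share of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs).
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

print(support(c("A", "B"), baskets))   # 3 of the 4 users listened to both A and B
print(confidence("A", "B", baskets))   # every user who listened to A also listened to B
```

Cutting at quantiles rather than at fixed widths keeps the bins balanced even when listening counts are heavily skewed, which is the usual motivation for equal-depth binning.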
Data description
The data comes from last.fm.
The dataset contains:
1892 users
17632 artists
12717 bi-directional user friend relations, i.e. 25434 (user_i, user_j) pairs
avg. 13.443 friend relations per user
92834 user-listened artist relations, i.e. tuples [user, artist, listeningCount]
avg. 49.067 artists most listened by each user
avg. 5.265 users who listened each artist
11946 tags
186479 tag assignments (tas), i.e. tuples [user, tag, artist]
avg. 98.562 tas per user
avg. 14.891 tas per artist
avg. 18.930 distinct tags used by each user
avg. 8.764 distinct tags used for each artist
Dataset files
artists.dat
This file contains information about music artists listened and tagged by the users.
tags.dat
This file contains the set of tags available in the dataset.
user_artists.dat
This file contains the artists listened by each user.
It also provides a listening count for each [user, artist] pair.
user_taggedartists.dat - user_taggedartists-timestamps.dat
These files contain the tag assignments of artists provided by each particular user.
They also contain the timestamps when the tag assignments were done.
user_friends.dat
This file contains the friend relations between users in the database.
Data processing and algorithm
Convert the files above to CSV before reading them; some of the data is messy enough that read.table() may fail to read it.
library(data.table)
library(sqldf)
library(dplyr)
library(arules)
library(Matrix)
library(xml2)
library(rvest)
library(arulesViz)
library(caret)
getwd()
setwd('C:\\R\\working\\music\\data')
# read data
artists <- fread('artists.csv')
tags <- fread('tags.csv')
user_artists <- fread('user_artists.csv')
user_friends <- fread('user_friends.csv')
user_taggedartists <- fread('user_taggedartists.csv')
user_taggedartists_timestamps <- fread('user_taggedartists-timestamps.csv')
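The conversion from .dat to CSV is not shown in the article. A minimal sketch of one way to do it in base R follows, assuming the .dat files are tab-separated with a header row; `dat_to_csv()` is a hypothetical helper, and the demo file is made up for illustration. Disabling quote and comment handling is what lets names containing apostrophes or `#` through, which is a likely reason a plain read.table() call struggles.

```r
# Hypothetical helper: read a tab-separated .dat file and write it as CSV.
# Assumes a header row and tab delimiters.
dat_to_csv <- function(dat_path, csv_path) {
  df <- read.delim(
    dat_path,
    header = TRUE,
    sep = "\t",
    quote = "",          # treat ' and " as ordinary characters
    comment.char = "",   # treat # as an ordinary character
    stringsAsFactors = FALSE
  )
  write.csv(df, csv_path, row.names = FALSE)
  invisible(csv_path)
}

# Demo on a small made-up file so the sketch runs on its own.
demo_dat <- tempfile(fileext = ".dat")
writeLines(c("id\tname", "1\tGuns N' Roses", "2\tAC/DC #1"), demo_dat)
demo_csv <- tempfile(fileext = ".csv")
dat_to_csv(demo_dat, demo_csv)
print(read.csv(demo_csv, stringsAsFactors = FALSE))

# For the real files the call would look like:
# dat_to_csv("artists.dat", "artists.csv")
```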