Recommendation Algorithms: An Association Rules Example

Techniques used

Equal-depth binning
Apriori algorithm
Data processing such as joins and aggregation
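
Equal-depth (equal-frequency) binning splits a numeric variable so that each bin holds roughly the same number of observations. The article does not show where it applies binning; a natural candidate, assumed here, is the per-pair listening count. The base-R sketch below takes the breakpoints from sample quantiles; the function name equal_depth_bin and the counts vector are hypothetical.

```r
# Equal-depth binning: breakpoints are sample quantiles, so each bin
# receives roughly the same number of observations.
equal_depth_bin <- function(x, n_bins = 4) {
  # quantile breakpoints at 0, 1/n, 2/n, ..., 1
  breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)
  # heavy ties can produce duplicate breakpoints, which cut() rejects
  breaks <- unique(breaks)
  # labels = FALSE returns the integer bin index for each value
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# hypothetical listening counts for eight user-artist pairs
counts <- c(5, 12, 40, 80, 150, 300, 900, 2500)
bins <- equal_depth_bin(counts, n_bins = 4)
print(bins)  # 1 1 2 2 3 3 4 4
```

With eight values and four bins, each bin receives two values even though the counts are heavily skewed. On data with many identical counts the duplicate quantiles are dropped, so the result may have fewer bins than requested.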

Data description

The data comes from Last.fm.
The dataset contains:

1892 users
17632 artists

12717 bi-directional user friend relations, i.e. 25434 (user_i, user_j) pairs
avg. 13.443 friend relations per user

92834 user-listened artist relations, i.e. tuples [user, artist, listeningCount]
avg. 49.067 most-listened artists per user
avg. 5.265 users who listened to each artist

11946 tags

186479 tag assignments (tas), i.e. tuples [user, tag, artist]
avg. 98.562 tas per user
avg. 14.891 tas per artist
avg. 18.930 distinct tags used by each user
avg. 8.764 distinct tags used for each artist
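
Several of these averages follow directly from the totals: a relation count divided by the number of users or artists. The quick R check below reproduces four of them; note that the friend average uses the 25434 directed (user_i, user_j) pairs, not the 12717 undirected relations. The per-artist tag figures are left out because they do not equal the raw totals divided by 17632.

```r
# Totals taken from the dataset description
n_users        <- 1892
n_artists      <- 17632
n_listen_pairs <- 92834   # user-listened artist relations
n_friend_pairs <- 25434   # (user_i, user_j) pairs, both directions
n_tas          <- 186479  # tag assignments [user, tag, artist]

avgs <- round(c(
  artists_per_user = n_listen_pairs / n_users,
  users_per_artist = n_listen_pairs / n_artists,
  friends_per_user = n_friend_pairs / n_users,
  tas_per_user     = n_tas / n_users
), 3)
print(avgs)  # 49.067 5.265 13.443 98.562, matching the listed averages
```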

Files

artists.dat

This file contains information about music artists listened to and tagged by the users.

tags.dat

This file contains the set of tags available in the dataset.

user_artists.dat

This file contains the artists listened to by each user.

It also provides a listening count for each [user, artist] pair.

user_taggedartists.dat - user_taggedartists-timestamps.dat

These files contain the tag assignments of artists provided by each particular user.

They also contain the timestamps when the tag assignments were done.

user_friends.dat

This file contains the friend relations between users in the database.
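
user_artists only stores IDs, so making it readable means joining it to artists and then aggregating. Below is a minimal dplyr sketch on made-up miniature tables; the column names id, name, userID, artistID and weight are assumed from the Last.fm file headers, and the _demo suffix keeps the toy tables from overwriting the real ones.

```r
library(dplyr)

# Hypothetical miniature versions of artists and user_artists
artists_demo <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
user_artists_demo <- data.frame(
  userID   = c(10, 10, 11, 11, 12),
  artistID = c(1, 2, 1, 3, 1),
  weight   = c(100, 50, 30, 70, 20)  # listening counts
)

# Join artist names onto the listening records, then aggregate per artist
artist_stats <- user_artists_demo %>%
  left_join(artists_demo, by = c("artistID" = "id")) %>%
  group_by(artistID, name) %>%
  summarise(listeners = n(), total_plays = sum(weight), .groups = "drop") %>%
  arrange(desc(listeners))
print(artist_stats)
```

Artist A is listened to by all three users, so it comes first with 3 listeners and 150 total plays.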

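Data processing and algorithm

Convert the data files above to CSV before reading them in; some of the data is messy enough that read.table() may fail to read it.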
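
One way to do the conversion, sketched with data.table, assuming the .dat files are tab-separated with a header row and that the messiness is mainly stray quote characters in free-text fields such as artist names. Disabling quote handling with quote = "" lets fread read such lines literally. The helper name dat_to_csv is hypothetical.

```r
library(data.table)

# Hypothetical helper: read a tab-separated .dat file with quote handling
# disabled, then write it back out as CSV (fwrite quotes fields as needed).
dat_to_csv <- function(dat_path, csv_path = sub("\\.dat$", ".csv", dat_path)) {
  dt <- fread(dat_path, sep = "\t", quote = "")
  fwrite(dt, csv_path)
  invisible(csv_path)
}
```

For example, dat_to_csv("artists.dat") writes artists.csv next to the source file, and looping over list.files(pattern = "\\.dat$") converts every file in the folder.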

# load packages
library(data.table)
library(sqldf)
library(dplyr)
library(arules)
library(Matrix)
library(xml2)
library(rvest)
library(arulesViz)
library(caret)
# check the current working directory, then switch to the data folder
getwd()
setwd('C:\\R\\working\\music\\data')
# read data
artists <- fread('artists.csv')
tags <- fread('tags.csv')
user_artists <- fread('user_artists.csv')
user_friends <- fread('user_friends.csv')
user_taggedartists <- fread('user_taggedartists.csv')
user_taggedartists_timestamps <- fread('user_taggedartists-timestamps.csv')
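
The code above stops after loading the data. The Apriori step with arules can be sketched as follows: each user becomes one transaction containing the artists they listened to, and apriori() mines rules of the form {artist X} => {artist Y}. The toy records and the thresholds are illustrative only, and the column names userID and artistID are assumed.

```r
library(arules)

# Toy listening records (hypothetical); with the real data this would be
# split(as.character(user_artists$artistID), user_artists$userID)
ua <- data.frame(
  userID   = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  artistID = c("A", "B", "C", "A", "B", "A", "B", "D", "C", "D")
)

# One transaction per user: the set of artists that user listened to
baskets <- split(as.character(ua$artistID), ua$userID)
trans <- as(baskets, "transactions")

# Mine rules with at least one item on each side
rules <- apriori(trans,
                 parameter = list(supp = 0.5, conf = 0.8, minlen = 2, target = "rules"),
                 control = list(verbose = FALSE))
inspect(sort(rules, by = "lift"))
```

Users 1 to 3 all listened to both A and B, so {A} => {B} and {B} => {A} reach confidence 1 with support 0.75. On the real data much lower support thresholds would be needed, since an average artist is listened to by only about 5 of the 1892 users.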