1. Non-personalized recommendation

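Non-personalized recommendation is needed because the user often cannot be identified in the first place, as with a new customer. It is also useful when a personalized algorithm is so complex that it takes too long to compute, yet a reasonably effective recommendation is still required.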

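The old Billboard chart is the classic example: it ranked songs every week based on sales and radio airplay. In the same way, movie rankings and most-popular-product lists (by sales or by star rating) are used to set product recommendation rankings.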

2. Non-personalized movie recommendation

2.1. Movie and rating data

We build a non-personalized recommendation algorithm using the “Non-Personalized and Stereotype-Based Recommenders” data (movies.csv, ratings.csv) from the University of Minnesota recommender systems course on Coursera.

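# 0. Setup ---------------------------------
library(tidyverse)  # provides read_csv(), left_join(), and %>%

# 1. Import data ---------------------------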

movie_df <- read_csv("data/movies.csv")
rating_df <- read_csv("data/ratings.csv")

# 2. Data cleaning -------------------------

nonperson_df <- left_join(rating_df, movie_df, by="movieId")

nonperson_df %>% sample_n(100) %>% 
    DT::datatable()

2.2. Non-personalized recommendation algorithm

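Group the ratings by movie, average the ratings given by the viewers of each movie, and present the 10 movies with the highest average rating as recommendations. Statistically, this amounts to computing a mean per movie and ranking by it.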

# 3. Recommend highly rated movies ---------

nonperson_df %>% group_by(movieId, title) %>% 
    summarise(mean_movie_rating = mean(rating)) %>% 
    arrange(desc(mean_movie_rating)) %>% 
    ungroup() %>% 
    top_n(10, mean_movie_rating)
# A tibble: 10 x 3
   movieId                            title mean_movie_rating
     <int>                            <chr>             <dbl>
 1     318 Shawshank Redemption, The (1994)          4.364362
 2     858            Godfather, The (1972)          4.315848
 3    1248             Touch of Evil (1958)          4.259259
 4    2959                Fight Club (1999)          4.258503
 5    7502          Band of Brothers (2001)          4.247423
 6    1203              12 Angry Men (1957)          4.246032
 7    2859         Stop Making Sense (1984)          4.220000
 8    1221   Godfather: Part II, The (1974)          4.218462
 9     296              Pulp Fiction (1994)          4.217781
10    2571               Matrix, The (1999)          4.195359
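
The group-and-average step can be tried on a tiny example. The following is a minimal sketch in base R on made-up ratings (the `toy_ratings` data and its values are hypothetical, not taken from ratings.csv). It also keeps the number of ratings behind each mean, since a mean built from very few ratings is less reliable.

```r
# Toy ratings (made-up data, not from ratings.csv)
toy_ratings <- data.frame(
    movieId = c(1, 1, 1, 2, 2, 3),
    rating  = c(5, 4, 3, 4, 3, 5)
)

# Mean rating and number of ratings per movie
mean_rating <- tapply(toy_ratings$rating, toy_ratings$movieId, mean)
n_rating    <- tapply(toy_ratings$rating, toy_ratings$movieId, length)

movie_stats <- data.frame(
    movieId     = as.numeric(names(mean_rating)),
    mean_rating = as.vector(mean_rating),
    n_rating    = as.vector(n_rating)
)

# Rank by mean rating and keep the top 2
top2 <- head(movie_stats[order(-movie_stats$mean_rating), ], 2)
top2
```

In this toy data, movie 3 ranks first on the strength of a single rating, which is why it can help to look at `n_rating` next to the mean.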

3. Product recommendation (product association) [1]

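As the next step, let us recommend other movies based on the fact that a viewer has watched one particular movie. This is much like recommending other products after a customer has bought one item. The technique used here relies on support, confidence, and lift. For background on association analysis, see the xwMOOC market basket data analysis.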
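
To make the three measures concrete, here is a minimal base R sketch on a made-up viewer-movie matrix. The `watched` matrix and the `rule_measures()` helper are hypothetical names introduced only for illustration. For a rule "watched lhs, so recommend rhs", support is the share of viewers who watched both, confidence is that share divided by the share who watched lhs, and lift is confidence divided by the share who watched rhs.

```r
# Toy viewer-movie matrix (made-up): rows are viewers, columns are movies,
# TRUE means the viewer watched the movie
watched <- matrix(
    c(TRUE,  TRUE,  FALSE,
      TRUE,  TRUE,  TRUE,
      TRUE,  FALSE, FALSE,
      FALSE, TRUE,  TRUE,
      TRUE,  TRUE,  FALSE),
    ncol = 3, byrow = TRUE,
    dimnames = list(NULL, c("A", "B", "C"))
)

# Hypothetical helper: measures for the rule {lhs} => {rhs}
rule_measures <- function(m, lhs, rhs) {
    supp_lhs   <- mean(m[, lhs])              # share who watched lhs
    supp_rhs   <- mean(m[, rhs])              # share who watched rhs
    supp_both  <- mean(m[, lhs] & m[, rhs])   # share who watched both
    confidence <- supp_both / supp_lhs
    c(support = supp_both,
      confidence = confidence,
      lift = confidence / supp_rhs)
}

res <- rule_measures(watched, "A", "B")
res
```

A lift above 1 means that viewers of the first movie watch the second more often than viewers in general, while a lift below 1 means less often.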

3.1. Movie recommendation with association analysis

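First, use the arules package to build the viewer-movie matrix (user-item matrix). A cell is encoded as 1 if the viewer watched the movie and 0 otherwise. The matrix is extremely sparse, so it has to be stored as a sparse matrix object; otherwise it can fill main memory and crash the computer. The reason is simple: there are far too many movies, each about two hours long, for any viewer to watch them all, so almost every cell is 0. Stored densely, the viewer-movie matrix would become very large.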
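
The memory argument can be illustrated with a rough base R sketch. It uses the 862 x 2500 dimensions reported for user_item_matrix in this section, but the 2% fill rate is an assumption made up for the example, and the coordinate list is only a stand-in for the idea of sparse storage, not the format arules actually uses internally.

```r
n_users  <- 862    # rows reported for user_item_matrix in this section
n_movies <- 2500   # columns reported for user_item_matrix in this section
fill     <- 0.02   # hypothetical share of watched cells (an assumption)

set.seed(1)
dense <- matrix(0L, nrow = n_users, ncol = n_movies)
watched_idx <- sample(length(dense), size = round(fill * length(dense)))
dense[watched_idx] <- 1L

# Coordinate form: keep only the (row, column) positions of the 1s
coords <- which(dense == 1L, arr.ind = TRUE)

dense_bytes  <- as.numeric(object.size(dense))
coords_bytes <- as.numeric(object.size(coords))

format(object.size(dense),  units = "auto")
format(object.size(coords), units = "auto")
```

Storing only the positions of the 1s takes a small fraction of the dense size, and the gap widens as the matrix gets sparser.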

# 4. Lift -----------------------------------
library(arules)  # provides the "transactions" class and apriori()
(user_item_matrix <- as(split(nonperson_df$movieId, nonperson_df$userId), "transactions"))
transactions in sparse format with
 862 transactions (rows) and
 2500 items (columns)
format(object.size(user_item_matrix), units = "auto")
[1] "1.2 Mb"
rule_param = list(
    supp = 0.001,
    conf = 0.7,
    maxlen = 2
)

movie_arules <- apriori(user_item_matrix, parameter = rule_param)
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.7    0.1    1 none FALSE            TRUE       5   0.001      1
 maxlen target   ext
      2  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 0 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[2500 item(s), 862 transaction(s)] done [0.02s].
sorting and recoding items ... [2500 item(s)] done [0.01s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.11s].
writing ... [249409 rule(s)] done [0.06s].
creating S4 object  ... done [0.04s].
movie_arules <- as(movie_arules,"data.frame")

3.2. Movie recommendation

Next, with the association measures (lift in particular) computed from the viewer-movie matrix (user-item matrix), let us recommend what to watch next, ranked by lift, to viewers who have seen The Matrix.

## 4.1. Recommended movies ------------------

# Rule labels look like "{2571} => {318}"; with maxlen = 2 each side holds
# at most one movie id. Rules with an empty lhs ("{} => {318}") get lhs_movie = NA.
recom_df <- movie_arules %>% 
    mutate(lhs_movie = as.numeric(str_match(rules, "^\\{(\\d*)\\}")[, 2]), 
           rhs_movie = as.numeric(str_match(rules, "\\{(\\d+)\\}$")[, 2]))

recom_df <- movie_df %>% select(movieId, title) %>% 
    right_join(recom_df, by=c("movieId" = "rhs_movie")) %>% 
    rename(recom_title=title, recom_movieId = movieId) 

recom_df <- movie_df %>% select(movieId, title) %>% 
    right_join(recom_df, by=c("movieId" = "lhs_movie")) %>% 
    rename(source_title=title, source_movieId = movieId) %>% 
    select(rules, lift, support, confidence, source_movieId, source_title, recom_movieId, recom_title)

recom_df %>% filter(source_movieId == 2571) %>% 
    arrange(desc(lift)) %>% DT::datatable() %>% 
    DT::formatRound(c("lift", "support", "confidence"), digits=3)

4. Joke recommendation [2]

4.1. Non-personalized recommendation with the recommenderlab package

With the recommenderlab package, jokes can be recommended right away. The Jester5k data contains ratings from 5,000 users on 100 jokes.

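recommenderlab is organized around recommenderRegistry, so we first look up the registered entry for the method. After that, passing method = "POPULAR" to Recommender() produces recommendations. Picking three recommended jokes for the first user, u2841, goes as follows.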

# 0. Setup ---------------------------------

library(recommenderlab) # devtools::install_github("mhahsler/recommenderlab")
library(tidyverse)
library(stringr)

# 1. Import data ---------------------------
data(Jester5k)

# 2. Recommend with the package ------------
recommenderRegistry$get_entry("POPULAR", type ="realRatingMatrix")
Recommender method: POPULAR for realRatingMatrix
Description: Recommender based on item popularity.
Reference: NA
Parameters:
  normalize    aggregationRatings aggregationPopularity
1  "center" new("standardGeneric" new("standardGeneric"
joke_recom <- Recommender(Jester5k, method = "POPULAR")

joke_pred <- predict(joke_recom, Jester5k[1:3,])
(joke_pred_list <- as(joke_pred, "list"))
$u2841
 [1] "j89" "j72" "j76" "j88" "j83" "j87" "j81" "j78" "j73" "j80"

$u15547
 [1] "j89" "j93" "j76" "j88" "j91" "j83" "j87" "j81" "j97" "j78"

$u15221
character(0)
cat(JesterJokes[joke_pred_list$u2841[1:3]], sep = "\n\n")
A radio conversation of a US naval ship with Canadian authorities ... Americans: Please divert your course 15 degrees to the North to avoid a collision. Canadians: Recommend you divert YOUR course 15 degrees to the South to avoid a collision. Americans: This is the Captain of a US Navy ship. I say again, divert YOUR course. Canadians: No. I say again, you divert YOUR course. Americans: This is the aircraft carrier USS LINCOLN, the second largest ship in the United States' Atlantic Fleet. We are accompanied by three destroyers, three cruisers and numerous support vessels. I demand that you change your course 15 degrees north, that's ONE FIVE DEGREES NORTH, or counter-measures will be undertaken to ensure the safety of this ship. Canadians: This is a lighthouse. Your call.

On the first day of college, the Dean addressed the students, pointing out some of the rules: "The female dormitory will be out-of-bounds for all male students and the male dormitory to the female students. Anybody caught breaking this rule will be finded $20 the first time." He continued, "Anybody caught breaking this rule the second time will be fined $60. Being caught a third time will cost you a fine of $180. Are there any questions ?" At this point, a male student in the crowd inquired: "How much for a season pass ?"

There once was a man and a woman that both got in a terrible car wreck. Both of their vehicles were completely destroyed, buy fortunately, no one was hurt. In thankfulness, the woman said to the man, 'We are both okay, so we should celebrate. I have a bottle of wine in my car, let's open it.' So the woman got the bottleout of the car, and handed it to the man. The man took a really big drink, and handed the woman the bottle. The woman closed the bottle and put it down. The man asked, 'Aren't you going to take a drink?' The woman cleverly replied, 'No, I think I'll just wait for the cops to get here.'

4.2. Applying three popularity criteria

We recommend jokes under three criteria: first by joke rating, second by the number of user ratings, and third by a combination of the two.

4.2.1. Recommending popular jokes by rating

Normalize the Jester5k ratings. Because jokes are stored as columns, take the column means to get each joke's average rating, sort in descending order, and take the top three.

# 3. Recommend by popularity ---------------
## 3.1. Jokes with a high average rating
joke_avg_top3 <- Jester5k %>% 
    normalize %>% 
    colMeans %>% 
    sort(decreasing = TRUE) %>% 
    head(3)

cat(JesterJokes[names(joke_avg_top3)], sep = "\n\n")
A guy goes into confession and says to the priest, "Father, I'm 80 years old, widower, with 11 grandchildren. Last night I met two beautiful flight attendants. They took me home and I made love to both of them. Twice." The priest said: "Well, my son, when was the last time you were in confession?" "Never Father, I'm Jewish." "So then, why are you telling me?" "I'm telling everybody."

A radio conversation of a US naval ship with Canadian authorities ... Americans: Please divert your course 15 degrees to the North to avoid a collision. Canadians: Recommend you divert YOUR course 15 degrees to the South to avoid a collision. Americans: This is the Captain of a US Navy ship. I say again, divert YOUR course. Canadians: No. I say again, you divert YOUR course. Americans: This is the aircraft carrier USS LINCOLN, the second largest ship in the United States' Atlantic Fleet. We are accompanied by three destroyers, three cruisers and numerous support vessels. I demand that you change your course 15 degrees north, that's ONE FIVE DEGREES NORTH, or counter-measures will be undertaken to ensure the safety of this ship. Canadians: This is a lighthouse. Your call.

A guy walks into a bar, orders a beer and says to the bartender, "Hey, I got this great Polish Joke..." The barkeep glares at him and says in a warning tone of voice: "Before you go telling that joke you better know that I'm Polish, both bouncers are Polish and so are most of my customers" "Okay" says the customer,"I'll tell it very slowly."
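
What the normalize and colMeans steps do can be sketched in base R on a made-up rating matrix. This assumes that normalization means centering each user's ratings on that user's own mean, which matches the normalize = "center" entry shown in the registry output earlier; the `toy` matrix and its values are hypothetical.

```r
# Toy rating matrix (made-up): rows are users, columns are jokes, NA = not rated
toy <- matrix(
    c( 8,  6, NA,
      -2, NA, -4,
       4,  0,  2),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("u1", "u2", "u3"), c("j1", "j2", "j3"))
)

# Center each user's ratings on that user's own mean rating
centered <- toy - rowMeans(toy, na.rm = TRUE)

# Average the centered ratings per joke and rank the jokes
joke_means <- colMeans(centered, na.rm = TRUE)
top <- sort(joke_means, decreasing = TRUE)
top

# For comparison: ranking by raw (uncentered) column means
raw_top <- sort(colMeans(toy, na.rm = TRUE), decreasing = TRUE)
```

In this toy data the raw means rank j2 above j3, while the centered means rank j3 above j2, because centering removes the effect of users who rate everything high or everything low.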

4.2.2. Recommending the most-rated jokes

Instead of ratings, sort the jokes by how many times each was rated and recommend the top three.

## 3.2. Jokes with the most ratings

joke_freq_top3 <- Jester5k %>% 
    normalize %>% 
    colCounts %>% 
    sort(decreasing = TRUE) %>% 
    head(3)

cat(JesterJokes[names(joke_freq_top3)], sep = "\n\n")
Q. Did you hear about the dyslexic devil worshiper? A. He sold his soul to Santa.

They asked the Japanese visitor if they have elections in his country. "Every Morning" he answers.

Q: What did the blind person say when given some matzah? A: Who the hell wrote this?

4.2.3. Recommending by combining rating count and rating

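The two can also be combined. The code binarizes the normalized ratings with minRating = 5, counts per joke how many ratings meet that threshold, and takes the top three.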

## 3.3. Jokes that are rated highly and often

joke_avg_freq_top3 <- Jester5k %>% 
    normalize %>% 
    binarize(minRating = 5) %>% 
    colCounts() %>% 
    sort(decreasing = TRUE) %>% 
    head(3)

cat(JesterJokes[names(joke_avg_freq_top3)], sep = "\n\n")
A guy goes into confession and says to the priest, "Father, I'm 80 years old, widower, with 11 grandchildren. Last night I met two beautiful flight attendants. They took me home and I made love to both of them. Twice." The priest said: "Well, my son, when was the last time you were in confession?" "Never Father, I'm Jewish." "So then, why are you telling me?" "I'm telling everybody."

An old Scotsmen is sitting with a younger Scottish gentleman and says the boy. "Ah, lad look out that window. You see that stone wall there, I built it with me own bare hands, placed every stone meself. But do they call me MacGregor the wall builder? No! He Takes a few sips of his beer then says, "Aye, and look out on that lake and eye that beautiful pier. I built it meself, laid every board and hammered each nail but do they call me MacGregor the pier builder? No! He continues..."And lad, you see that road? That too I build with me own bare hands. Laid every inch of pavement meself, but do they call MacGregor the road builder? No!" Again he returns to his beer for a few sips, then says, "Agh, but you screw one sheep..."

A man arrives at the gates of heaven. St. Peter asks, "Religion?" The man says, "Methodist." St. Peter looks down his list, and says, "Go to room 24, but be very quiet as you pass room 8." Another man arrives at the gates of heaven. "Religion?" "Baptist." "Go to room 18, but be very quiet as you pass room 8." A third man arrives at the gates. "Religion?" "Jewish." "Go to room 11, but be very quiet as you pass room 8." The man says, "I can understand there being different rooms for different religions, but why must I be quiet when I pass room 8?" St. Peter tells him, "Well the Catholics are in room 8, and they think they're the only ones here.
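
The two counting criteria can likewise be sketched in base R on a made-up rating matrix. The `toy` matrix and the threshold of 1 are hypothetical values chosen for the small example (the code above uses minRating = 5 on the Jester scale), and the sketch assumes that binarizing marks ratings at or above the threshold.

```r
# Toy rating matrix (made-up): rows are users, columns are jokes, NA = not rated
toy <- matrix(
    c( 8,  6, NA,
      -2, NA, -4,
       4,  0,  2),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("u1", "u2", "u3"), c("j1", "j2", "j3"))
)

# Criterion 2: how many users rated each joke
n_rated <- colSums(!is.na(toy))

# Criterion 3: center per user, then count ratings at or above a threshold
threshold <- 1   # hypothetical threshold for this toy data
centered  <- toy - rowMeans(toy, na.rm = TRUE)
n_liked   <- colSums(centered >= threshold, na.rm = TRUE)

sort(n_rated, decreasing = TRUE)
sort(n_liked, decreasing = TRUE)
```

Counting only the ratings that clear a threshold rewards jokes that are both widely rated and well liked, which is the sense in which the third criterion combines the first two.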

  1. Movie Recommendation with Market Basket Analysis

  2. Recommender Systems: Non-personalized Recommender