1 Data and Models

First we prepare the data and the machine learning prediction models. The data is the Titanic dataset (titanic_imputed) bundled with the DALEX package.

library(tidyverse)

data(titanic_imputed, package = "DALEX")

head(titanic_imputed)
  gender age class    embarked  fare sibsp parch survived
1   male  42   3rd Southampton  7.11     0     0        0
2   male  13   3rd Southampton 20.05     0     2        0
3   male  16   3rd Southampton 20.05     1     1        0
4 female  39   3rd Southampton 20.05     1     1        1
5 female  16   3rd Southampton  7.13     0     0        1
6   male  25   3rd Southampton  7.13     0     0        1

We build two kinds of machine learning models to predict the probability of survival: a Random Forest and a GLM.

titanic_rf <- ranger::ranger(survived ~ ., 
                           data = titanic_imputed, 
                           classification = TRUE, 
                           probability = TRUE)

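# family = "binomial" fits a logistic regression, so predictions are probabilities
titanic_glm <- glm(survived ~ ., 
                   data = titanic_imputed,
                   family = "binomial")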
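
Because survived is coded 0/1, the GLM family matters when the goal is a survival probability. glm() defaults to family = gaussian, which fits an ordinary linear model whose fitted values are not constrained to lie between 0 and 1. The following is a minimal sketch comparing the two fits on the same data; the object names fit_gaussian, fit_binomial, p_gaussian and p_binomial are illustrative and not part of the original tutorial.

```r
data(titanic_imputed, package = "DALEX")

# Default family (gaussian): a linear model on the 0/1 outcome
fit_gaussian <- glm(survived ~ ., data = titanic_imputed)

# Logistic regression: fitted values are probabilities
fit_binomial <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

# In-sample fitted values on the response scale
p_gaussian <- predict(fit_gaussian, type = "response")
p_binomial <- predict(fit_binomial, type = "response")

# Compare the ranges; only the binomial fit is guaranteed to stay within [0, 1]
range(p_gaussian)
range(p_binomial)
```

If the gaussian range extends below 0 or above 1, those values cannot be read as probabilities, which is the usual reason to prefer the binomial family here.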

2 Explainer

Convert the machine learning model into an explainer object.

library(modelDown)

explainer_rf <- DALEX::explain(titanic_rf,
                               data = titanic_imputed[, -8], 
                               y = titanic_imputed[, 8], 
                               verbose = FALSE)

explainer_rf
Model label:  ranger 
Model class:  ranger 
Data head  :
  gender age class    embarked  fare sibsp parch
1   male  42   3rd Southampton  7.11     0     0
2   male  13   3rd Southampton 20.05     0     2
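
The GLM can be wrapped in the same way, which gives both models a uniform prediction interface. The sketch below is an assumption about how that might look: explainer_glm and pred_glm are names introduced here for illustration, and the model is refit with family = "binomial" so that the block is self-contained.

```r
library(DALEX)

data(titanic_imputed, package = "DALEX")

# Logistic regression on all predictors (family = "binomial" is assumed here)
titanic_glm <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

# Wrap the model; column 8 (survived) is the target and is excluded from data
explainer_glm <- DALEX::explain(titanic_glm,
                                data = titanic_imputed[, -8],
                                y = titanic_imputed[, 8],
                                label = "glm",
                                verbose = FALSE)

# predict() on an explainer calls the explainer's predict function
pred_glm <- predict(explainer_glm, titanic_imputed[1:6, -8])
pred_glm

# Summary performance measures for the wrapped model
DALEX::model_performance(explainer_glm)
```

An explainer built this way can be passed to the same downstream tools as explainer_rf, which makes side-by-side comparison of the two models straightforward.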

3 Interactive XAI Explorer

We take a close look at two passengers, johnny_d and henry.

library(modelStudio)
library(parallelMap)

options(
    parallelMap.default.mode        = "socket",
    parallelMap.default.cpus        = 4,
    parallelMap.default.show.info   = FALSE
)

## Observations to explain ---------------------------
titanic_list  <-  
  read_rds("data/titanic_list.rds")

new_obs <- bind_rows(titanic_list$data$henry, titanic_list$data$johnny_d)
rownames(new_obs) <- c("henry", "johnny")

modelStudio(explainer_rf, 
            new_observation = new_obs,
            parallel = TRUE,
            ms_options = modelStudioOptions(margin_left = 125, margin_ytitle = 90),
            digits = 3,
            facet_dim = c(3,2))
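
The file data/titanic_list.rds is not included in this text, so the following sketch shows one way the two observations might be built by hand instead. The passenger values are hypothetical and may differ from the contents of the .rds file, and make_passenger is a helper invented for this example. The main point is that the columns and factor levels need to match the data given to the explainer.

```r
data(titanic_imputed, package = "DALEX")

# Hypothetical helper: build one passenger row whose columns and factor
# levels match titanic_imputed (without the survived column)
make_passenger <- function(gender, age, class, embarked, fare, sibsp, parch) {
  data.frame(
    gender   = factor(gender,   levels = levels(titanic_imputed$gender)),
    age      = age,
    class    = factor(class,    levels = levels(titanic_imputed$class)),
    embarked = factor(embarked, levels = levels(titanic_imputed$embarked)),
    fare     = fare,
    sibsp    = sibsp,
    parch    = parch
  )
}

# Illustrative values only (assumed, not read from titanic_list.rds)
henry    <- make_passenger("male", 47, "1st", "Cherbourg",   25, 0, 0)
johnny_d <- make_passenger("male",  8, "1st", "Southampton", 72, 0, 0)

new_obs <- rbind(henry, johnny_d)
rownames(new_obs) <- c("henry", "johnny")

# Column names should line up with the explainer's data
stopifnot(identical(names(new_obs), names(titanic_imputed)[-8]))
str(new_obs)
```

A data frame built this way can be supplied as new_observation in the modelStudio() call in place of the rows read from the .rds file.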
 

Written by data scientist Kwangchun Lee

kwangchun.lee.7@gmail.com