First, prepare the data and the machine learning prediction models. We use the
Titanic data bundled with the DALEX package.
library(tidyverse)
data(titanic_imputed, package = "DALEX")
head(titanic_imputed)
gender age class embarked fare sibsp parch survived
1 male 42 3rd Southampton 7.11 0 0 0
2 male 13 3rd Southampton 20.05 0 2 0
3 male 16 3rd Southampton 20.05 1 1 0
4 female 39 3rd Southampton 20.05 1 1 1
5 female 16 3rd Southampton 7.13 0 0 1
6 male 25 3rd Southampton 7.13 0 0 1
We build two machine learning models that predict the probability of survival: a Random Forest and a GLM.
titanic_rf <- ranger::ranger(survived ~ .,
                             data = titanic_imputed,
                             classification = TRUE,
                             probability = TRUE)
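# family = binomial fits a logistic regression, so predictions are probabilities;
# without it, glm() defaults to a Gaussian (ordinary linear) model.
titanic_glm <- glm(survived ~ .,
                   data = titanic_imputed,
                   family = binomial)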
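To see what "predicting the probability of survival" looks like in practice, the following is a minimal sketch that prints both models' predictions for the first six passengers. It rests on a few assumptions that are not in the original: the outcome is copied into a factor so that ranger labels its probability columns by class level, the GLM is fitted with `family = binomial`, a seed is fixed for reproducibility, and the names `titanic_fct`, `rf_sketch`, `glm_sketch`, and `comparison` are illustrative only.

```r
# Sketch: compare predicted survival probabilities from the two model types.
data(titanic_imputed, package = "DALEX")
set.seed(2024)  # assumption: a fixed seed so the forest is reproducible

# Assumption: a factor copy of the outcome, so that ranger labels the
# probability columns with the class levels "0" and "1".
titanic_fct <- transform(titanic_imputed, survived = factor(survived))
rf_sketch <- ranger::ranger(survived ~ ., data = titanic_fct,
                            probability = TRUE)

# Assumption: family = binomial, i.e. a logistic regression.
glm_sketch <- glm(survived ~ ., data = titanic_imputed, family = binomial)

# Predictors only (column 8 is the outcome `survived`)
first_rows <- head(titanic_imputed)[, -8]

# Column "1" of the ranger prediction matrix holds P(survived = 1)
rf_prob  <- predict(rf_sketch, data = first_rows)$predictions[, "1"]
glm_prob <- predict(glm_sketch, newdata = first_rows, type = "response")

comparison <- data.frame(observed = head(titanic_imputed)$survived,
                         rf  = round(rf_prob, 3),
                         glm = round(glm_prob, 3))
comparison
```

Both columns are probabilities between 0 and 1, so they can be read side by side against the observed outcome.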
Convert the machine learning model into an explainer object.
library(modelDown)
explainer_rf <- DALEX::explain(titanic_rf,
                               data = titanic_imputed[, -8],
                               y = titanic_imputed[, 8],
                               verbose = FALSE)
explainer_rf
Model label: ranger
Model class: ranger
Data head :
gender age class embarked fare sibsp parch
1 male 42 3rd Southampton 7.11 0 0
2 male 13 3rd Southampton 20.05 0 2
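The same conversion can be applied to the GLM. The sketch below assumes the GLM is a logistic regression (`family = binomial`) and uses the hypothetical names `explainer_glm` and `perf_glm`; the `label` argument is set only so that the two explainers are easy to tell apart. `DALEX::model_performance()` is then used for a quick look at in-sample performance.

```r
# Sketch: wrap the GLM in an explainer and check its performance.
library(DALEX)
data(titanic_imputed, package = "DALEX")

# Assumption: logistic regression, so predictions are probabilities
titanic_glm <- glm(survived ~ ., data = titanic_imputed, family = binomial)

explainer_glm <- DALEX::explain(titanic_glm,
                                data = titanic_imputed[, -8],  # predictors only
                                y = titanic_imputed[, 8],      # observed outcome
                                label = "glm",
                                verbose = FALSE)

# Performance measures computed on the data stored in the explainer
perf_glm <- DALEX::model_performance(explainer_glm)
perf_glm
```

Because the measures are computed on the same data the model was fitted to, they are likely to be optimistic; a held-out set would give a fairer picture.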
Let's take a close look at two passengers, johnny_d and henry.
library(modelStudio)
library(parallelMap)
options(
parallelMap.default.mode = "socket",
parallelMap.default.cpus = 4,
parallelMap.default.show.info = FALSE
)
## Observations to explain ------------------------------------
titanic_list <- read_rds("data/titanic_list.rds")
new_obs <- bind_rows(titanic_list$data$henry, titanic_list$data$johnny_d)
rownames(new_obs) <- c("henry", "johnny")
modelStudio(explainer_rf,
new_observation = new_obs,
parallel = TRUE,
ms_options = modelStudioOptions(margin_left = 125, margin_ytitle = 90),
digits = 3,
facet_dim = c(3,2))
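modelStudio opens an interactive dashboard, so for a static view of a single prediction, a break-down explanation can be computed directly with `DALEX::predict_parts()`. The sketch below is not part of the original workflow: since `data/titanic_list.rds` is not available here, it uses the first row of `titanic_imputed` as a stand-in observation rather than henry or johnny_d, and the names `one_obs` and `bd` are illustrative.

```r
# Sketch: break-down explanation for one observation.
library(DALEX)
data(titanic_imputed, package = "DALEX")
set.seed(2024)  # assumption: a fixed seed so the forest is reproducible

titanic_rf <- ranger::ranger(survived ~ .,
                             data = titanic_imputed,
                             classification = TRUE,
                             probability = TRUE)

explainer_rf <- DALEX::explain(titanic_rf,
                               data = titanic_imputed[, -8],
                               y = titanic_imputed[, 8],
                               verbose = FALSE)

# Stand-in observation: the first passenger in the data (not henry or johnny_d)
one_obs <- titanic_imputed[1, -8]

# Attribute the prediction for this passenger to individual variables
bd <- DALEX::predict_parts(explainer_rf,
                           new_observation = one_obs,
                           type = "break_down")
bd
```

Calling `plot(bd)` draws the contributions as a waterfall-style chart, similar in spirit to the break-down panel in the dashboard.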
Written by data scientist Kwangchun Lee
kwangchun.lee.7@gmail.com