1 Unemployment Rate ¹

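On 2018-06-15, Kim Dong-yeon, Deputy Prime Minister and Minister of Economy and Finance, expressed concern about employment conditions that had hit their worst level in eight years and said the government would do everything it could to prepare countermeasures. To gauge how serious the situation is, we use unemployment rate data to forecast the unemployment rate for the next 12 months (one year).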

2 Unemployment Data

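The unemployment rate for South Korea's economically active population aged 15 to 64 is published on the FRED website and can be retrieved with the tq_get() function from the tidyquant package. The series is identified by the ID LRUN64TTKRM156N.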

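A good starting point is to confirm that the data downloaded correctly and to plot the time series. The plot shows the expected seasonal pattern, but the troughs have moved higher since 2014 (the annual low was 2.79% in November 2013, versus about 3.2% in every year from 2014 to 2017), which is not an encouraging sign.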

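# 0. Setup -----
library(tidyverse)   # dplyr (rename), tidyr (gather), ggplot2
library(tidyquant)   # tq_get(), palette_light(), theme_tq(), geom_ma()
library(timetk)      # tk_ts()
library(sweep)       # sw_augment(), sw_tidy_decomp(), sw_sweep()
library(extrafont)   # loadfonts() for the NanumGothic Korean font
library(forecast)    # ets(), forecast()
loadfonts()

# https://www.rstudio.com/resources/videos/the-future-of-time-series-and-financial-analysis-in-the-tidyverse/

# 1. Data -----
# LRUN64TTKRM156N: Korea, unemployment rate, aged 15-64 (FRED, monthly)
# to = Sys.Date() means re-running picks up newer months; the outputs shown here end in April 2018.
unemp_tbl <- tq_get("LRUN64TTKRM156N",
                    get  = "economic.data",
                    from = "2000-01-01",
                    to   = Sys.Date())
unemp_tbl <- unemp_tbl %>%
  rename(실업률 = price,   # unemployment rate (%)
         연월   = date)    # year-month


# 2. Visualization -----
unemp_tbl %>%
  ggplot(aes(x = 연월, y = 실업률)) +
  geom_line(size = 1, color = palette_light()[[1]]) +
  geom_smooth(method = "loess") +    # smoothed trend line
  labs(title = "대한민국 실업률: 월별", x = "", y = "실업률(%)") +
  scale_y_continuous(labels = scales::comma) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  theme_tq(base_family = "NanumGothic")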
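Besides eyeballing the plot, a few mechanical checks help confirm the download: the date range, missing values, skipped months, and duplicated months. The sketch below is a hypothetical helper written in base R for illustration (it is not part of tidyquant); it assumes a data frame with one date column holding first-of-month dates and one numeric value column, and the demo rows borrow rounded 2000 values from the output further down.

```r
# Hypothetical helper (not part of tidyquant): basic checks on a monthly series.
check_monthly_series <- function(df, date_col, value_col) {
  dates  <- as.Date(df[[date_col]])
  values <- df[[value_col]]
  # Every first-of-month date we expect between the first and last observation
  expected <- seq(min(dates), max(dates), by = "month")
  list(
    n_obs      = length(dates),
    first      = min(dates),
    last       = max(dates),
    n_missing  = sum(is.na(values)),             # NA values in the series
    gaps       = expected[!expected %in% dates], # months absent from the data
    duplicated = any(duplicated(dates))          # repeated months
  )
}

# Synthetic example: six months with April 2000 deliberately removed
demo <- data.frame(
  date  = seq(as.Date("2000-01-01"), by = "month", length.out = 6)[-4],
  value = c(5.93, 5.87, 5.29, 4.29, 4.06)
)
res <- check_monthly_series(demo, "date", "value")
res$gaps  # "2000-04-01"
```

On the downloaded data the call would be check_monthly_series(unemp_tbl, "연월", "실업률"); an empty gaps vector together with n_missing equal to 0 indicates an unbroken monthly series.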

3 Unemployment Forecasting Model ²

The forecasting functions in the forecast package require a ts object, so the tibble must first be converted. The tk_ts() function from the timetk package does this. Once the data are in ts form, we fit an ets() model. The sweep package is essentially a time-series counterpart of broom: sw_augment() makes residual analysis straightforward, and sw_tidy() and sw_glance() correspond to broom's tidy() and glance().

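# 3. Forecasting model -----
## 3.1. Convert to a ts object
# Monthly data starting January 2000; silent = TRUE suppresses the message about dropping non-numeric columns (the date)
unemp_ts <- timetk::tk_ts(unemp_tbl, start = 2000, frequency = 12, silent = TRUE)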
unemp_ts

          Jan      Feb      Mar      Apr      May      Jun      Jul
2000 5.931013 5.870873 5.285474 4.659365 4.286675 4.064119 4.143182
2001 5.228812 5.655583 5.308022 4.279165 3.855096 3.750855 3.799008
2002 4.298329 4.208625 3.938031 3.532353 3.288700 3.079808 3.188587
2003 3.888788 4.013166 3.888183 3.614051 3.550899 3.602840 3.720217
2004 4.120422 4.298510 4.166479 3.826836 3.658401 3.536044 3.817018
2005 4.433993 4.462263 4.254328 3.987100 3.714879 3.819937 3.869162
2006 3.896845 4.257424 4.096948 3.712477 3.393047 3.534766 3.592938
2007 3.797953 3.819418 3.672575 3.548992 3.356952 3.354129 3.431332
2008 3.405011 3.587192 3.510929 3.380321 3.220355 3.241237 3.267937
2009 3.727944 4.037341 4.170509 4.016996 3.949855 4.034069 3.921965
2010 4.731809 4.694627 4.207071 3.907348 3.309721 3.616699 3.811405
2011 3.843910 4.231043 4.294645 3.834807 3.328612 3.407567 3.421976
2012 3.444992 3.896400 3.749047 3.607297 3.254760 3.314622 3.199410
2013 3.342452 3.820784 3.577041 3.339403 3.196707 3.226809 3.318135
2014 3.359976 4.216456 3.961673 4.030856 3.718682 3.720711 3.559111
2015 3.653598 4.295198 4.084479 4.016711 3.892499 3.997520 3.792783
2016 3.665888 4.596217 4.426358 4.121050 3.808961 3.814451 3.687397
2017 3.624455 4.488178 4.231256 4.376251 3.702203 3.920807 3.571665
2018 3.592324 4.019503 4.580604 4.327964                           
          Aug      Sep      Oct      Nov      Dec
2000 4.259364 4.222589 3.904973 4.038946 4.577212
2001 3.821917 3.469259 3.495138 3.544870 3.886532
2002 3.349940 2.857104 2.976445 3.002390 3.263901
2003 3.694343 3.498295 3.599404 3.707368 3.929895
2004 3.812491 3.496922 3.561485 3.623615 4.000286
2005 3.728950 3.813742 3.806724 3.430746 3.683419
2006 3.581857 3.387993 3.488892 3.374088 3.486479
2007 3.336142 3.156635 3.154239 3.155945 3.207984
2008 3.291402 3.125095 3.175379 3.216834 3.430122
2009 3.851386 3.503559 3.401078 3.496823 3.582616
2010 3.496932 3.603265 3.491325 3.103630 3.569262
2011 3.153797 3.133324 3.027134 2.993428 3.112772
2012 3.092466 3.053996 2.909133 2.904776 3.021876
2013 3.158990 2.909240 2.890180 2.790343 3.084019
2014 3.475822 3.315199 3.330528 3.205064 3.465598
2015 3.564995 3.320362 3.188543 3.176398 3.329814
2016 3.732952 3.714092 3.486449 3.232846 3.289091
2017 3.740204 3.501175 3.335971 3.261661 3.420590
2018
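A ts object stores only the start time and the frequency, not a date for every row, which is why the conversion above needs start and frequency. The base-R sketch below uses stats::ts() directly (rather than timetk) purely to show that structure, with the first six values printed above.

```r
# Base-R illustration of a monthly ts starting January 2000
# (equivalent to start = 2000 with frequency = 12).
x <- c(5.931013, 5.870873, 5.285474, 4.659365, 4.286675, 4.064119)
unemp_ts_demo <- ts(x, start = c(2000, 1), frequency = 12)

# time() gives fractional years: Jan 2000 = 2000, Feb 2000 = 2000 + 1/12, ...
time(unemp_ts_demo)
# cycle() gives the position within the year (1 = Jan, ..., 12 = Dec)
cycle(unemp_ts_demo)
# window() subsets by calendar time rather than by row number
w <- window(unemp_ts_demo, start = c(2000, 3), end = c(2000, 5))
w
```

Because the calendar is implied by start and frequency, a skipped month in the source data would silently shift every later value, which is one reason to check for gaps before converting.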

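## 3.2. Fit the time-series model
# ets() chooses the error/trend/seasonal form automatically (by AICc by default)
fit_ets <- unemp_ts %>%
  ets()
# Alternative model, not used here:
# forecast::auto.arima(stepwise=TRUE, approximation=FALSE, max.order=2)

# sw_tidy(fit_ets)    # smoothing parameters and initial states
# sw_glance(fit_ets)  # model-level fit statistics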

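There are countless time-series forecasting models; here we use ets() (exponential smoothing) as the main model. sw_augment() then stores the actual values, in-sample fitted values, and residuals in a tibble so they can be used in later analysis.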

augment_fit_ets <- sw_augment(fit_ets)
augment_fit_ets

# A tibble: 220 x 4
   index         .actual .fitted  .resid
   <S3: yearmon>   <dbl>   <dbl>   <dbl>
 1 1 2000           5.93    5.66  0.0481
 2 2 2000           5.87    6.17 -0.0480
 3 3 2000           5.29    5.70 -0.0731
 4 4 2000           4.66    4.87 -0.0435
 5 5 2000           4.29    4.37 -0.0182
 6 6 2000           4.06    4.16 -0.0219
 7 7 2000           4.14    4.08  0.0164
 8 8 2000           4.26    4.17  0.0202
 9 9 2000           4.22    3.99  0.0573
10 10 2000          3.90    4.11 -0.0503
# ... with 210 more rows
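Note that .resid is much smaller than the gap between .actual and .fitted: in the first row 5.93 - 5.66 = 0.27, yet .resid is 0.0481. That pattern is consistent with the selected ETS model having multiplicative errors, in which case the stored residuals are relative errors, (actual - fitted) / fitted. This is an inference from the printed numbers; printing fit_ets shows the selected model form. A quick check using the first two printed rows:

```r
# First two rows of the sw_augment() output above (values as printed, rounded)
actual        <- c(5.93, 5.87)
fitted        <- c(5.66, 6.17)
resid_printed <- c(0.0481, -0.0480)

# Additive-style residual: difference in percentage points
additive <- actual - fitted
# Multiplicative-style residual: error relative to the fitted value
relative <- (actual - fitted) / fitted

round(additive, 3)
round(relative, 4)
```

The relative version lands within rounding error of the printed .resid column, while the additive one is several times larger. If the model is indeed multiplicative, the residual plot below is on a relative scale (0.05 means 5% of the fitted value), not in percentage points.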

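## 3.3. Residual plot
# If fit_ets has multiplicative errors, .resid is a relative error rather than percentage points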
augment_fit_ets %>%
  ggplot(aes(x = index, y = .resid)) +
  geom_hline(yintercept = 0, color = "grey40") +
  geom_point(color = palette_light()[[1]], alpha = 0.5) +
  geom_smooth(method = "loess") +
  scale_x_yearmon(n = 10) +
  labs(title = "대한민국 실업률: ETS 잔차", x = "", y = "실업률(%)") +
  theme_tq(base_family = "NanumGothic")

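sw_tidy_decomp() decomposes the fitted ETS model into its components (here observed, level, and season), so that each component's signal can be plotted separately.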

## 3.4. Decomposing the unemployment series

decomp_fit_ets <- sw_tidy_decomp(fit_ets)
decomp_fit_ets

# A tibble: 221 x 4
   index         observed level season
   <S3: yearmon>    <dbl> <dbl>  <dbl>
 1 12 1999          NA     4.92  0.977
 2 1 2000            5.93  5.10  1.16 
 3 2 2000            5.87  4.92  1.20 
 4 3 2000            5.29  4.64  1.14 
 5 4 2000            4.66  4.49  1.04 
 6 5 2000            4.29  4.43  0.969
 7 6 2000            4.06  4.36  0.934
 8 7 2000            4.14  4.41  0.939
 9 8 2000            4.26  4.48  0.950
10 9 2000            4.22  4.67  0.902
# ... with 211 more rows
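The season column hovers around 1 (about 1.16 to 1.20 in January and February 2000, about 0.90 in September), which points to multiplicative seasonality: the seasonal component scales the level up or down by a proportion rather than adding a fixed amount. As a rough check on the printed rows, level × season comes out close to the observed value. This is only approximate, because ETS updates each state after seeing the observation, so the product is not an exact identity.

```r
# First nine data rows of the sw_tidy_decomp() output above (Jan-Sep 2000)
observed <- c(5.93, 5.87, 5.29, 4.66, 4.29, 4.06, 4.14, 4.26, 4.22)
level    <- c(5.10, 4.92, 4.64, 4.49, 4.43, 4.36, 4.41, 4.48, 4.67)
season   <- c(1.16, 1.20, 1.14, 1.04, 0.969, 0.934, 0.939, 0.950, 0.902)

# Multiplicative seasonality: the season factor scales the level
reconstructed <- level * season
round(cbind(observed, reconstructed, diff = observed - reconstructed), 3)
```

With additive seasonality, by contrast, the season values would be centred on 0 and would be added to the level.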

decomp_fit_ets %>%
  gather(key = key, value = value, -index) %>%
  mutate(key = forcats::as_factor(key)) %>%
  ggplot(aes(x = index, y = value, group = key)) +
  geom_line(color = palette_light()[[2]]) +
  geom_ma(ma_fun = SMA, n = 12, size = 1) +
  facet_wrap(~ key, scales = "free_y", nrow = 3) +
  scale_x_yearmon(n = 10) +
  labs(title = "대한민국 실업률: ETS 분해", x = "", y = "") +
  theme_tq(base_family = "NanumGothic") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
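The thicker line in each panel comes from geom_ma(ma_fun = SMA, n = 12), a trailing 12-month simple moving average that smooths out the seasonal swing. The base-R sketch below computes the same kind of trailing average with stats::filter(), for illustration only; it is not the code geom_ma() runs internally.

```r
# Trailing 12-month simple moving average in base R
sma12 <- function(x) {
  # sides = 1: average the current value and the 11 preceding values;
  # the first 11 positions are NA because a full 12-month window is not available yet
  as.numeric(stats::filter(x, rep(1 / 12, 12), sides = 1))
}

sma12(1:24)[c(11, 12, 24)]  # NA, 6.5 (mean of 1:12), 18.5 (mean of 13:24)
```

Averaging over exactly one seasonal cycle is what removes the within-year pattern, which is why n = 12 suits monthly data.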

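Calling forecast() with h = 12 produces 12 months of forecasts from the ETS model; because the data end in April 2018, the forecast covers May 2018 through April 2019. Plotting the result suggests that, absent special measures, unemployment will keep climbing: the February 2019 point forecast is 4.78%, above the 4.02% observed in February 2018, and its upper 95% bound reaches 5.91%.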

## 3.5. Unemployment forecast: 12 months
fcast_ets <- fit_ets %>%
  forecast(h = 12)

sw_sweep(fcast_ets, fitted = TRUE, timetk_idx = TRUE) %>% tail(12)

# A tibble: 12 x 7
   index      key      실업률 lo.80 lo.95 hi.80 hi.95
   <date>     <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
 1 2018-05-01 forecast   3.82  3.58  3.46  4.06  4.19
 2 2018-06-01 forecast   3.91  3.60  3.44  4.21  4.37
 3 2018-07-01 forecast   3.74  3.39  3.21  4.08  4.26
 4 2018-08-01 forecast   3.74  3.35  3.15  4.12  4.32
 5 2018-09-01 forecast   3.54  3.14  2.92  3.94  4.15
 6 2018-10-01 forecast   3.43  3.01  2.79  3.85  4.08
 7 2018-11-01 forecast   3.37  2.93  2.69  3.82  4.05
 8 2018-12-01 forecast   3.61  3.10  2.83  4.11  4.38
 9 2019-01-01 forecast   4.01  3.42  3.10  4.60  4.92
10 2019-02-01 forecast   4.78  4.03  3.64  5.52  5.91
11 2019-03-01 forecast   4.68  3.91  3.51  5.44  5.84
12 2019-04-01 forecast   4.32  3.59  3.20  5.06  5.44
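To see how uncertainty grows with the horizon, the width of the 95% interval can be computed from the table. The sketch copies the printed numbers into a plain data.frame by hand, for illustration; in practice you would work with the sw_sweep() output directly.

```r
# Point forecasts and 95% bounds copied from the table above (h = 1..12)
fc <- data.frame(
  month = seq(as.Date("2018-05-01"), by = "month", length.out = 12),
  point = c(3.82, 3.91, 3.74, 3.74, 3.54, 3.43, 3.37, 3.61, 4.01, 4.78, 4.68, 4.32),
  lo95  = c(3.46, 3.44, 3.21, 3.15, 2.92, 2.79, 2.69, 2.83, 3.10, 3.64, 3.51, 3.20),
  hi95  = c(4.19, 4.37, 4.26, 4.32, 4.15, 4.08, 4.05, 4.38, 4.92, 5.91, 5.84, 5.44)
)
fc$h        <- seq_len(nrow(fc))
fc$width95  <- fc$hi95 - fc$lo95   # absolute width, percentage points
fc$relwidth <- fc$width95 / fc$point  # width relative to the point forecast
fc[, c("month", "h", "point", "width95", "relwidth")]
```

Relative to the point forecast, the 95% width grows at every step, from about 19% of the forecast at h = 1 to about 52% at h = 12. The absolute width dips slightly between h = 11 and h = 12, which is what one would expect if the interval scales with the forecast level, as it does in a multiplicative model.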

sw_sweep(fcast_ets, timetk_idx = TRUE) %>%
  ggplot(aes(x = index, y = 실업률, color = key)) +
  geom_ribbon(aes(ymin = lo.95, ymax = hi.95), 
              fill = "#D5DBFF", color = NA, size = 0) +
  geom_ribbon(aes(ymin = lo.80, ymax = hi.80),   # fixed fill below, so no fill mapping needed
              fill = "#596DD5", color = NA, size = 0, alpha = 0.8) +
  geom_line(size = 1) +
  labs(title = "대한민국 실업률: ETS 모형 예측", x = "", y = "", color="구분") +
  theme_tq(base_family = "NanumGothic") +
  scale_y_continuous(labels = scales::comma) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  scale_color_tq() +
  scale_fill_tq()

Data Science - Finance

Unemployment Forecast - tidyquant
