1 Machine Learning Automation

2 Ludwig

Details on Ludwig's approach to deep learning are available on the ludwig.ai website.

  • Ludwig is a declarative deep learning framework

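In other words, Ludwig is a declarative deep learning framework, so anyone familiar with the grammar of graphics (ggplot) should find it easy to pick up.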

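On Rotten Tomatoes, a critic who liked a film gives it a fresh tomato and one who did not gives it a rotten tomato, so a higher score means that more critics recommend the film. In Korea the score is widely known as the “sseokto index” (썩토지수).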

Ludwig can handle both structured and unstructured data. Here, let's use the Rotten Tomatoes dataset to build a new deep learning model. The following code is taken from Ludwig's Getting Started guide.

# !pip install ludwig --user

import pandas as pd
from ludwig.api import LudwigModel

df = pd.read_csv('ludwig/rotten_tomatoes.csv')

model = LudwigModel(config='ludwig/rotten_tomatoes.yaml')
results = model.train(dataset=df)
# Lock 1420789236640 acquired on C:\swc\.lock_preprocessing
# Lock 1420789236640 released on C:\swc\.lock_preprocessing
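
The YAML file passed as `config` above is not reproduced in this section. As a rough illustration of what "declarative" means here, the sketch below writes a comparable configuration as a Python dictionary. The feature names `review_content` and `recommended` are taken from columns that appear in this section's outputs; the choice of a single text input and the `text`/`binary` types are assumptions, and the real `rotten_tomatoes.yaml` may well declare more features. `feature_names` is a hypothetical helper, not part of Ludwig.

```python
import json

# Hypothetical sketch of a declarative config: it states WHAT the inputs
# and outputs are, not HOW the network is built. This is an assumption,
# not the contents of ludwig/rotten_tomatoes.yaml.
config = {
    "input_features": [
        # Free-text review, assumed to be the only input in this sketch
        {"name": "review_content", "type": "text"},
    ],
    "output_features": [
        # Whether the critic recommends the film (True/False)
        {"name": "recommended", "type": "binary"},
    ],
}


def feature_names(cfg, section):
    """Return the feature names declared in one section of the config."""
    return [feature["name"] for feature in cfg[section]]


print(json.dumps(config, indent=2))
print("inputs :", feature_names(config, "input_features"))
print("outputs:", feature_names(config, "output_features"))
```

Ludwig's Python API is understood to accept a dictionary like this in place of a file path, but that should be checked against the installed version before relying on it.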

Training the deep learning model took a while. Once the trained model has been saved, it can be loaded again and used for inference on new data.

import pandas as pd
from ludwig.api import LudwigModel
movie_model = LudwigModel.load('results/api_experiment_run/model')
predictions, _ = movie_model.predict(dataset='ludwig/rotten_tomatoes_test.csv')
predictions.head()
    recommended_probabilities  ...  recommended_probability
0     [0.10894948, 0.8910505]  ...                 0.891051
1    [0.20457983, 0.79542017]  ...                 0.795420
2  [0.0067676306, 0.99323237]  ...                 0.993232
3   [0.122318566, 0.87768143]  ...                 0.877681
4    [0.44897103, 0.55102897]  ...                 0.551029

[5 rows x 5 columns]

Let's take a closer look at the predictions.

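library(reticulate)
library(tidyverse)

# Test reviews; bind_cols() below matches rows by position, not by key
rt_csv <- read_csv('ludwig/rotten_tomatoes_test.csv')

# Pull the pandas predictions from the Python session and attach the review columns
py$predictions %>% 
  janitor::clean_names() %>% 
  bind_cols(rt_csv) %>% 
  select(movie_title,
         review_content,
         recommended_probabilities,
         recommended_predictions,
         recommended_probability) %>% 
  knitr::kable()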
| movie_title | review_content | recommended_probabilities | recommended_predictions | recommended_probability |
|:--|:--|:--|:--|--:|
| It | … “It” is terrifically entertaining. | 0.1089495, 0.8910505 | TRUE | 0.8910505 |
| Talk to Her | There is much to admire in Almodvar’s technical proficiency, but his quirky movies make little emotional impact. | 0.2045798, 0.7954202 | TRUE | 0.7954202 |
| Suspiria | As Blanc, Swinton glides as if her feet never touch the ground… Her Josef is, by contrast, the film’s most moving element. | 0.006767631, 0.993232369 | TRUE | 0.9932324 |
| The Road To Guantanamo | The material is beautifully put together, and it is powerful. | 0.1223186, 0.8776814 | TRUE | 0.8776814 |
| Den of Thieves | At 140 minutes the film itself overstays its welcome, it is not half as clever as it clearly thinks it is and women are strictly optional extras. | 0.448971, 0.551029 | TRUE | 0.5510290 |
| The Broken Circle Breakdown | Belgium’s Oscar entry is a shattering tale about the death of a six-year-old and its effects on family. Terrific bluegrass music. | 0.03033006, 0.96966994 | TRUE | 0.9696699 |
| 3:10 to Yuma | Unapologetically harsh and heedlessly entertaining despite its imperfections, the film includes two masterful performances from Crowe and Bale - actors so intense, they could wear red and intimidate a charging bull. | 0.01022094, 0.98977906 | TRUE | 0.9897791 |
| Operation Finale | The sparring between Kingsley and Isaac is remarkable and the film’s structural flaws never blunt the the touching impact of its themes. | 0.0173105, 0.9826895 | TRUE | 0.9826895 |
| Friends With Money | The cast is terrific, the movie isn’t. | 0.4993716, 0.5006284 | TRUE | 0.5006284 |
| Final Destination 5 | A long and eventually tedious series of deaths, all in slightly sickening 3-D. Splattered eyeballs, snapped spines, heart kebabs - one numbingly after another, in diamond-hard focus and ruby-red color. | 0.97655839, 0.02344162 | FALSE | 0.9765584 |
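
The ten rows above suggest how the three prediction columns relate: `recommended_probabilities` looks like the pair [P(not recommended), P(recommended)], `recommended_predictions` is TRUE when the second value is the larger one, and `recommended_probability` is the probability of whichever class was predicted. The sketch below checks that reading against the values copied from the table. The 0.5 threshold and the helper `summarize` are assumptions inferred from these rows, not something confirmed from Ludwig's documentation.

```python
# (movie_title, [P(False), P(True)], prediction, probability), copied from the table above
rows = [
    ("It", [0.1089495, 0.8910505], True, 0.8910505),
    ("Talk to Her", [0.2045798, 0.7954202], True, 0.7954202),
    ("Suspiria", [0.006767631, 0.993232369], True, 0.9932324),
    ("The Road To Guantanamo", [0.1223186, 0.8776814], True, 0.8776814),
    ("Den of Thieves", [0.448971, 0.551029], True, 0.5510290),
    ("The Broken Circle Breakdown", [0.03033006, 0.96966994], True, 0.9696699),
    ("3:10 to Yuma", [0.01022094, 0.98977906], True, 0.9897791),
    ("Operation Finale", [0.0173105, 0.9826895], True, 0.9826895),
    ("Friends With Money", [0.4993716, 0.5006284], True, 0.5006284),
    ("Final Destination 5", [0.97655839, 0.02344162], False, 0.9765584),
]


def summarize(probabilities, threshold=0.5):
    """Assumed rule: predict True when P(True) >= threshold and report
    the probability of the predicted class."""
    p_false, p_true = probabilities
    prediction = p_true >= threshold
    confidence = p_true if prediction else p_false
    return prediction, confidence


# Rows where the assumed rule disagrees with the table (up to rounding)
mismatches = []
for title, probs, table_pred, table_prob in rows:
    pred, conf = summarize(probs)
    if pred != table_pred or abs(conf - table_prob) > 1e-6:
        mismatches.append(title)

# Predictions the model is barely sure about (hypothetical 0.6 cutoff)
uncertain = [title for title, probs, _, _ in rows if max(probs) < 0.6]

print("rows inconsistent with the assumed rule:", mismatches)
print("low-confidence predictions:", uncertain)
```

Under this reading, a high `recommended_probability` does not by itself mean "recommended": for Final Destination 5 it reflects confidence in the FALSE prediction. The low-confidence list is also worth a look, since both reviews on it read as negative yet were predicted TRUE.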