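1 `iris` prediction model

We first build a prediction model that classifies iris varieties. The workflow relies on the following libraries:

- pandas
- sklearn
  - train_test_split
  - RandomForestClassifier
  - metrics

Running it yields the classification accuracy of the iris model, that is, a measure of the model's performance.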

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

## Load the data
iris_df = pd.read_csv("data/iris_ws.csv")

## Preprocess the data
iris_df.dropna(inplace=True)  # drop rows with missing values
X = iris_df.drop(columns=['variety'])
y = iris_df['variety']

## Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                          test_size=0.3, random_state=42)

## Fit the model
model = RandomForestClassifier()
# Fit on the training split only so that the test split stays unseen
model.fit(X_train, y_train)

## Model performance

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)

y_pred = model.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))

1.0
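
Accuracy alone does not show which varieties get confused with one another. As a minimal sketch, the same workflow can be run end to end and summarised with a confusion matrix. Note the assumptions: scikit-learn's bundled `load_iris` data set stands in for `data/iris_ws.csv`, which is not reproduced here, and `n_estimators=100` and `random_state=42` on the classifier are set only to make the run repeatable.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris data stands in for data/iris_ws.csv
X, y = load_iris(return_X_y=True)

# Same split settings as in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit on the training split only
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test split
y_pred = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(y_test, y_pred)

print(f"accuracy: {accuracy:.3f}")
print(cm)
```

Off-diagonal entries in the matrix count test rows assigned to the wrong variety, so a perfect score corresponds to a purely diagonal matrix.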

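2 `iris` prediction model deployment

The Random Forest model for classifying iris varieties scored an accuracy of 1.0 (100%) in the run above. Let us deploy it. Part of the development code has to change first: the script loads the data, preprocesses it (for example by dropping rows with missing values), fits the Random Forest on all rows, and writes the fitted model to rf_model.pkl in the data/ directory.

# Save as iris_rf.py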

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import pickle

## Load the data
iris_df = pd.read_csv("data/iris_ws.csv")

## Preprocess the data
iris_df.dropna(inplace=True)  # drop rows with missing values
X = iris_df.drop(columns=['variety'])
y = iris_df['variety']

## Fit the model on all rows
model = RandomForestClassifier()
model.fit(X, y)

## Deploy the model

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)

iris_rf_model = 'data/rf_model.pkl'
with open(iris_rf_model, 'wb') as f:
    pickle.dump(model, f)

print("successfully deployed!!!")

successfully deployed!!!

Confirm with the ls -alh command that rf_model.pkl now exists in the data/ directory.

ls -alh data/rf_model.pkl

-rw-r--r--  1 statkclee  staff    21K Oct 11 13:47 data/rf_model.pkl
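
A deployed pickle file is only useful once something loads it again. The following is a minimal sketch of that consumer side, under stated assumptions: the bundled `load_iris` data replaces `data/iris_ws.csv`, the model is written to a temporary directory rather than `data/`, and `n_estimators=100` and `random_state=42` are chosen only for repeatability.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumption: the bundled iris data stands in for data/iris_ws.csv
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "rf_model.pkl")

    # Producer side: serialise the fitted model
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Consumer side: load the model back from disk
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

# The loaded model should predict exactly like the original one
original_pred = model.predict(X[:3])
loaded_pred = loaded_model.predict(X[:3])
print(loaded_pred)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code. A pickled model is also not guaranteed to load under a different scikit-learn version from the one that saved it.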

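3 Automating `iris` prediction model deployment

Save the code above as iris_rf.py in the script directory, then run it with the python command. Run it from the project root so that the relative data/ paths resolve.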

python script/iris_rf.py

/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)
successfully deployed!!!
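
When the script runs unattended, for example from a scheduler, the caller usually needs to know whether it succeeded. One way to do that, sketched here, is to launch the script with `subprocess.run` and inspect its exit code. The helper name `run_script` and the stand-in script are hypothetical and only keep the example self-contained; in practice the path would be `script/iris_rf.py`.

```python
import os
import subprocess
import sys
import tempfile


def run_script(script_path):
    """Run a Python script with the current interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
    )


# Hypothetical stand-in for script/iris_rf.py so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "stand_in.py")
    with open(script_path, "w") as f:
        f.write("print('model deployed')\n")
    result = run_script(script_path)

# Exit code 0 means the script finished without an uncaught error
if result.returncode == 0:
    print("OK:", result.stdout.strip())
else:
    print("FAILED:", result.stderr.strip())
```

Using `sys.executable` rather than a bare `python` makes the child process use the same interpreter, and therefore the same installed packages, as the caller.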

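4 Python workflow

Automating data science work in Python, in particular the deployment of the prediction model built above, calls for the following workflow:

- Install Python
- Install the pip package manager, which installs packages from PyPI
- Develop the data analysis and the prediction model in a Jupyter notebook
- Automate the work with a .py Python script

Figure: Python workflow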

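4.1 Installing Python

Python is usually installed either through the Anaconda distribution or through the official installer from python.org.

The which python command shows which Python interpreter is found first on the PATH.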

which python

/anaconda3/bin/python
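
The same check can be made from inside Python, which helps when a notebook or script may be running under a different interpreter from the one the shell finds. A small sketch using only the standard library:

```python
import shutil
import sys

# First "python" found on PATH, like `which python`; None if there is none
path_python = shutil.which("python")
# Interpreter that is actually executing this code
running_python = sys.executable

print("python on PATH:", path_python)
print("running Python:", running_python)
```

If the two paths differ, packages installed for one interpreter may not be visible to the other.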

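4.2 pip, the PyPI package manager

Much of Python's real power comes from its large and varied collection of packages. pip is the package manager that makes it easy to install them from PyPI (the Python Package Index).

pip can be installed by following "Do I need to install pip?" in the pip documentation.

- Ubuntu: apt-get install python3-pip
- Windows: download the pip installer with curl and run it with python.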

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py

4.3 Matching the Python and `pip` versions

Python version

python --version

Python 3.6.5 :: Anaconda, Inc.

pip version

pip --version

pip 19.1.1 from /anaconda3/lib/python3.6/site-packages/pip (python 3.6)
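
The point of the comparison is that pip installs packages for the Python it belongs to, so the `(python 3.6)` suffix above should agree with `python --version`. As a sketch, the same information can be read from inside the running interpreter. It assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library. The helper name `pip_version` is made up for this example.

```python
import sys
from importlib import metadata


def pip_version():
    """Return the version of pip installed for this interpreter, or None."""
    try:
        return metadata.version("pip")
    except metadata.PackageNotFoundError:
        # pip is not installed for the interpreter running this code
        return None


# Major.minor.patch of the interpreter running this code
python_version = ".".join(str(part) for part in sys.version_info[:3])

print("Python:", python_version)
print("pip   :", pip_version())
```

On the command line, `python -m pip --version` gives the same guarantee, because it runs the pip that belongs to that particular `python`.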

4.4 Running a Python script

echo 'print("Data science is the future!!!")' > hello_ds.py

python hello_ds.py

Data science is the future!!!

RPA - Automation

CLI: automated deployment of a prediction model - `iris`

xwMOOC

2019-10-11
