This is the second exercise, continuing from MLPClassifier Practice 1.


Practice_20200702

2. Classify the wine quality grades in the winequality-red data.

In [1]:
import pandas as pd
import numpy as np
redwine = pd.read_csv('winequality-red.csv', delimiter=';')
In [2]:
redwine.head(5)
Out[2]:
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
In [3]:
redwine.shape
Out[3]:
(1599, 12)
  • Split the dataset into independent variables and the dependent variable. The dependent variable is the quality column.
In [5]:
redwine_X = redwine.iloc[:, :-1] # all rows, every column except the last one (the dependent variable)
redwine_y = redwine.iloc[:, -1] # all rows of the last column, quality (the dependent variable)
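Positional slicing with iloc works here because quality happens to be the last column. As a minimal sketch on a tiny stand-in frame (three rows and two feature columns copied from the head() output, not the full dataset), selecting by column name gives the same split without relying on column order:

```python
import pandas as pd

# Small stand-in frame: features first, 'quality' last (values from head()).
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.8],
    "pH": [3.51, 3.20, 3.26],
    "quality": [5, 5, 5],
})

# Positional split, the same approach as in the cell above.
X_by_position = df.iloc[:, :-1]
y_by_position = df.iloc[:, -1]

# Name-based split: still correct if the column order ever changes.
X_by_name = df.drop(columns="quality")
y_by_name = df["quality"]

print(X_by_name.equals(X_by_position), y_by_name.equals(y_by_position))
```

Both variants return the same objects for this layout; the name-based form is simply more explicit about which column is the target.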
  • Split the data into training and test sets at a 7:3 ratio
In [6]:
from sklearn.model_selection import train_test_split
In [7]:
train_X, test_X, train_y, test_y = train_test_split(redwine_X, redwine_y, test_size = 0.3, random_state=92)
In [8]:
print(train_X.shape, test_X.shape)
(1119, 11) (480, 11)
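The 1119/480 split follows from the row count: 30% of 1599 is 479.7, and the test set gets 480 rows, leaving 1119 for training. The sketch below reproduces only the shapes, using placeholder arrays (X_dummy and y_dummy are illustrative names, not the wine data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows, n_features = 1599, 11              # shape of redwine_X in this exercise
X_dummy = np.zeros((n_rows, n_features))   # placeholder values, not the wine data
y_dummy = np.zeros(n_rows, dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_dummy, y_dummy, test_size=0.3, random_state=92
)
print(X_tr.shape, X_te.shape)
```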
  • Build a neural-network classification model with the MLPClassifier class from the scikit-learn package
In [9]:
from sklearn.neural_network import MLPClassifier
In [12]:
# hidden_layer_sizes sets the number of hidden layers and the number of units (perceptrons) in each layer.
# (50, 50, 30) means three hidden layers with 50, 50, and 30 units respectively.
mlp = MLPClassifier(hidden_layer_sizes = (50, 50, 30))
In [13]:
mlp.fit(train_X, train_y)
Out[13]:
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(50, 50, 30), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)
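To see how hidden_layer_sizes translates into the fitted network, one can inspect the weight matrices in coefs_. The following is a rough sketch on random data, assuming 11 input features and six class labels (3 through 8); demo, X_demo, and y_demo are illustrative names, and max_iter=5 is chosen only to keep the run short:

```python
import warnings
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 11))   # 11 features, like redwine_X
y_demo = np.arange(60) % 6 + 3       # six labels (3..8), each present

demo = MLPClassifier(hidden_layer_sizes=(50, 50, 30), max_iter=5, random_state=0)
with warnings.catch_warnings():
    # 5 iterations will not converge; that is fine for a shape check.
    warnings.simplefilter("ignore")
    demo.fit(X_demo, y_demo)

# One weight matrix per connection between consecutive layers.
layer_shapes = [w.shape for w in demo.coefs_]
print(layer_shapes)
print(demo.n_layers_)                # input + 3 hidden + output
```

Each matrix has shape (units in the previous layer, units in the next layer), so the three hidden layers sit between an 11-unit input layer and an output layer with one unit per class.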
In [14]:
print("Training score: %s"% mlp.score(train_X, train_y))
Training score: 0.6210902591599643
  • Predict using the input data of the test set
In [15]:
pred = mlp.predict(test_X)
In [16]:
pred
Out[16]:
array([5, 5, 6, 5, 5, 6, 6, 6, 5, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 6, 6, 6,
       5, 5, 5, 5, 6, 6, 6, 5, 6, 6, 5, 6, 6, 5, 5, 6, 6, 5, 5, 6, 6, 6,
       5, 6, 5, 7, 5, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5,
       5, 6, 5, 5, 5, 6, 6, 5, 5, 7, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5,
       5, 5, 5, 6, 6, 7, 5, 5, 5, 6, 5, 5, 6, 6, 5, 6, 5, 5, 6, 5, 6, 6,
       6, 6, 5, 5, 5, 5, 5, 6, 5, 7, 6, 6, 5, 5, 5, 5, 6, 5, 5, 6, 5, 7,
       6, 5, 6, 5, 6, 6, 6, 6, 5, 6, 6, 7, 5, 5, 7, 6, 5, 5, 5, 5, 6, 6,
       5, 6, 6, 5, 5, 5, 5, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5,
       6, 5, 5, 6, 6, 5, 5, 5, 5, 6, 5, 6, 7, 5, 5, 5, 5, 5, 6, 5, 5, 5,
       6, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 6, 5, 5, 5, 6,
       5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5,
       6, 5, 7, 6, 5, 6, 5, 6, 5, 5, 5, 6, 6, 5, 5, 5, 5, 6, 5, 6, 6, 6,
       5, 5, 6, 5, 6, 6, 5, 6, 5, 5, 5, 6, 5, 6, 6, 5, 5, 5, 6, 6, 6, 6,
       5, 6, 6, 5, 5, 6, 6, 7, 7, 6, 5, 5, 5, 5, 5, 6, 6, 5, 5, 6, 6, 6,
       5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 5, 7, 7, 5, 5, 5, 5, 5, 6, 7, 7, 5,
       5, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 5, 6, 5, 7, 5, 6, 5, 5, 6, 6,
       5, 5, 6, 5, 5, 5, 6, 5, 6, 5, 5, 6, 6, 7, 6, 5, 5, 6, 5, 6, 6, 6,
       5, 5, 5, 5, 6, 5, 6, 5, 7, 5, 5, 5, 6, 5, 5, 6, 5, 6, 6, 5, 5, 6,
       5, 6, 6, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 6, 5, 6, 6, 5, 5, 6, 5, 6,
       5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 6, 6, 5, 6, 6, 5, 5, 5, 5, 5, 5, 5,
       6, 7, 5, 6, 6, 6, 5, 6, 5, 5, 6, 5, 5, 5, 5, 6, 6, 5, 5, 6, 6, 6,
       5, 6, 5, 6, 6, 6, 6, 6, 7, 6, 7, 5, 6, 5, 6, 6, 5, 5])
  • Build a cross-tabulation of actual versus predicted values with the crosstab() function
In [17]:
pd.crosstab(test_y, pred, rownames=['True'], colnames=['Predicted'], margins=True)
Out[17]:
Predicted    5    6   7  All
True
3            2    0   0    2
4           11    3   0   14
5          171   45   2  218
6           86   97   7  190
7            7   31  14   52
8            1    3   0    4
All        278  179  23  480
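The crosstab can also be read row by row. As a minimal sketch, with the counts typed in by hand from the table above, the share of each true grade that was predicted correctly (recall) can be computed like this; grades the model never predicted (3, 4, 8) get 0:

```python
import pandas as pd

# Counts copied from the crosstab (rows: true grade, columns: predicted grade).
table = pd.DataFrame(
    {5: [2, 11, 171, 86, 7, 1],
     6: [0, 3, 45, 97, 31, 3],
     7: [0, 0, 2, 7, 14, 0]},
    index=[3, 4, 5, 6, 7, 8],
)

row_totals = table.sum(axis=1)

# Recall per true grade = correct predictions for that grade / samples of that grade.
recall = pd.Series({
    g: (table.at[g, g] / row_totals[g] if g in table.columns else 0.0)
    for g in table.index
})
print(recall.round(3))
```

In this run the model only ever predicts grades 5, 6, and 7, which is why the rarer grades are never classified correctly.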
  • Compute the accuracy: the share of test samples whose predicted grade matches the true grade (the matching-label cells of the confusion matrix)
In [18]:
from sklearn.metrics import accuracy_score
In [19]:
accuracy_score(test_y, pred)
Out[19]:
0.5875
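The same value can be re-derived by hand from the crosstab in Out[17]. The sketch below copies the counts from that table and divides the correctly predicted samples (cells where the true and predicted grades match) by the total:

```python
# Counts copied from Out[17]: {true grade: {predicted grade: count}}.
counts = {
    3: {5: 2, 6: 0, 7: 0},
    4: {5: 11, 6: 3, 7: 0},
    5: {5: 171, 6: 45, 7: 2},
    6: {5: 86, 6: 97, 7: 7},
    7: {5: 7, 6: 31, 7: 14},
    8: {5: 1, 6: 3, 7: 0},
}

# Correct predictions sit where the true grade equals the predicted grade.
correct = sum(row.get(true, 0) for true, row in counts.items())
total = sum(sum(row.values()) for row in counts.values())
accuracy = correct / total
print(correct, total, accuracy)
```

That is 171 + 97 + 14 = 282 correct out of 480 test samples, which matches the accuracy_score result.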