Jayden1116 2022. 1. 4. 21:50

๋ชฉํ‘œ

  • ํŠน์„ฑ ์ค‘์š”๋„ ๊ณ„์‚ฐ ๋ฐฉ๋ฒ•๋“ค ์ดํ•ด ๋ฐ ๋ชจ๋ธ ํ•ด์„์— ํ™œ์šฉ
  • Boosting์— ๋Œ€ํ•œ ์ดํ•ด ๋ฐ ๋ชจ๋ธ ํ•™์Šต

ํŠน์„ฑ ์ค‘์š”๋„

  1. Feature Importance (Mean Decrease Impurity; MDI)
  • The importance computed by default in sklearn's tree-based classifiers: for each feature, the impurity decrease it produces is averaged over all trees (Mean Decrease in Impurity).

๋ถˆ์ˆœ๋„ ๊ฐ์†Œ(impurity decrease)๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค:

$$\displaystyle \frac{N_t}{N}$ * (impurity - \displaystyle\frac{N_{tR}}{N_t} * Rightimpurity - \displaystyle\frac{N_{tL}}{N_t}$ * Leftimpurity)$$

$$N: ์ „์ฒด ๊ด€์ธก์น˜ ์ˆ˜, N_t: ํ˜„์žฌ ๋…ธ๋“œ t์— ์กด์žฌํ•˜๋Š” ๊ด€์ธก์น˜ ์ˆ˜$$

$$N_{tL}, N_{tR}: ๋…ธ๋“œ t ์™ผ์ชฝ(L)/์˜ค๋ฅธ์ชฝ(R) ์ž์‹๋…ธ๋“œ์— ์กด์žฌํ•˜๋Š” ๊ด€์ธก์น˜ ์ˆ˜$$

$$๋งŒ์•ฝ SampleWeight๊ฐ€ ์ฃผ์–ด์ง„๋‹ค๋ฉด, N, N_t, N_{tR}, N_{tL}๋Š” ๊ฐ€์ค‘ํ•ฉ์„ ํ•ฉ๋‹ˆ๋‹ค.$$

  • ์ฃผ์˜์  : MDI Feature Importance๋Š” high cardinality features์— ๋Œ€ํ•ด ๊ณผํ•˜๊ฒŒ ๋†’์€ ๊ฐ’์ด ๋‚˜์˜ค๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์Šต๋‹ˆ๋‹ค.
    ๋ฒ”์ฃผ๊ฐ€ ๋งŽ์„์ˆ˜๋ก ๊ฐ ๋…ธ๋“œ์— ๊ธฐ์—ฌํ•  ํ™•๋ฅ ์ด ๋†’๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.(ํŠนํžˆ max_depth์˜ ์ œํ•œ์„ ๋‘์ง€ ์•Š๋Š”๋‹ค๋ฉด ๋”์šฑ์ด ๊ณผํ•˜๊ฒŒ ์ธก์ •๋ฉ๋‹ˆ๋‹ค.)

์˜ˆ์‹œ)

# ํŠน์„ฑ ์ค‘์š”๋„
rf = pipe.named_steps['randomforestclassifier']
importances = pd.Series(rf.feature_importances_, X_train.columns)

%matplotlib inline
import matplotlib.pyplot as plt

n = 20
plt.figure(figsize=(10,n/2))
plt.title(f'Top {n} features')
importances.sort_values()[-n:].plot.barh();

[Figure: horizontal bar chart of the top 20 feature importances]

  2. Drop-Column Importance
  • For each feature, fit the model twice — once with and once without that column — and compare the scores.
  • In principle this is the most direct way to measure a feature's importance, but it is expensive: it requires one full refit per feature.

์˜ˆ์‹œ)

# Assuming OrdinalEncoder comes from the category_encoders package
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

column = 'opinion_seas_risk'  # the feature whose importance we want to measure

# Fit without the column
pipe = make_pipeline(
    OrdinalEncoder(), 
    SimpleImputer(), 
    RandomForestClassifier(n_estimators=100, random_state=2, n_jobs=-1)
)
pipe.fit(X_train.drop(columns=column), y_train)
score_without = pipe.score(X_val.drop(columns=column), y_val)
print(f'Validation accuracy (without {column}): {score_without}')

# Refit with the column included
pipe = make_pipeline(
    OrdinalEncoder(), 
    SimpleImputer(), 
    RandomForestClassifier(n_estimators=100, random_state=2, n_jobs=-1)
)
pipe.fit(X_train, y_train)
score_with = pipe.score(X_val, y_val)
print(f'Validation accuracy (with {column}): {score_with}')

# Drop-column importance = score with the column minus score without it
print(f'Drop-Column importance of {column}: {score_with - score_without}')


  3. Permutation Importance (Mean Decrease Accuracy; MDA)
  • Can be seen as sitting between the default feature importance (MDI) and Drop-Column importance.
  • It measures how much a performance metric (accuracy, F1, R², etc.) drops when random noise is injected into the feature of interest before predicting.
  • Unlike Drop-Column importance, the feature is not removed from the validation data; instead its values are corrupted with random noise,
    destroying their information, and the score is re-measured. The simplest way to inject that noise is to shuffle (permute) the feature's values across samples.

์˜ˆ์‹œ1)

# ๋ณ€๊ฒฝ ํ•  ํŠน์„ฑ์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค
feature = 'opinion_seas_risk'
X_val[feature].head()

# ํŠน์„ฑ์˜ ๋ถ„ํฌ๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค
X_val[feature].value_counts()

# ํŠน์„ฑ์˜ ๊ฐ’์„ ๋ฌด์ž‘์œ„๋กœ ์„ž์Šต๋‹ˆ๋‹ค
X_val_permuted = X_val.copy()
X_val_permuted[feature] = np.random.RandomState(seed=7).permutation(X_val_permuted[feature])

# ํŠน์„ฑ ๊ฐ’์˜ ์ˆœ์„œ๊ฐ€ ๋’ค๋ฐ”๋€ ๊ฒƒ์„ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค
X_val_permuted[feature].head()

# ์นดํ…Œ๊ณ ๋ฆฌ๋“ค์˜ ๋ถ„ํฌ๋Š” ๋ฐ”๋€Œ์ง€๋Š” ์•Š์•˜์Œ์„ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค
X_val_permuted[feature].value_counts()

# ์ˆœ์—ด ์ค‘์š”๋„ ๊ฐ’์„ ์–ป์Šต๋‹ˆ๋‹ค. (์žฌํ•™์Šต์ด ํ•„์š” ์—†์Šต๋‹ˆ๋‹ค!)
score_permuted = pipe.score(X_val_permuted, y_val)

print(f'๊ฒ€์ฆ ์ •ํ™•๋„ ({feature}): {score_with}')
print(f'๊ฒ€์ฆ ์ •ํ™•๋„ (permuted "{feature}"): {score_permuted}')
print(f'์ˆœ์—ด ์ค‘์š”๋„: {score_with - score_permuted}')


์˜ˆ์‹œ2) eli5 ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํ™œ์šฉ


!pip install eli5

import eli5
from eli5.sklearn import PermutationImportance

permuter = PermutationImportance(
    model,               # the model must already be fit on the training set
    scoring='accuracy',  # other scorers work too
    n_iter=5,            # repeat 5 times with different random seeds
    random_state=2
)

# If model is a bare estimator (not a pipeline), X_val must already be numerically encoded here
permuter.fit(X_val, y_val)

permuter.feature_importances_

์ถ”๊ฐ€) eli5.show_weights

# ํŠน์„ฑ๋ณ„ score ํ™•์ธ
eli5.show_weights(
    permuter, 
    top=None, # top n ์ง€์ • ๊ฐ€๋Šฅ, None ์ผ ๊ฒฝ์šฐ ๋ชจ๋“  ํŠน์„ฑ 
    feature_names=feature_names # list ํ˜•์‹์œผ๋กœ ๋„ฃ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค
)


Permutation Importance๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŠน์„ฑ์„ ์„ ํƒํ•˜๋Š” ํŒ

minimum_importance = 0.001
mask = permuter.feature_importances_ > minimum_importance
features = X_train.columns[mask]  # keep only the columns where the mask is True
X_train_selected = X_train[features]
X_val_selected = X_val[features]
  • As above, you can select only the features whose importance exceeds some threshold and refit the model on them (see the sketch below).
  • Dropping features with very small importances usually has little effect on the score, and the smaller feature set is more efficient.

์ถ”๊ฐ€) ์ค‘์š”๋„์˜ ํ‘œ์ค€ํŽธ์ฐจ๊นŒ์ง€ ๊ณ ๋ คํ•˜๋Š” ๊ฒฝ์šฐ

permuter.feature_importances_ - permuter.feature_importances_std_ > 0 
# Flags the features whose mean importance minus its standard deviation is still positive —
# i.e. features whose importance stays positive even after accounting for the spread across iterations.
# In the importances above, 8 features exceed 0.001, but only 7 remain positive once the
# standard deviation is considered ('education_comp' drops out).

Boosting

  • ๋ฐฐ๊น…(๋žœ๋คํฌ๋ ˆ์ŠคํŠธ)์˜ ๊ฒฝ์šฐ, ๋…๋ฆฝ์ ์ธ ์—ฌ๋Ÿฌ ํŠธ๋ฆฌ๋“ค์„ ๋งŒ๋“ค์ง€๋งŒ
    ๋ถ€์ŠคํŒ…์€ ๋งŒ๋“ค์–ด์ง€๋Š” ํŠธ๋ฆฌ๊ฐ€ ์ด์ „์— ๋งŒ๋“ค์–ด์ง„ ํŠธ๋ฆฌ์˜ ์˜ํ–ฅ์„ ๋ฐ›์Šต๋‹ˆ๋‹ค.
  • ๋ฐฐ๊น…(๋žœ๋คํฌ๋ ˆ์ŠคํŠธ)์˜ ์žฅ์ ์€ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์— ์ƒ๋Œ€์ ์œผ๋กœ ๋œ ๋ฏผ๊ฐํ•œ ๊ฒƒ์ธ๋ฐ, ๋ถ€์ŠคํŒ…(๊ทธ๋ž˜๋””์–ธํŠธ ๋ถ€์ŠคํŒ…)์˜ ๊ฒฝ์šฐ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ
    ์„ธํŒ…์— ๋”ฐ๋ผ ๋ฐฐ๊น…๋ณด๋‹ค ๋” ์ข‹์€ ์˜ˆ์ธก ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ํŠธ๋ฆฌ ๊ธฐ๋ฐ˜ ๋ชจ๋ธ์€ non-linear, non-monotonic ๊ด€๊ณ„, ํŠน์„ฑ๊ฐ„ ์ƒํ˜ธ์ž‘์šฉ์ด ์กด์žฌํ•˜๋Š” ๋ฐ์ดํ„ฐ ํ•™์Šต์— ์ ์šฉํ•˜๊ธฐ ์ข‹์Šต๋‹ˆ๋‹ค.
  1. AdaBoost
  • Starting from equal weights on all samples, each tree (weak learner) increases the weights of the observations (samples) it misclassifies.
  • When the next tree is built, the previously misclassified samples carry more weight and are therefore sampled more often, so the
    next tree concentrates on them.

Step 0. Initialize all observations with equal weights.
Step 1. Draw observations with replacement, train a weak learner Dn, and classify into + and -.
Step 2. Increase the weights of the misclassified observations so they are sampled more often in the next round.
Step 3. Repeat Steps 1–2 n times (here n = 3).
Step 4. Combine the classifiers (D1, D2, D3) to make the final prediction.

[Figures: AdaBoost rounds D1, D2, D3 and the combined classifier]

  • ์ตœ์ข… ํ•™์Šต๊ธฐ(H(x))๋Š” ์•ฝํ•œ ํ•™์Šต๊ธฐ๋“ค(h_t)์˜ ๊ฐ€์ค‘(α)ํ•ฉ์œผ๋กœ ๋งŒ๋“ค์–ด์ง‘๋‹ˆ๋‹ค. α๋Š” Say(๊ฒฐ์ •๋ ฅ)์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ์ •๋ ฅ์ด ํด์ˆ˜๋ก ๋ถ„๋ฅ˜๊ธฐ
    ์˜ ์„ฑ๋Šฅ์ด ์ข‹๋‹ค๋Š” ๋œป์ž…๋‹ˆ๋‹ค.

์ถ”๊ฐ€) AdaBoost๋Š” node 1๊ฐœ์™€ leaf 2๊ฐœ์ธ Stump๋ฅผ ๊ธฐ๋ณธ ๋ชจ๋ธ๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ Stump์˜ ์กฐํ•ฉ์ž…๋‹ˆ๋‹ค.

  2. Gradient Boost
  • Similar to AdaBoost, but it differs in how the loss function is optimized.
  • Instead of AdaBoost's sample-reweighting scheme, Gradient Boost trains each new tree on the residuals of the ensemble so far,
    which effectively makes the model focus on the samples with larger residuals (see the sketch after this list).
  • It can be used for both regression and classification problems.


๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

  • ๊ธฐ๋ณธ์ ์œผ๋กœ sklearn.ensemble์— AdaBoost์™€ GradientBoost๊ฐ€ ๊ตฌํ˜„๋˜์–ด์žˆ์ง€๋งŒ ๋ถ€์ŠคํŒ…์€ ๋ณดํ†ต ๋‹ค๋ฅธ ๋” ์ข‹์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.
  1. XGBoost : ๊ฒฐ์ธก๊ฐ’์„ ์ˆ˜์šฉ, monotonic constrains๋ฅผ ๊ฐ•์ œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
import xgboost
from xgboost import XGBClassifier
  1. LightGBM : ๊ฒฐ์ธก๊ฐ’์„ ์ˆ˜์šฉ, monotonic constrains๋ฅผ ๊ฐ•์ œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
import lightgbm
from lightgbm import LGBMClassifier
  1. CatBoost : ๊ฒฐ์ธก๊ฐ’์„ ์ˆ˜์šฉ, categorical features๋ฅผ ์ „์ฒ˜๋ฆฌ ์—†์ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
!pip install catboost # ๊ตฌ๊ธ€ ์ฝ”๋žฉ ์กฐ๊ฑด์—์„œ ๋”ฐ๋กœ ์„ค์น˜๋ฅผ ํ•ด์ฃผ์–ด์•ผํ•ฉ๋‹ˆ๋‹ค.

import catboost
from catboost import CatBoostClassifier
์ฐธ๊ณ  : monotonic constrains

[Figure: a fitted curve with and without a monotonic constraint]

  • The effect of monotonic constraints: when a feature that should be monotonically increasing comes out non-monotonic by mistake, a constraint can be applied per feature (see the sketch below).
  • For example, where data is sparse at small feature values the fitted curve may appear to dip, but if we know the feature's effect is
    monotonically increasing (upward-sloping), the constraint corrects that region.

Early Stopping

  • ๋ฐฐ๊น…๊ณผ ๋‹ค๋ฅด๊ฒŒ ๋ถ€์ŠคํŒ…์€ ๊ทธ ์•ˆ์— ๊ธฐ๋ณธ๋ชจ๋ธ์ธ tree๋ฅผ ์ˆœ์ฐจ์ ์œผ๋กœ ํ•™์Šตํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ชจ๋“  n_estimators(tree model์˜ ์ˆ˜)
    ๋ฅผ ํ•™์Šตํ•  ๊ฒƒ ์—†์ด ์ผ์ • ๊ธฐ์ค€์น˜๊นŒ์ง€๋งŒ ์ฑ„์šฐ๋ฉด ํ•™์Šตํ•˜์ง€ ์•Š๊ฒŒ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • GridSearchCV, RandomizedSearchCV ํ˜น์€ ๋ฐ˜๋ณต๋ฌธ์œผ๋กœ n_estimators์˜ ์ตœ์ ์˜ ๊ฐ’์„ ์ฐพ์œผ๋ ค๋ฉด ๋„ˆ๋ฌด ๋งŽ์€ ๋ฐ˜๋ณต์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
    ๋˜ํ•œ, ๋‹ค๋ฅธ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์™€์˜ ์กฐํ•ฉ๊นŒ์ง€ ๊ฒฝ์šฐ์˜ ์ˆ˜๋ฅผ ์ƒ๊ฐํ•˜๋ฉด ํ•™์Šต ํšŸ์ˆ˜๊ฐ€ ๋น„์•ฝ์ ์œผ๋กœ ์ฆ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
    ์ด๋Ÿด ๋•Œ, ๋ถ€์ŠคํŒ…์˜ ๊ฒฝ์šฐ Early Stopping์„ ํ™œ์šฉํ•˜์—ฌ ์•„์ฃผ ํšจ๊ณผ์ ์œผ๋กœ n_estimators๋ฅผ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์˜ˆ์‹œ)

encoder = OrdinalEncoder()
X_train_encoded = encoder.fit_transform(X_train)  # training data
X_val_encoded = encoder.transform(X_val)          # validation data

model = XGBClassifier(
    n_estimators=1000,  # up to 1000 trees, but early stopping will cut this short
    max_depth=7,        # set deeper than the default to accommodate high-cardinality features
    learning_rate=0.2,
#     scale_pos_weight=ratio,  # for imbalanced data, apply the class ratio
    n_jobs=-1
)

eval_set = [(X_train_encoded, y_train), 
            (X_val_encoded, y_val)]

model.fit(X_train_encoded, y_train, 
          eval_set=eval_set,
          eval_metric='error',       # (wrong cases) / (all cases)
          early_stopping_rounds=50   # stop if the score has not improved for 50 rounds,
         )                           # i.e. try 50 more trees past the best score before giving up
# In the output, validation_0-error is for the train set and validation_1-error for the validation set.
# The model is left fitted at the best n_estimators.
# (In newer xgboost releases, eval_metric and early_stopping_rounds are deprecated in fit()
# and are passed to the XGBClassifier constructor instead.)

์ฐธ๊ณ )

์œ„์˜ scale_pos_weight๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

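A common way to compute it, assuming a binary 0/1 target (the variable name ratio matches the commented-out parameter above):

# negatives / positives: up-weights the positive (1) class
ratio = (y_train == 0).sum() / (y_train == 1).sum()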

  • 1์— ํ•ด๋‹นํ•˜๋Š” ์ƒ˜ํ”Œ์— ๊ฐ€์ค‘์„ ์ฃผ๋Š” ๊ฒƒ์ด๋‹ˆ 'ํƒ€๊ฒŸ๊ฐ’(0) ๊ฐฏ์ˆ˜ / ํƒ€๊ฒŸ๊ฐ’(1) ๊ฐฏ์ˆ˜' ๋ฅผ positive์— ๊ณฑํ•˜๋ฉด ๋‘˜์˜ ๋น„์œจ์ด 1:1์ด ๋ฉ๋‹ˆ๋‹ค.

์ฐธ๊ณ 2)

๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ผ์ • ์ˆ˜(์˜ˆ์‹œ์—์„  35) ์ดํ›„๋กœ๋Š” ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ error๊ฐ€ ๋” ๋–จ์–ด์ง€์ง„ ์•Š๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

results = model.evals_result()
train_error = results['validation_0']['error']
val_error = results['validation_1']['error']

epoch = range(1, len(train_error)+1)
plt.plot(epoch, train_error, label='Train')
plt.plot(epoch, val_error, label='Validation')
plt.ylabel('Classification Error')
plt.xlabel('Model Complexity (n_estimators)')
plt.ylim((0.15, 0.25)) # Zoom in
plt.legend();

[Figure: train vs. validation classification error by number of trees]

์ฐธ๊ณ 

ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹

Random Forest

  • max_depth (๋†’์€๊ฐ’์—์„œ ๊ฐ์†Œ์‹œํ‚ค๋ฉฐ ํŠœ๋‹, ๋„ˆ๋ฌด ๊นŠ์–ด์ง€๋ฉด ๊ณผ์ ํ•ฉ)
  • n_estimators (์ ์„๊ฒฝ์šฐ ๊ณผ์†Œ์ ํ•ฉ, ๋†’์„๊ฒฝ์šฐ ๊ธด ํ•™์Šต์‹œ๊ฐ„)
  • min_samples_leaf (๊ณผ์ ํ•ฉ์ผ๊ฒฝ์šฐ ๋†’์ž„)
  • max_features (์ค„์ผ ์ˆ˜๋ก ๋‹ค์–‘ํ•œ ํŠธ๋ฆฌ์ƒ์„ฑ, ๋†’์ด๋ฉด ๊ฐ™์€ ํŠน์„ฑ์„ ์‚ฌ์šฉํ•˜๋Š” ํŠธ๋ฆฌ๊ฐ€ ๋งŽ์•„์ ธ ๋‹ค์–‘์„ฑ์ด ๊ฐ์†Œ)
  • class_weight (imbalanced ํด๋ž˜์Šค์ธ ๊ฒฝ์šฐ ์‹œ๋„)

XGBoost

  • learning_rate (high values risk overfitting)
  • max_depth (tune upward from a low value; too deep risks overfitting; -1 means unlimited splitting; set deeper when there are many features)
  • n_estimators (too large means long training times; use together with early_stopping_rounds)
  • scale_pos_weight (try for imbalanced problems)
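As a minimal sketch of searching over the XGBoost hyperparameters listed above with RandomizedSearchCV (the distributions and n_iter=10 are arbitrary choices; early_stopping_rounds is omitted because each CV fit would need its own eval_set; X_train_encoded is reused from the early stopping example):

from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

param_distributions = {
    'learning_rate': uniform(0.01, 0.3),  # samples from [0.01, 0.31)
    'max_depth': randint(3, 10),
    'n_estimators': randint(100, 500),
}

search = RandomizedSearchCV(
    XGBClassifier(n_jobs=-1),
    param_distributions=param_distributions,
    n_iter=10,
    scoring='accuracy',
    cv=3,
    random_state=2,
)
search.fit(X_train_encoded, y_train)
print(search.best_params_)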