[TIL] 26. Logistic Regression (Classification)

2021. 12. 22. 23:05

Goals

  • Understand train/validate/test data
  • Understand the difference between classification and regression, and use the model that fits the problem
  • Understand logistic regression

Train/Validate/Test data

Example: Kaggle 'Titanic: Machine Learning from Disaster'

import pandas as pd
train = pd.read_csv('https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/titanic/train.csv')
test = pd.read_csv('https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/titanic/test.csv')

print("train features: ", train.shape[1])
print("test features: ", test.shape[1])

train features: 12
test features: 11

In other words, the test data has one fewer variable: the target (the label, the answer we want to predict).

print("target col: ", train.columns.difference(test.columns)[0]) # check which column is missing from the test set

target col: Survived

On Kaggle, the train set and test set are usually provided separately, and the test set is provided with the target information removed.

Partly this keeps people from blindly pushing up their score against the test set, but the biggest reason is to measure the model's generalization performance correctly (really two sides of the same idea).

That is why we split off part of the train set as a validation set and use it to evaluate the performance of the model we build.

Why do we need a validation set?

Because training once on the train set alone and calling it done is incomplete. We need a step where several models, tuned differently, are trained on the train set and the validation set is used to decide which of them learned best.

In other words: train several models on the train set (direct involvement in the model), use the validation set to choose among them and fine-tune a bit more (indirect involvement in the model), and then evaluate on the test set exactly once.

cf) If you were to select and tune the model using the test set, that in itself means picking a model specialized to the test set, which contributes to overfitting and hurts generalization performance.

  • train data : used to fit the model == studying the practice problems
  • validate data : used to measure error in order to select a prediction model == a mock exam
  • test data : used exactly once, at the end, on the selected model to estimate generalization error; never used during training or validation == the final exam


  • When developing a model, a model selection step is essential
  • Hyperparameter tuning requires validation data to check whether the tuning actually helped
  • Never tune against the test data
  • With plenty of data, a train/val/test split is enough; with relatively little data, K-fold cross-validation can be used instead (even then, the test set must be set aside in advance; a short sketch follows the splitting code below)
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2, random_state=2) # random_state is optional
train, validation = train_test_split(train, test_size=0.2, random_state=2) # split the train set once more to create the validation set

# Then choose the features and the target, and split into X and y.

features = [...]  # list of feature column names
target = ...      # target column name, e.g. 'Survived' for the Titanic data

X_train = train[features]
X_val = validation[features]
X_test = test[features]

y_train = train[target]
y_val = validation[target]
y_test = test[target]

Or, equivalently:

features = [...]  # list of feature column names
target = ...      # target column name

X = df[features]
y = df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=2)
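
A rough sketch of the model-selection loop described above, comparing candidates on the validation set. The C values here are arbitrary examples, the features are assumed to already be numeric and complete, and LogisticRegression is the model introduced later in this post:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

best_score, best_C = 0, None
for C in [0.01, 0.1, 1, 10]:               # differently tuned candidate models
    candidate = LogisticRegression(C=C, max_iter=1000)
    candidate.fit(X_train, y_train)        # train set: fitting only
    score = candidate.score(X_val, y_val)  # validation set: used to compare and select
    if score > best_score:
        best_score, best_C = score, C
print("best C:", best_C, "validation accuracy:", best_score)

# With little data, K-fold cross-validation on the train set can replace the single
# validation split; the test set is still kept aside for the final check.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print("5-fold mean accuracy:", cv_scores.mean())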

Classification

  • ๋ถ„๋ฅ˜๋ฌธ์ œ๋Š” ํšŒ๊ท€๋ฌธ์ œ์™€ ๋‹ค๋ฅธ ๊ธฐ์ค€์œผ๋กœ ๊ธฐ์ค€๋ชจ๋ธ์„ ์„ ์ •ํ•ฉ๋‹ˆ๋‹ค.
    • ํšŒ๊ท€๋ฌธ์ œ : ๋ณดํ†ต ํƒ€๊ฒŸ ๋ณ€์ˆ˜์˜ ํ‰๊ท ๊ฐ’์„ ๊ธฐ์ค€๋ชจ๋ธ๋กœ
    • ๋ถ„๋ฅ˜๋ฌธ์ œ : ๋ณดํ†ต ํƒ€๊ฒŸ ๋ณ€์ˆ˜์˜ ์ตœ๋นˆ๊ฐ’์„ ๊ธฐ์ค€๋ชจ๋ธ๋กœ
    • ์‹œ๊ณ„์—ด(time-series) : ๋ณดํ†ต ์–ด๋–ค ์‹œ์ ์„ ๊ธฐ์ค€์œผ๋กœ ์ด์ „ ์‹œ๊ฐ„์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ์ค€๋ชจ๋ธ๋กœ

Why use the mode as the baseline model in classification problems?

If you build a model on training data where classes 1 and 0 appear in a 9:1 ratio, a model that always outputs 1 already reaches 90% accuracy. Is that model meaningful just because its accuracy is 90% or more? No.
So, since predicting 1 for everything already gives 90%, a model can only be judged 'meaningful' if it beats that 90%. (It is, quite literally, the baseline.)

# Example: building the majority-class baseline model

target = 'Survived' # Titanic example

y_train = train[target]
y_train.value_counts(normalize=True) # check which of 0 and 1 is the majority class

0 0.625749
1 0.374251

# mode(): returns the most frequent value in a Series.
major = y_train.mode()[0]

# Build a list with the majority value repeated once per target sample; this is the baseline model's prediction.
y_pred = [major] * len(y_train)

๋ถ„๋ฅ˜์—์„œ์˜ ํ‰๊ฐ€์ง€ํ‘œ(ํšŒ๊ท€์™€๋Š” ๋‹ค๋ฅธ ํ‰๊ฐ€์ง€ํ‘œ ์‚ฌ์šฉ)

  • Never use regression metrics on a classification model, and vice versa.
  • The metric used here for classification problems: accuracy

$$Accuracy = \frac{\text{number of correct predictions}}{\text{total number of predictions}} = \frac{TP + TN} {P + N}$$

# Accuracy can be computed like this

from sklearn.metrics import accuracy_score
print("training accuracy: ", accuracy_score(y_train, y_pred))

๋กœ์ง์Šคํ‹ฑ ํšŒ๊ท€(Logistic Regression)

The logistic regression model

$$\large P(X)={\frac {1}{1+e^{-(\beta_{0}+\beta_{1}X_{1}+\cdots +\beta_{p}X_{p})}}}$$

$$ 0 \leq P(X) \leq 1$$

[Figure: the logistic (sigmoid) curve, with outputs between 0 and 1 and the 0.5 decision threshold]

Logistic regression passes a linear combination of the feature variables through the logistic function (in the figure above, the output is split into 0 and 1 at the 0.5 threshold).
As a result, each observation gets a computed probability of belonging to a particular class, and in classification problems this probability is what we use to classify.
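
A tiny sketch of that thresholding idea with made-up coefficients (for a fitted scikit-learn model, predict_proba returns the same kind of probabilities):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

beta_0, beta_1 = -1.0, 2.0            # hypothetical coefficients
x = np.array([-2.0, 0.0, 0.5, 3.0])   # hypothetical feature values
p = sigmoid(beta_0 + beta_1 * x)      # P(X): probability of class 1
pred = (p >= 0.5).astype(int)         # classify at the 0.5 threshold
print(p.round(3), pred)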

Logit Transformation
  • Logistic regression coefficients sit inside a nonlinear function, so they are hard to interpret directly.
  • Using the odds, the model can be rewritten as a linear combination, which makes it easier to interpret.
  • Odds: the ratio of the probability of success to the probability of failure == P(success)/P(failure) == P(class 1)/P(class 0)

$$Odds = \large \frac{p}{1-p}$$

where p is the probability of success and 1-p the probability of failure. For example, p = 0.8 gives Odds = 0.8/0.2 = 4.

$$p = 1 \Rightarrow Odds = \infty, \qquad p = 0 \Rightarrow Odds = 0$$

$$\large ln(Odds) = ln(\frac{p}{1-p}) = ln(\frac{\frac {1}{1+e^{-(\beta_{0}+\beta_{1}X_{1}+\cdots +\beta_{p}X_{p})}}}{1 - \frac {1}{1+e^{-(\beta_{0}+\beta_{1}X_{1}+\cdots +\beta_{p}X_{p})}}}) = \normalsize \beta_{0}+\beta_{1}X_{1}+\cdots +\beta_{p}X_{p}$$

  • Logit transformation: taking the log of the odds
  • This turns the nonlinear logistic form into a linear form, so the regression coefficients become much easier to interpret.
  • You can read off how much the logit (ln(Odds)) increases or decreases as a feature X increases; equivalently, a 1-unit increase in X multiplies the odds by exp(coefficient).

Whereas the y value of the original logistic form is bounded between 0 and 1, the logit ranges over -∞ to ∞.
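
As a sketch of how this reads in practice: once a LogisticRegression named logistic has been fit on one-hot-encoded features X_train_encoded (as in the examples below), the fitted coefficients can be exponentiated into odds ratios:

import numpy as np
import pandas as pd

# exp(coefficient) per feature: > 1 means a 1-unit increase in that feature multiplies
# the odds of class 1 by that factor, < 1 means the odds shrink by that factor.
odds_ratios = pd.Series(np.exp(logistic.coef_[0]), index=X_train_encoded.columns)
print(odds_ratios.sort_values(ascending=False))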

Logistic regression: additional notes

# ๋‹ค์Œ๊ณผ ๊ฐ™์ด fitting

from sklearn.linear_model import LogisticRegression

logistic = LogisticRegression()
logistic.fit(X_train, y_train)

logistic.score(X_train, y_train) # ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด train set์— ๋Œ€ํ•œ ์ •ํ™•๋„๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ์Œ
logistic.score(X_val, y_val) # ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด val set์— ๋Œ€ํ•œ ์ •ํ™•๋„๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ์Œ

Extra tip (example using several encoders and transformers)
! pip install category_encoders

from category_encoders import OneHotEncoder # scikit-learn has one too, but the category_encoders version is more convenient to use

encoder = OneHotEncoder(use_cat_names=True) # use each category's name in the new column names (cat == category)
X_train_encoded = encoder.fit_transform(X_train)
X_val_encoded = encoder.transform(X_val)

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean') # fill missing values with the column mean
X_train_imputed = imputer.fit_transform(X_train_encoded)
X_val_imputed = imputer.transform(X_val_encoded)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler() # put the columns on a common scale; standardization (scaling)
X_train_scaled = scaler.fit_transform(X_train_imputed)
X_val_scaled = scaler.transform(X_val_imputed)

Apply this process to the train and validation sets, then fit the model (here, LogisticRegression) on the result.

When producing the final result on the test set at the very end, the same process has to be applied to X_test as well.

X_test = test[features]
X_test_encoded = encoder.transform(X_test)
X_test_imputed = imputer.transform(X_test_encoded)
X_test_scaled = scaler.transform(X_test_imputed)

y_pred_test = logistic.predict(X_test_scaled) # the LogisticRegression fit on the preprocessed train set

# like this :)
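
As a design note, these encode/impute/scale/fit steps can also be bundled so the fit_transform vs transform bookkeeping is handled automatically; a minimal sketch using scikit-learn's make_pipeline with the same components, assuming X_train, y_train, X_val, y_val, X_test from above:

from category_encoders import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(
    OneHotEncoder(use_cat_names=True),
    SimpleImputer(strategy='mean'),
    StandardScaler(),
    LogisticRegression()
)
pipe.fit(X_train, y_train)                  # fit_transform on each step, then fit the final model
print("val accuracy:", pipe.score(X_val, y_val))
y_pred_test = pipe.predict(X_test)          # transform-only on each step, then predict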
