Jayden`s

    [TIL]33.Choose your ML problems

    ๋ชฉํ‘œ ์˜ˆ์ธก๋ชจ๋ธ์„ ์œ„ํ•œ ํ…Œ๊ฐ“์„ ์„ ํƒ, ๊ทธ ๋ถ„ํฌ๋ฅผ ํ™•์ธ train/val set ์‚ฌ์ด ๋˜๋Š” target/features ์‚ฌ์ด์— ์ผ์–ด๋‚˜๋Š” ์ •๋ณด ๋ˆ„์ถœ(leakage) ์˜ˆ๋ฐฉ ์ƒํ™ฉ์— ๋งž๋Š” ๊ฒ€์ฆ ์ง€ํ‘œ(metrics; ํ‰๊ฐ€์ง€ํ‘œ) ์‚ฌ์šฉ ๋ฐ์ดํ„ฐ ๊ณผํ•™์ž ์‹ค๋ฌด ํ”„๋กœ์„ธ์Šค ๋น„์ฆˆ๋‹ˆ์Šค ๋ฌธ์ œ ์‹ค๋ฌด์ž๋“ค๊ณผ ๋Œ€ํ™”๋ฅผ ํ†ตํ•ด ๋ฌธ์ œ ๋ฐœ๊ฒฌ ๋ฐ์ดํ„ฐ ๋ฌธ์ œ ๋ฌธ์ œ์™€ ๊ด€๋ จ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐœ๊ฒฌ ๋ฐ ์ˆ˜์ง‘ ๋ฐ์ดํ„ฐ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ, ์‹œ๊ฐํ™” ๋จธ์‹ ๋Ÿฌ๋‹, ํ†ต๊ณ„ ๋น„์ฆˆ๋‹ˆ์Šค ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐ์ดํ„ฐ ๋ฌธ์ œ ํ•ด๊ฒฐ์„ ํ†ตํ•ด ์‹ค๋ฌด์ž๋“ค๊ณผ ๋น„์ฆˆ๋‹ˆ์Šค ๋ฌธ์ œ ํ•ด๊ฒฐ ํƒ€๊ฒŸ ์„ ์ • ๋ฐ ๊ทธ ๋ถ„ํฌ ํ™•์ธ ์ง€๋„ํ•™์Šต(Supervised learning)์—์„œ ์˜ˆ์ธกํ•  ํƒ€๊ฒŸ์„ ์„ ์ • ํƒ€๊ฒŸ์— ๋”ฐ๋ผ ํšŒ๊ท€(Regression) / ๋ถ„๋ฅ˜(Classification) ๋ฌธ์ œ ๊ตฌ๋ถ„ ๊ตฌ๋ถ„์ด ์–ด๋ ค์šด ๊ฒฝ์šฐ๋„ ์กด์žฌ ๋˜ํ•œ, ์ด์‚ฐํ˜•, ์ˆœ์„œํ˜•..

    HyperParameter tuning

    Tune the hyperparameters with GridSearchCV. Try everything you can to improve model performance. For the feature engineering or hyperparameter tuning that improved performance the most, explain why it had such a large effect, and share and discuss your results with one another. Using Ordinal Encoder: 1-1. RandomizedSearchCV: I ran it before GridSearchCV to find a reasonable search range, and set cv = 5 based on cross_val_score. Hyperparameter adjustment: from the results above, you can roughly work out which values to pass to GridSearchCV. 1-2. GridSearchCV: taking the values above as a baseline and varying them slightly, the optimal pa..
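
A minimal sketch of the RandomizedSearchCV-then-GridSearchCV workflow described above, assuming scikit-learn and SciPy. The toy data, the column names `color`/`size`, the random forest, and the parameter ranges are hypothetical choices for illustration, not the settings used in the post. The OrdinalEncoder is applied only to the categorical column through a column transformer, and cv=5 is passed directly.

```python
import numpy as np
import pandas as pd
from scipy.stats import randint
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: one categorical and one numeric feature
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.normal(size=n),
})
y = ((X["color"] == "red") & (X["size"] > 0)).astype(int)

# OrdinalEncoder maps each "color" category to an integer; "size" passes through unchanged
pipe = make_pipeline(
    make_column_transformer((OrdinalEncoder(), ["color"]), remainder="passthrough"),
    RandomForestClassifier(random_state=0),
)
prefix = "randomforestclassifier__"  # step name generated by make_pipeline

# Step 1: a coarse random search over wide ranges to locate a promising region
random_search = RandomizedSearchCV(
    pipe,
    param_distributions={
        prefix + "n_estimators": randint(20, 100),
        prefix + "max_depth": [2, 4, 6, 8, None],
        prefix + "min_samples_leaf": randint(1, 10),
    },
    n_iter=5,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
random_search.fit(X, y)
best = random_search.best_params_

# Step 2: a fine grid search in a narrow neighborhood around the random-search result
depth = best[prefix + "max_depth"]
leaf = best[prefix + "min_samples_leaf"]
grid_search = GridSearchCV(
    pipe,
    param_grid={
        prefix + "n_estimators": [best[prefix + "n_estimators"]],
        prefix + "max_depth": [None] if depth is None else sorted({max(1, depth - 1), depth, depth + 1}),
        prefix + "min_samples_leaf": sorted({max(1, leaf - 1), leaf, leaf + 1}),
    },
    cv=5,
    scoring="accuracy",
)
grid_search.fit(X, y)
print(grid_search.best_params_, round(grid_search.best_score_, 3))
```

Because the encoder sits inside the pipeline, it is refit on each training fold during the search, so the validation fold never influences the encoding.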

    Evaluation metrics for Classification

    confusion matrix, classification report ๋“ฑ์„ ๊ทธ๋ ค ๋ณด์‹œ๊ณ , ๊ฐ ํ‰๊ฐ€์ง€ํ‘œ๋“ค์— ๋Œ€ํ•ด ์ตœ๋Œ€ํ•œ ๋ถ„์„ํ•˜๊ณ  ๋ฌด์—‡์ด ๋ถ€์กฑํ•œ์ง€ ์–ด๋–ค ๋ฐฉํ–ฅ์œผ๋กœ ์„ฑ๋Šฅ์„ ๋†’์—ฌ์•ผ ํ•  ์ง€ ๋…ผ์˜ํ•ด ๋ณด์„ธ์š”. ๋ถ„๋ฅ˜ ๋ฌธ์ œ์˜ ํ‰๊ฐ€ ์ง€ํ‘œ accuracy(์ •ํ™•๋„) f1_score precision(์ •๋ฐ€๋„) recall(์žฌํ˜„์œจ ; sensitivity) ROC curve ๋ฐ AUC score accuracy(์ •ํ™•๋„) f1_score precision ๋ฐ recall - classification_report train set val set confusion matrix train set val set ROC curve ๋ฐ AUC train set val set train set vs val set ๋‹น์—ฐํ•œ ๊ฒฐ๊ณผ๊ฒ ์ง€๋งŒ, ์—ฌ๋Ÿฌ์ง€ํ‘œ..

    [TIL]32.Section2 Sprint2 Chall(Sprint2 keyword-focused summary)

    From now on, I plan to summarize the wrap-up content after each sprint challenge. Tree models: compared with linear models, they are less sensitive to scaling and relatively free of assumptions that must hold before applying them (few assumptions are needed for validity). They overfit easily (although, if the overfitting is reasonably controlled, that is better than not learning at all). In fact, the more advanced the model, the more easily it overfits, because it fits the training data too well. Cost function in trees: impurity (Gini impurity, entropy) and the concept of information gain! Each split at a node of a tree model is optimal at that moment, but the tree as a whole may not be optimal. -> something to think about. Pipeline characteristics: concise, lets you treat preprocessing and modeling as one continuous flow, and is also important for collaboration. Feature importance: features in the nodes..
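
The impurity measures and information gain mentioned above can be written in a few lines of plain Python. This is an illustrative sketch, not scikit-learn's implementation; `best_split` is a hypothetical helper showing the greedy, one-node-at-a-time choice that makes each split locally optimal without guaranteeing a globally optimal tree.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum(p_k * log2(p_k)) over the class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

def best_split(xs, ys, impurity=gini):
    """Greedy step (hypothetical helper): try every threshold on one feature, keep the best gain."""
    best = (None, -1.0)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = information_gain(ys, left, right, impurity)
        if gain > best[1]:
            best = (t, gain)
    return best

# A 5/5 parent node split into a 4/1 child and a 1/4 child
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(gini(parent), entropy(parent))                # 0.5 1.0
print(information_gain(parent, left, right))        # entropy gain, about 0.278
print(information_gain(parent, left, right, gini))  # Gini gain, about 0.18
print(best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (3, 0.5)
```

In this example the Gini impurity drops from 0.5 to 0.32 in each child and the entropy from 1.0 to about 0.722, so both criteria favor the split, just with different gain values.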

    [TIL]31.Model Selection(๋ชจ๋ธ ์„ ํƒ)

    ๋ชฉํ‘œ Model Selection(๋ชจ๋ธ ์„ ํƒ)์„ ์œ„ํ•œ Cross Validation(๊ต์ฐจ๊ฒ€์ฆ) ๋ฐฉ๋ฒ• ์ดํ•ด ๋ฐ ํ™œ์šฉ Hyperparameter๋ฅผ ์ตœ์ ํ™”ํ•˜์—ฌ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ Cross-Validation(๊ต์ฐจ๊ฒ€์ฆ) Hold-Out ๊ต์ฐจ๊ฒ€์ฆ : train/validate/test set์œผ๋กœ ๋‚˜๋ˆ  ํ•™์Šต์„ ์ง„ํ–‰ train set์˜ ํฌ๊ธฐ๊ฐ€ ์ž‘์„ ๋•Œ๋Š” val set์„ ๋”ฐ๋กœ ๋ถ„๋ฆฌํ•˜๋Š” ๊ฒƒ์ด ๋ถ€๋‹ด์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์–ต์ง€๋กœ val set์„ ๋”ฐ๋กœ ์ถ”์ถœํ•ด๋„ ์˜ˆ์ธก ์„ฑ๋Šฅ์— ๋Œ€ํ•œ ์ถ”์ •์ด ๋ถ€์ •ํ™•ํ•  ํ™•๋ฅ ์ด ๋†’์Šต๋‹ˆ๋‹ค. K-fold ๊ต์ฐจ๊ฒ€์ฆ : ๋ฐ์ดํ„ฐ๋ฅผ k๊ฐœ๋กœ ๋“ฑ๋ถ„ํ•˜๊ณ  k๊ฐœ์˜ ์ง‘ํ•ฉ์—์„œ k-1๊ฐœ๋Š” train set, 1๊ฐœ๋Š” val set์œผ๋กœ ์‚ฌ์šฉํ•˜์—ฌ k๋ฒˆ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ• ์œ„์˜ Hold-Out ๋ฐฉ๋ฒ•์˜ ๋‹จ์ ์„ ๊ทน๋ณตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์–ด๋–ค ํ•™์Šต ..