Jayden`s

    [TIL]27.Section2_sprint1 challenge

    Linear regression, multiple regression, ridge regression, logistic regression: broadly, these are the models we covered. The model to use depends on whether the task is regression or classification, and the evaluation metrics depend on the model. Before fitting data to a model, the data is split into train, validate, and test sets, and various transformers such as OneHot encoding, scaling, and polynomial features are applied to it. Each model also has hyperparameters that can be tuned. What I felt this week is that machine learning itself is a lot of fun. That said, I also came to appreciate how important the EDA and Feature Engineering learned in Section 1 are in practice. Statistical, visualization,..

    Ridge regression, model performance evaluation metrics, OneHotencoding, feature selection

    1) Build the best ridge regression model you can with your own ideas, then share and discuss it with each other. Which feature engineering did you use, and what were the reasons and expected effects? The initial variables are 'Rooms, Type, Price, Method, Postcode, Regionname, Propertycount, Distance, CouncilArea'. With OneHotencoding in mind, I dropped the variables that I judged to have too many unique values. Also, 'Postcode' is numeric data, but I judged that its order (magnitude) carries no meaning, and I thought it was better to one-hot encode Regionname and use that as the variable describing the region, so I dropped..
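    The approach described above (drop 'Postcode', one-hot encode only low-cardinality categoricals, then fit a ridge model) might look roughly like the sketch below. The rows are made up for illustration, and the `max_unique` threshold, the scaling step, and `alpha=1.0` are assumptions that do not come from the original post.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows that reuse some of the column names listed in the post.
df = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2, 5],
    "Type": ["h", "u", "h", "t", "u", "h"],
    "Regionname": ["Northern", "Southern", "Northern",
                   "Eastern", "Southern", "Eastern"],
    "Postcode": [3067, 3182, 3070, 3123, 3181, 3124],
    "Distance": [2.5, 6.1, 5.5, 7.8, 6.3, 9.2],
    "Price": [850000, 620000, 1200000, 930000, 580000, 1500000],
})

# Postcode is numeric, but its magnitude has no meaning, so it is dropped.
X = df.drop(columns=["Price", "Postcode"])
y = df["Price"]

# Keep a categorical column only if it has few unique values.
# The threshold is an arbitrary assumption for this sketch.
max_unique = 10
cat_cols = [c for c in ["Type", "Regionname"] if X[c].nunique() <= max_unique]
num_cols = ["Rooms", "Distance"]

preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("scale", StandardScaler(), num_cols),
])

# alpha controls the strength of the L2 penalty; 1.0 is just a starting point.
model = make_pipeline(preprocess, Ridge(alpha=1.0))
model.fit(X, y)
preds = model.predict(X)
```

    In practice the model would be scored on a held-out validation set rather than on the rows it was fitted on.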

    New features (feature engineering), outliers, Scaler, improving model performance

    1) If you could create new features, what feature engineering could you try? BMI (body mass index) = weight / height^2 (height: [m], weight: [kg]). Possibility of metabolic syndrome: refer to the difference between systolic and diastolic blood pressure. Convert age to years with age / 365. 2) If a feature has outliers, by what criterion could they be removed? This part seems closely tied to domain knowledge. First, check whether outliers exist using visualizations such as a boxplot. Then either remove the top and bottom % of values based on statistics, or bring in domain knowledge to set a criterion for outliers and remove them. In the assignment example, in my case, relying purely on statistics removed anyone whose weight was just over 100 kg, so I separately searched for domain knowledge or (very subjective..
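    The derived features and the two outlier criteria mentioned above can be sketched as follows. The rows are made up, and the column names (`age` in days, `height` in cm, `weight` in kg, `ap_hi` and `ap_lo` for systolic and diastolic pressure), the 5%/95% cut-offs, and the 30 to 200 kg bounds are all assumptions for illustration, not values from the original post.

```python
import pandas as pd

# Made-up rows; column names and units are assumptions (see the note above).
df = pd.DataFrame({
    "age": [18250, 20075, 14600, 23725, 16425],   # age in days
    "height": [170, 158, 182, 165, 175],          # cm
    "weight": [68.0, 105.0, 80.0, 59.0, 300.0],   # kg
    "ap_hi": [120, 140, 130, 110, 125],           # systolic pressure
    "ap_lo": [80, 90, 85, 70, 80],                # diastolic pressure
})

# New features: age in years, BMI (height converted to metres),
# and the systolic/diastolic difference.
df["age_years"] = df["age"] / 365
df["bmi"] = df["weight"] / (df["height"] / 100) ** 2
df["pulse_pressure"] = df["ap_hi"] - df["ap_lo"]

# Option A: purely statistical trimming of the top and bottom 5%.
low, high = df["weight"].quantile([0.05, 0.95])
by_quantile = df[df["weight"].between(low, high)]

# Option B: bounds chosen from domain knowledge (hypothetical limits).
by_domain = df[df["weight"].between(30, 200)]
```

    On larger data, the percentile rule can discard plausible values, which is why the bounds in Option B are set by hand.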

    [TIL]26.Logistic Regression (classification)

    Goals: understand train/validate/test data; understand the difference between classification and regression and use the model that fits the problem; understand logistic regression. Train/Validate/Test data: Kaggle 'Titanic: Machine Learning from Disaster' example import pandas as pd train = pd.read_csv('https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/titanic/train.csv') test = pd.read_csv('https://ds-lecture-data.s..
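    As a rough illustration of the train/validate idea with a classifier, the sketch below splits data, computes a majority-class baseline, and fits a logistic regression. It uses synthetic data instead of the Titanic files so that it runs offline; the feature layout, split ratio, and random seeds are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in for the Titanic set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 20% of the labelled data for validation, keeping class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

# Classification baseline: always predict the majority class of the train set.
majority = np.bincount(y_train).argmax()
baseline_acc = (y_val == majority).mean()

# Fit logistic regression on the training split only,
# then score it on the validation split.
clf = LogisticRegression()
clf.fit(X_train, y_train)
val_acc = clf.score(X_val, y_val)
```

    The validation accuracy is compared against the baseline; a separate test set would be touched only once, after model choices are final.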

    '21.12.21 (Tue)_Maeil Business Newspaper

    Market unease over China's surprise benchmark rate cut --- KOSPI falls below 3,000. China's central bank, the People's Bank of China, cut its benchmark rate on the 20th. The 1-year loan prime rate (LPR), which serves as the benchmark rate, was lowered by 0.05 percentage points (the first cut in 20 months). Art NFTs to grow 100-fold in 10 years... set to rival the real market. Art non-fungible tokens (NFTs) are expected to start growing next year and to expand 100-fold over the next 10 years. Messari, a global cryptocurrency analysis firm, made this forecast in its '2022 cryptocurrency industry outlook' report. As blockchain technology is applied to the internet, NFTs, DeFi, the metaverse, and others that aim for openness and decentralization are expected to grow rapidly. I don't know exactly how the NFT field will actually turn out, but among the various kinds of NFTs (artworks, music copyrights, etc.), the most marketable..