💿 Data/Miscellany

    Feature Engineering: Handling Missing Values and Applying the apply Function

    NA Value Handling: Replace the Q4 2019 net income (non-controlling interests) value with NaN, then fill that missing value using mean imputation.
    Feature Engineering: Compute a new feature called Relative Performance. Taking the average revenue over the most recent year as the baseline, it takes these values:
    10% or more -> S, 5% or more -> A, -5% to 5% -> B, -5% or below -> C, -10% or below -> D.
    The result for Q2 2020 should be A. Also state the revenue required to obtain each grade.
    url = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws...
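
The two tasks in this excerpt could be sketched roughly as follows. This is only an illustration, not the post's own solution: the dataset URL is truncated, so the frame below is made-up data, and the column names (`quarter`, `net_income`, `revenue`), the `grade` helper, and the reading of the baseline as "percentage deviation from the four-quarter mean revenue" (with the D band taken as -10% or below) are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly figures; the real data behind the truncated URL is not used here.
df = pd.DataFrame({
    "quarter": ["19Q3", "19Q4", "20Q1", "20Q2"],
    "net_income": [100.0, 120.0, 80.0, 100.0],
    "revenue": [1000.0, 1000.0, 1000.0, 1100.0],
})

# Step 1: blank out the 19Q4 value, then fill it with the mean of the remaining values.
df.loc[df["quarter"] == "19Q4", "net_income"] = np.nan
df["net_income"] = df["net_income"].fillna(df["net_income"].mean())


def grade(deviation):
    """Map a relative deviation from the baseline to a grade (assumed band edges)."""
    if deviation >= 0.10:
        return "S"
    if deviation >= 0.05:
        return "A"
    if deviation > -0.05:
        return "B"
    if deviation > -0.10:
        return "C"
    return "D"


# Step 2: deviation of each quarter's revenue from the one-year mean, graded via apply.
baseline = df["revenue"].mean()
df["relative_performance"] = df["revenue"].apply(lambda r: grade(r / baseline - 1))
grades = df["relative_performance"].tolist()
```

With these made-up numbers the last quarter sits about 7% above the baseline, so it lands in the A band, mirroring the requirement that Q2 2020 come out as A.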

    Seaborn 'penguins'

    import seaborn as sns
    pp = sns.load_dataset('penguins')
    We continue to use the penguins data. Carry out the tasks below on it.
    Handle missing values (remove them)
    Draw a Q-Q plot of bill_length_mm
    Show the other four numerical features as boxplots grouped by island
    For each numerical feature, compute summary statistics: mean, sd, quantiles (1Q, 2Q, 3Q, 4Q).
    1. Handling missing values (removal)
    pp.isna().sum()  # First, I checked the number of missing values.
    pp_clean = pp.dropna(axis=0)  # Drop the rows that contain missing values ..
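
The missing-value removal and the summary-statistics task might look like the sketch below. To keep it runnable without downloading anything, it uses a small hand-made frame that only borrows a few penguins column names. The values are invented, and the `Q1`..`Q4` labels are an assumption about how the quartiles should be named.

```python
import numpy as np
import pandas as pd

# Invented stand-in for the penguins data (not the real dataset).
pp = pd.DataFrame({
    "island": ["Biscoe", "Dream", "Dream", "Torgersen", "Biscoe"],
    "bill_length_mm": [39.0, 41.0, np.nan, 45.0, 43.0],
    "body_mass_g": [3500.0, 3700.0, 3600.0, np.nan, 4100.0],
})

missing_per_column = pp.isna().sum()   # count missing values per column
pp_clean = pp.dropna(axis=0)           # drop every row that has a missing value

# Summary statistics for the numerical features only.
numeric = pp_clean.select_dtypes(include="number")
summary = numeric.agg(["mean", "std"]).T
quartiles = numeric.quantile([0.25, 0.5, 0.75, 1.0]).T
quartiles.columns = ["Q1", "Q2", "Q3", "Q4"]
summary = summary.join(quartiles)
print(summary)
```

Note that `std` here is the sample standard deviation (pandas divides by n - 1 by default), and the fourth quartile is simply the maximum.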

    Data Handling Example 2

    # Import Packages
    import pandas as pd
    import numpy as np
    import seaborn as sns
    # dataset upload
    df = sns.load_dataset("titanic")
    df
    1. Working with the index and columns
    Q. Make the 'survived' column the index and check the result, then put the 'survived' column back and reset the index.
    df.set_index('survived', inplace=True)
    temp = df.index
    df.reset_index(drop=True, inplace=True)
    df['survived'] = temp
    Q. The column names of DataFrame df..
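
The same round trip can probably be written without the temporary variable, since `reset_index()` without `drop=True` moves the index back into a regular column. A minimal sketch on a hypothetical three-row frame (the values are invented, not taken from the titanic data):

```python
import pandas as pd

# Invented rows using a few titanic-style column names.
df = pd.DataFrame({
    "survived": [0, 1, 1],
    "pclass": [3, 1, 3],
    "age": [22.0, 38.0, 26.0],
})

indexed = df.set_index("survived")   # 'survived' becomes the index
restored = indexed.reset_index()     # index returns as the first column, new 0..n-1 index
print(restored)
```

One visible difference from the excerpt's approach: here `survived` comes back as the first column, whereas assigning `df['survived'] = temp` appends it as the last one.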

    Data Handling Example 1

    # Import Packages
    import pandas as pd
    import numpy as np
    import seaborn as sns
    # dataset upload
    df = sns.load_dataset("titanic")
    df
    1. Handling missing values
    Q. How many missing values does the 'deck' column have?
    df['deck'].isna().sum()  # count the missing values in a specific column
    Q. Replace every missing value with the preceding value in its column; if the first row is missing, replace it with the value that follows.
    df['deck'].fillna(method='ffill', inplace=True)  # first apply the preceding value to the whole column
    df['deck']...
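
In recent pandas releases `fillna(method=...)` is deprecated in favour of the dedicated `ffill()` and `bfill()` methods, and assigning the result back is more reliable than `inplace=True` on a selected column. A small sketch of "forward-fill, then back-fill whatever is still missing at the top", using an invented `deck` column rather than the titanic data:

```python
import numpy as np
import pandas as pd

# Invented column: the first row is missing, so forward-fill alone cannot fix it.
df = pd.DataFrame({"deck": [np.nan, "C", np.nan, "E", np.nan]})

n_missing = int(df["deck"].isna().sum())   # number of missing values before filling

# Forward-fill from the previous row, then back-fill the leading gap.
df["deck"] = df["deck"].ffill().bfill()
print(df["deck"].tolist())
```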

    Cramer's rule

    Referring to the content at the following link, use Cramer's rule to find the values of x1, x2, x3. https://youtu.be/6StS7VjtuGI
    x1 + 2x3 = 6
    -3x1 + 4x2 + 6x3 = 30
    -x1 - 2x2 + 3x3 = 8
    I roughly understood the video above and tried implementing it in code, but the values kept changing every time I increased the number of computations, so I applied a formula I found by googling instead.
    import numpy as np
    A = np.array([[1, 0, 2], [-3, 4, 6], [-1, -2, 3]])
    b = np.array([[6], [30], [8]])
    Take the determinant obtained by putting b into column 1 of A, then column 2, then column 3, and divide each by det(A); the results are the solution..
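
Since the excerpt is cut off before the computation, here is one way the rule could be coded with NumPy. It is a sketch and not necessarily the post's own implementation. The helper name `cramer_solve` is hypothetical, and `b` is kept one-dimensional for convenience. For this system det(A) = 44 and the column-replaced determinants are -40, 72 and 152.

```python
import numpy as np

A = np.array([[1, 0, 2], [-3, 4, 6], [-1, -2, 3]], dtype=float)
b = np.array([6, 30, 8], dtype=float)


def cramer_solve(A, b):
    """Solve A x = b with Cramer's rule: x_i = det(A_i) / det(A)."""
    det_A = np.linalg.det(A)
    if np.isclose(det_A, 0.0):
        raise ValueError("det(A) is zero; Cramer's rule does not apply")
    x = np.empty(len(b))
    for i in range(len(b)):
        A_i = A.copy()
        A_i[:, i] = b            # replace column i of A with b
        x[i] = np.linalg.det(A_i) / det_A
    return x


x = cramer_solve(A, b)
print(x)                          # approximately [-10/11, 18/11, 38/11]
```

The result can be cross-checked against `np.linalg.solve(A, b)`, which is the usual choice in practice because Cramer's rule becomes expensive as the system grows.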