Jayden`s

    ANOVA example, multiple samples

    1. ANOVA
    Data preprocessing is omitted here.
    df_tree.head()
    # melt to long format for outlier checks and visualization
    # columns: 은행나무 (ginkgo), 양버즘나무 (London plane), 느티나무 (zelkova)
    df_tree_mel = df_tree.reset_index().melt(id_vars='index', value_vars=['은행나무','양버즘나무','느티나무'])
    from scipy import stats
    stats.f_oneway(df_tree['은행나무'], df_tree['양버즘나무'], df_tree['느티나무'])
    F_onewayResult(statistic=17.006289557888046, pvalue=8.935183167883698e-07)
    Null hypothesis (H0): Seoul's ..
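To make the statistic returned by `stats.f_oneway` less of a black box, here is a minimal pure-Python sketch of the one-way ANOVA F computation. The three groups are made-up numbers, not the tree data from the post, and `one_way_f` is a hypothetical helper name; the p-value step is left out.

```python
from statistics import mean

def one_way_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Ratio of the two mean squares
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical measurements (illustration only)
a = [1, 2, 3]
b = [2, 3, 4]
c = [5, 6, 7]
f_stat = one_way_f(a, b, c)
print(f_stat)
```

With these numbers the between-group mean square is 13 and the within-group mean square is 1, so F comes out at about 13. A large F relative to the F distribution with (k - 1, n - k) degrees of freedom is what produces a small p-value like the one in the excerpt.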

    Python differentiation

    Finding the derivative of the sigmoid function and evaluating it
    from math import exp
    def sig(x):
        return 1 / (1 + exp(-x))  # define the original function
    from scipy.misc import derivative
    def sig_prime(x):
        return derivative(sig, x, dx=1e-5)
    sig_prime(3)  # evaluate at x=3
    0.04517665972980644
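`scipy.misc.derivative` was deprecated in SciPy 1.10 and removed in SciPy 1.12, so the snippet above may not run on a current install. A minimal sketch that avoids the dependency, assuming a plain central difference is an acceptable stand-in; `sig_prime_numeric` and `sig_prime_exact` are names chosen here for illustration.

```python
from math import exp

def sig(x):
    """Sigmoid function."""
    return 1 / (1 + exp(-x))

def sig_prime_numeric(x, dx=1e-5):
    """Central-difference approximation of the derivative of sig at x."""
    return (sig(x + dx) - sig(x - dx)) / (2 * dx)

def sig_prime_exact(x):
    """Closed-form derivative: sig(x) * (1 - sig(x))."""
    s = sig(x)
    return s * (1 - s)

print(sig_prime_numeric(3))
print(sig_prime_exact(3))
```

Both values should agree with the 0.0451766... result in the excerpt to many decimal places, which is a quick sanity check on the numerical approach.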

    Data cleaning and visualization example notes

    from google.colab import files
    uploaded = files.upload()
    import pandas as pd
    # The files were loaded by uploading them.
    file1 = pd.read_csv('n113_마리화나.txt', sep='\t')
    file2 = pd.read_csv('n113_해운.txt', sep='\t')
    # The data is in txt format, so the delimiter is specified explicitly.
    The data preprocessing code is omitted. After cleaning the data as above, I built a table of per-column means grouped by '테마' (theme).
    df1 = df.groupby('테마').mean()
    !sudo apt-get install -y fonts-nanu..
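The read-then-group step can be tried without the original files. The sketch below uses a made-up tab-separated string in place of the uploaded `.txt` data, and the column names `theme`, `price`, and `volume` are hypothetical. Passing `numeric_only=True` is a precaution: in pandas 2.0 and later, `groupby(...).mean()` raises a `TypeError` when non-numeric columns are still present.

```python
import io
import pandas as pd

# Hypothetical tab-separated text standing in for the uploaded .txt files
raw = "theme\tprice\tvolume\nA\t10\t100\nA\t20\t300\nB\t30\t500\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Mean of every numeric column within each theme
theme_means = df.groupby("theme").mean(numeric_only=True)
print(theme_means)
```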

    Feature Engineering: missing-value handling, using the apply function

    NA Value Handling
    Replace the 2019 Q4 net income (non-controlling interests) value with NA, then handle that missing value using mean imputation.
    Feature Engineering
    Compute a new feature called Relative Performance. Measured against the mean revenue over the most recent year, it takes these values:
    10% or more -> S
    5% or more -> A
    -5% to 5% -> B
    -5% or less -> C
    10% or less -> D
    The result for 2020 Q2 should be A. Also describe the revenue required to reach each grade.
    url = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws...
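A rough sketch of both tasks on a made-up table, since the real dataset URL is cut off in the excerpt. Everything here is an assumption for illustration: the column names, the figures, taking the baseline as the mean revenue of the four quarters shown, and reading the D cut-off as -10%.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly figures; the column names are illustrative only
df = pd.DataFrame({
    "quarter": ["19Q3", "19Q4", "20Q1", "20Q2"],
    "revenue": [93.0, 88.0, 113.0, 106.0],
    "net_income_nc": [4.0, 5.0, 6.0, 8.0],
})

# Step 1: blank out the 19Q4 value, then fill it with the column mean
df.loc[df["quarter"] == "19Q4", "net_income_nc"] = np.nan
df["net_income_nc"] = df["net_income_nc"].fillna(df["net_income_nc"].mean())

# Step 2: grade each quarter's revenue against the mean revenue
baseline = df["revenue"].mean()

def grade(revenue):
    """Map the percentage gap from the baseline to a letter grade."""
    pct = (revenue - baseline) / baseline * 100
    if pct >= 10:
        return "S"
    if pct >= 5:
        return "A"
    if pct > -5:
        return "B"
    if pct > -10:   # assumption: the D cut-off is -10%
        return "C"
    return "D"

df["relative_performance"] = df["revenue"].apply(grade)
print(df)
```

Writing the thresholds as one function and passing it to `apply` keeps the cut-offs in a single place, which also makes it easy to work backwards to the revenue each grade requires: for example, S needs at least `baseline * 1.10`.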

    Seaborn 'penguins'

    import seaborn as sns
    pp = sns.load_dataset('penguins')
    We continue to use the penguins data. Perform the following tasks on it.
    Handle missing values (remove them)
    Draw a qqplot for bill_length_mm
    For island, show the other 4 numerical features as boxplots
    For each numerical feature, compute summary statistics: mean, sd, quantiles (1Q, 2Q, 3Q, 4Q).
    1. Missing-value handling (removal)
    pp.isna().sum()  # first, check the number of missing values
    pp_clean = pp.dropna(axis=0)  # remove rows that contain missing values
    ..
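The missing-value and summary-statistics tasks can be sketched offline as follows. To avoid the download that `sns.load_dataset` performs, this uses a small made-up frame that borrows three penguins column names; the values are illustrative only, and the plotting tasks are not covered.

```python
import numpy as np
import pandas as pd

# Small made-up frame; values are illustrative, not the real penguins data
pp = pd.DataFrame({
    "island": ["Biscoe", "Biscoe", "Dream", "Dream", "Torgersen"],
    "bill_length_mm": [39.1, np.nan, 40.3, 36.7, 39.3],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3650.0],
})

print(pp.isna().sum())          # missing values per column
pp_clean = pp.dropna(axis=0)    # drop rows containing any missing value

num_cols = ["bill_length_mm", "body_mass_g"]

# Mean and (sample) standard deviation for each numeric column
summary = pp_clean[num_cols].agg(["mean", "std"])

# Quartiles: 1Q, 2Q (median), 3Q, and 4Q (maximum)
quartiles = pp_clean[num_cols].quantile([0.25, 0.5, 0.75, 1.0])

print(summary)
print(quartiles)
```

On the real dataset the same two calls would apply to all four numerical columns; `pp_clean.describe()` gives a similar table in one step.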