1. ANOVA
Data preprocessing is omitted here.
df_tree.head()
df_tree_mel = df_tree.reset_index().melt(id_vars='index', value_vars=['은행나무','양버즘나무','느티나무']) # melt to long format for outlier checks and visualization
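`melt` turns the wide table (one column per species) into a long table with one row per (district, species) pair, which is the shape most grouped box-plot functions expect. A minimal sketch on a tiny frame; the district labels and counts are made up, not the Seoul data:

```python
import pandas as pd

# Hypothetical wide table: one row per district, one column per tree species
df_demo = pd.DataFrame(
    {"은행나무": [120, 95], "양버즘나무": [40, 60], "느티나무": [70, 30]},
    index=["district_A", "district_B"],  # made-up district labels
)

# reset_index() moves the unnamed index into a column called 'index'
df_demo_mel = df_demo.reset_index().melt(
    id_vars="index", value_vars=["은행나무", "양버즘나무", "느티나무"]
)
print(df_demo_mel)
```

The result has 2 x 3 = 6 rows and the default columns `index`, `variable` (species name) and `value` (tree count).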
from scipy import stats
stats.f_oneway(df_tree['μνλ무'], df_tree['μλ²μ¦λ무'], df_tree['λν°λ무'])
F_onewayResult(statistic=17.006289557888046, pvalue=8.935183167883698e-07)
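`f_oneway` compares the spread between group means with the spread within groups: F = MSB / MSW, with k - 1 and N - k degrees of freedom. The sketch below computes F by hand in plain Python on three made-up groups (not the tree data) so the formula can be checked; the p-value would then come from the upper tail of the F distribution, e.g. `scipy.stats.f.sf(F, k - 1, N - k)`.

```python
def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group size * (group mean - grand mean)^2
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data: group means 2, 5, 8 with identical spread inside each group
print(one_way_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```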
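Null hypothesis (H0): the mean numbers of ginkgo (은행나무), American sycamore (양버즘나무), and zelkova (느티나무) trees per district in Seoul are all equal.
Alternative hypothesis (H1): they are not all equal.
Since the p-value (about 8.9e-07) is far below 0.05, the null hypothesis is rejected.
The mean numbers of ginkgo, American sycamore, and zelkova trees per district in Seoul are therefore not all equal.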
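The decision rule above can be written as a small helper. The p-value is the one printed by `f_oneway`, and `alpha = 0.05` is the significance level used in the text; the function name `reject_null` is illustrative.

```python
def reject_null(p_value, alpha=0.05):
    """Reject H0 when the p-value falls below the significance level."""
    return p_value < alpha

p_anova = 8.935183167883698e-07  # p-value from the f_oneway output
if reject_null(p_anova):
    print("Reject H0: the mean tree counts are not all equal")
else:
    print("Fail to reject H0")
```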
Next, a two-sample t-test was run for each pair of tree species.
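A sketch of the pairwise step: three species give three pairs, and each pair gets a two-sample t statistic. This version assumes equal variances (pooled variance, Student's t) and uses toy numbers instead of the tree data. With SciPy, `scipy.stats.ttest_ind(a, b)` returns the same statistic plus a p-value, and `equal_var=False` switches it to Welch's test.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical groups standing in for the three species columns
groups = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}

# One test per unordered pair: (A, B), (A, C), (B, C)
results = {(x, y): pooled_t(groups[x], groups[y]) for x, y in combinations(groups, 2)}
for pair, t in results.items():
    print(pair, round(t, 3))
```

Running three tests on the same data raises the chance of at least one false positive; a common safeguard is a Bonferroni correction, i.e. comparing each p-value against 0.05 / 3.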
2. Sampling
1) Simple Random Sampling
import random
random.sample(range(1,101), 20)
import numpy as np
pop = np.arange(1, 101)  # population: the integers 1 to 100
sample_size = 20
np.random.choice(pop, size=sample_size, replace=False)
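Both snippets above draw without replacement, so no element can appear twice. The same draw wrapped in a function using NumPy's newer `Generator` API; the name `simple_random_sample` and the optional `seed` argument are illustrative additions:

```python
import numpy as np

def simple_random_sample(pop, n, seed=None):
    """Draw n distinct elements from pop, every subset equally likely."""
    rng = np.random.default_rng(seed)
    return rng.choice(pop, size=n, replace=False)

pop = np.arange(1, 101)
sample = simple_random_sample(pop, 20, seed=0)
print(np.sort(sample))
```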
2) Systematic Sampling
import random
import numpy as np
# random start in 1..5, then every 5th value (interval k = 100 / 20)
np.array(range(random.randint(1, 5), 101, 5))
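The systematic rule generalizes to any population size N and sample size n: take the interval k = N // n, pick a random start among the first k positions, then take every k-th element. A sketch; the function name is hypothetical, and it assumes N is a multiple of n, as in the 100 / 20 example:

```python
import random

def systematic_sample(pop, n):
    """Every k-th element after a random start, with k = len(pop) // n."""
    k = len(pop) // n            # sampling interval (5 for 100 / 20)
    start = random.randrange(k)  # random 0-based offset in [0, k)
    return pop[start::k][:n]

pop = list(range(1, 101))
sample = systematic_sample(pop, 20)
print(sample)
```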
3) Stratified Random Sampling
sample = []
for i in range(10, 101, 10):
    # stratum of 10 consecutive values: pop[i-10:i] holds (i-9)..i
    sam = np.random.choice(pop[i-10:i], int(sample_size / 10), replace=False)
    sample.append(sam)
np.hstack(sample)
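The loop above splits 1 to 100 into ten strata of ten consecutive values and draws `sample_size / 10` from each. A generalized sketch with equal allocation per stratum; the function name and the use of `np.array_split` to form the strata are assumptions, not the original code:

```python
import numpy as np

def stratified_sample(pop, n_strata, per_stratum, seed=None):
    """Split pop into n_strata consecutive blocks and draw per_stratum from each."""
    rng = np.random.default_rng(seed)
    strata = np.array_split(pop, n_strata)  # consecutive, near-equal blocks
    draws = [rng.choice(s, size=per_stratum, replace=False) for s in strata]
    return np.hstack(draws)

pop = np.arange(1, 101)
sample = stratified_sample(pop, n_strata=10, per_stratum=2, seed=1)
print(sample)
```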
4) Cluster Sampling
condition = (pop % 5 == random.randint(0,4))
pop[condition]
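Here the population is grouped into five clusters by remainder mod 5, and one whole cluster (20 values) is kept. The same idea with several clusters chosen at once; the function name and the `n_clusters` parameter are illustrative assumptions:

```python
import random
import numpy as np

def cluster_sample(pop, labels, n_clusters):
    """Randomly pick n_clusters cluster labels and keep every member of them."""
    chosen = random.sample(sorted(set(labels.tolist())), n_clusters)
    return pop[np.isin(labels, chosen)], chosen

pop = np.arange(1, 101)
labels = pop % 5  # five clusters of 20 values each
sample, chosen = cluster_sample(pop, labels, n_clusters=2)
print(chosen, len(sample))
```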
sample_size = 20