ANOVA example and several sampling methods

1. ANOVA

Data preprocessing is omitted here.

df_tree.head()

df_tree_mel = df_tree.reset_index().melt(id_vars='index', value_vars=['은행나무','양버즘나무','느티나무']) # melt to long format for outlier checks and visualization

from scipy import stats

stats.f_oneway(df_tree['은행나무'], df_tree['양버즘나무'], df_tree['느티나무'])

F_onewayResult(statistic=17.006289557888046, pvalue=8.935183167883698e-07)

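Null hypothesis (H0): the mean per-district counts of ginkgo (은행나무), American sycamore (양버즘나무), and zelkova (느티나무) trees in Seoul are all equal.
Alternative hypothesis (H1): they are not all equal.

The p-value (about 8.9e-07) is far below 0.05, so the null hypothesis is rejected.
The mean per-district counts of ginkgo, American sycamore, and zelkova trees in Seoul are not all equal.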
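
To make it clearer what stats.f_oneway is computing, here is a minimal sketch that rebuilds the F statistic by hand from the between-group and within-group sums of squares and compares it with SciPy. The helper name one_way_f and the three arrays are made up for illustration; they are not the Seoul tree data used above.

```python
import numpy as np
from scipy import stats

def one_way_f(*groups):
    """Return the one-way ANOVA F statistic and p-value computed by hand."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_value = stats.f.sf(f_stat, k - 1, n - k)   # upper tail of F(k-1, n-k)
    return f_stat, p_value

# Made-up data (NOT the tree counts): three groups of 25 values each
rng = np.random.default_rng(0)
a = rng.normal(100, 20, 25)
b = rng.normal(120, 20, 25)
c = rng.normal(90, 20, 25)

f_manual, p_manual = one_way_f(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f_manual, f_scipy)   # the two F values should agree
```

A large F means the group means differ by more than the spread inside each group would explain, which is why a tiny p-value leads to rejecting H0.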

After that, I ran two-sample t-tests between the tree species.
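
The follow-up tests are not shown in this post, so the following is only a rough sketch of how pairwise two-sample t-tests could look. It assumes Welch's version (equal_var=False) and a Bonferroni-adjusted threshold, neither of which is confirmed by the original analysis. The function name pairwise_ttests, the group names, and the data are all hypothetical.

```python
from itertools import combinations
import numpy as np
from scipy import stats

def pairwise_ttests(samples, alpha=0.05):
    """Welch two-sample t-test on every pair of groups, with a Bonferroni threshold."""
    pairs = list(combinations(samples, 2))   # all unordered pairs of group names
    alpha_adj = alpha / len(pairs)           # Bonferroni-adjusted significance level
    results = {}
    for name_a, name_b in pairs:
        t, p = stats.ttest_ind(samples[name_a], samples[name_b], equal_var=False)
        results[(name_a, name_b)] = (t, p, p < alpha_adj)
    return results

# Made-up data standing in for three columns of a table like df_tree
rng = np.random.default_rng(1)
samples = {
    "group_a": rng.normal(100, 20, 25),
    "group_b": rng.normal(120, 20, 25),
    "group_c": rng.normal(90, 20, 25),
}

results = pairwise_ttests(samples)
for pair, (t, p, reject) in results.items():
    print(pair, round(t, 3), p, reject)
```

With three groups there are three pairs, so each pair is tested against 0.05 / 3 rather than 0.05.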

2. Sampling

1) Simple Random Sampling

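import random
random.sample(range(1,101), 20) # 20 distinct integers from 1..100, without replacement

import numpy as np

pop = np.arange(1,101) # population: integers 1..100
sample_size = 20 # number of units to draw

np.random.choice(pop, size=sample_size, replace=False)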
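
If the draw needs to be reproducible, a seeded NumPy Generator is one option. This is a small sketch; the seed 42 and the name srs are arbitrary choices, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(42)      # arbitrary seed, only for reproducibility
pop = np.arange(1, 101)              # population: integers 1..100
sample_size = 20

# Simple random sample: every unit has the same chance, no unit is drawn twice
srs = rng.choice(pop, size=sample_size, replace=False)
print(np.sort(srs))
```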

2) Systematic Sampling

import random
# random start in 1..5, then every 5th unit up to 100 (20 units in total)
np.array(range(random.randint(1,5),101,5))
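
The same idea can be wrapped in a small function so that the interval is derived from the population size and the sample size instead of being hard-coded as 5. This is a sketch under the assumption that the sample size is no larger than the population; the name systematic_sample is made up.

```python
import numpy as np

def systematic_sample(population, sample_size, rng=None):
    """Take every k-th unit after a random start, with k = N // n."""
    rng = np.random.default_rng() if rng is None else rng
    population = np.asarray(population)
    step = len(population) // sample_size    # sampling interval k
    start = rng.integers(0, step)            # random start index in [0, step)
    return population[start::step][:sample_size]

pop = np.arange(1, 101)
sys_sample = systematic_sample(pop, 20, rng=np.random.default_rng(0))
print(sys_sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined.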

3) Stratified Random Sampling

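sample = []

# strata are the ten consecutive blocks 1-10, 11-20, ..., 91-100
for i in range(10,101,10):
  # pop is 0-indexed, so pop[i-10:i] holds the values i-9 through i
  sam = np.random.choice(pop[i-10:i], sample_size//10, replace=False)
  sample.append(sam)

np.hstack(sample)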
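
A slightly more general sketch of the same procedure, splitting the population into equal contiguous strata and drawing the same number of units from each, could look like this. The function name stratified_sample and its parameters are hypothetical, and equal-sized contiguous strata are an assumption carried over from the example above.

```python
import numpy as np

def stratified_sample(population, n_strata, per_stratum, rng=None):
    """Split the population into contiguous strata and sample each one separately."""
    rng = np.random.default_rng() if rng is None else rng
    strata = np.array_split(np.asarray(population), n_strata)
    picks = [rng.choice(s, size=per_stratum, replace=False) for s in strata]
    return np.concatenate(picks)

pop = np.arange(1, 101)
strat = stratified_sample(pop, n_strata=10, per_stratum=2,
                          rng=np.random.default_rng(0))

# Check: how many sampled units fall in each block 1-10, 11-20, ..., 91-100
counts = np.bincount((strat - 1) // 10, minlength=10)
print(strat)
print(counts)
```

Every stratum contributes exactly the same number of units, which simple random sampling does not guarantee.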

4) Cluster Sampling

# clusters are the five residue classes mod 5; pick one at random and keep all of its members
condition = (pop % 5 == random.randint(0,4))
pop[condition]
sample_size = 20
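
For cluster labels other than a remainder, the selection step can be written against an explicit array of cluster ids. This is a sketch only; cluster_sample and its arguments are made-up names, and the mod-5 labels simply reuse the rule from the example above.

```python
import numpy as np

def cluster_sample(population, cluster_ids, n_clusters, rng=None):
    """Pick n_clusters whole clusters at random and return all of their members."""
    rng = np.random.default_rng() if rng is None else rng
    population = np.asarray(population)
    cluster_ids = np.asarray(cluster_ids)
    chosen = rng.choice(np.unique(cluster_ids), size=n_clusters, replace=False)
    return population[np.isin(cluster_ids, chosen)], chosen

pop = np.arange(1, 101)
ids = pop % 5                      # cluster label: remainder mod 5
members, chosen = cluster_sample(pop, ids, n_clusters=1,
                                 rng=np.random.default_rng(0))
print(chosen, members)
```

Unlike stratified sampling, the randomness is in which clusters are chosen; every unit inside a chosen cluster is kept.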
