Jayden1116
Jayden`s LifeTrip 🔆

Visualizing Clustering with a Dendrogram, and the Elbow Method

2021. 12. 6. 21:43

1. Standardize first! (I standardized every variable so they all share the same scale: mean 0, standard deviation 1.)

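from sklearn.preprocessing import StandardScaler

# df: the numeric feature DataFrame (its loading isn't shown in this post)
scaler = StandardScaler()
Z = scaler.fit_transform(df)  # each column now has mean 0 and standard deviation 1
Z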
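For reference, here is what `StandardScaler` computes for a single column, as a minimal pure-Python sketch. `standardize_column` is a hypothetical helper for illustration, not part of scikit-learn. StandardScaler divides by the population standard deviation (ddof=0), which is what `statistics.pstdev` returns.

```python
import statistics


def standardize_column(values):
    """Return z-scores the way StandardScaler does: (x - mean) / population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (ddof=0)
    if std == 0:
        # a constant column: StandardScaler centers it and leaves it at 0
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


print(standardize_column([2, 4, 4, 4, 5, 5, 7, 9]))
```

Here the mean is 5 and the population standard deviation is 2, so the column becomes [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]. Without this step, a variable measured in large units would dominate every Euclidean distance used below.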

2-1. Hierarchical Clustering and visualization with a Dendrogram

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Ward linkage matrix built from the standardized data Z.
# Stored under a new name so Z keeps holding the scaled data (PCA in 2-2 reuses it).
linked = linkage(Z, method='ward', metric='euclidean')

dendrogram(linked, p=2, truncate_mode='lastp')  # truncate the tree to 2 leaf clusters
plt.show()

(Figure: dendrogram truncated to 2 clusters)

dendrogram(linked, p=5, truncate_mode='lastp')  # 5 leaf clusters
plt.show()

(Figure: dendrogram truncated to 5 clusters)

dendrogram(linked, p=569, truncate_mode='lastp')  # extreme case: as many clusters as data points (569)
plt.show()

(Figure: full dendrogram with 569 leaves)
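With `method='ward'`, each step merges the pair of clusters whose union increases the total within-cluster sum of squares (SSE) the least, and the height SciPy draws on the dendrogram's y-axis grows with that increase. The sketch below is a naive pure-Python version (roughly cubic time, for illustration only); `ward_merge_costs` is a hypothetical helper, not SciPy's implementation.

```python
def ward_merge_costs(points):
    """Naive Ward agglomerative clustering.

    Starts with every point as its own cluster and repeatedly merges the pair
    whose union raises the total within-cluster SSE the least.
    Returns the SSE increase of each merge, in merge order.
    """
    # each cluster is (size, centroid tuple)
    clusters = [(1, tuple(map(float, p))) for p in points]
    costs = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
                delta_sse = na * nb / (na + nb) * sq_dist  # Ward's merge cost
                if best is None or delta_sse < best[0]:
                    best = (delta_sse, i, j)
        cost, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        merged = (na + nb, tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)))
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
        costs.append(cost)
    return costs


print(ward_merge_costs([(0, 0), (0, 1), (10, 0), (10, 1)]))
```

For these four toy points the costs are 0.5, 0.5 and 100.0: two cheap merges inside each tight pair, then one expensive merge joining the pairs. That jump is the tall vertical stretch a dendrogram shows right before its final merge, and the costs add up to the total SSE around the overall mean (101.0).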

Roughly speaking, the y-values (the merge distances) show the largest gap when the tree is cut into 2 clusters, so 2 clusters looks like a reasonable guess.
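The "largest gap in y" reading can be made explicit. Given the merge heights in merge order (for example the third column of the linkage matrix returned by `linkage`), cutting the tree inside the widest gap between two consecutive merges gives a cluster count. `k_from_largest_gap` is a hypothetical helper, and the heights in the example are made up for illustration, not taken from this post's data.

```python
def k_from_largest_gap(heights):
    """Suggest a cluster count from dendrogram merge heights.

    heights: merge distances in merge order (non-decreasing, as with Ward).
    With n points there are n - 1 merges. Cutting inside the gap between
    heights[i] and heights[i + 1] performs merges 0..i, which leaves
    len(heights) - i clusters.
    """
    if len(heights) < 2:
        return 1  # no gap to compare
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return len(heights) - i


print(k_from_largest_gap([0.5, 0.6, 0.7, 1.0, 5.0]))  # 2
print(k_from_largest_gap([1.0, 1.2, 6.0, 6.5, 7.0]))  # 4
```

Like reading the plot by eye, this is only a heuristic: a second gap of almost the same size is a sign that the choice of k is ambiguous.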


From the dendrogram I guessed that about 2 clusters would work well, so next I apply K-means and use the elbow method to check the appropriate number of clusters. (To make the results easy to visualize, I first reduce the data to 2 dimensions with PCA(2).)

์œ„์˜ 1๋ฒˆ ์ดํ›„ ์ƒˆ๋กœ์šด 2-2. PCA๋ฅผ ํ†ตํ•ด ๋ณ€์ˆ˜๋ฅผ 2๊ฐœ๋กœ ์ค„์ด๊ณ  ์‹œ๊ฐํ™”

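from matplotlib import pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(2)  # keep the first 2 principal components
pca.fit(Z)  # Z: the standardized data from step 1
B = pca.transform(Z)  # shape (n_samples, 2)

pc1 = B.T[0]
pc2 = B.T[1]  # transpose B so each row is one component, then take PC1 and PC2

plt.scatter(pc1, pc2)
plt.xlabel('pc1')
plt.ylabel('pc2')
plt.show()

(Figure: scatter plot of the data projected onto pc1 and pc2)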
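What PCA(2) does: center the data, find the two directions of largest variance (the top eigenvectors of the covariance matrix), and project onto them. scikit-learn's PCA computes this with an SVD. The sketch below swaps in power iteration with deflation, in pure Python, so the mechanics are visible. `pca_power_iteration` is a hypothetical helper. PCA only centers and does not rescale, which is why step 1 standardizes first.

```python
import math
import random


def pca_power_iteration(rows, n_components=2, iters=500, seed=0):
    """Project rows onto their top principal components (illustrative sketch)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    X = [[r[c] - means[c] for c in range(d)] for r in rows]  # center each column
    # sample covariance matrix (divides by n - 1)
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:  # no variance left in any direction
                break
            v = [t / norm for t in w]
        # Rayleigh quotient v^T C v: the variance along v (its eigenvalue)
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # deflation: remove this direction so the next pass finds the next one
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    # scores: centered data projected onto each component (like pca.transform)
    scores = [[sum(x[c] * comp[c] for c in range(d)) for comp in components] for x in X]
    return components, scores


comps, scores = pca_power_iteration([(-2, 0), (2, 0), (0, -1), (0, 1)])
print(comps)
print(scores)
```

On these toy points the first component lies along the x-axis (variance 8/3) and the second along the y-axis (variance 2/3). As with scikit-learn, the sign of each component is arbitrary.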

3. Visualizing the elbow method with KMeans

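import pandas as pd
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans

B_df1 = pd.DataFrame(B, columns=['pc1', 'pc2'])  # turn B from 2-2 into a DataFrame

sum_of_squared_distances = []
K = range(1, 15)
for k in K:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)  # fixed seed for reproducible runs
    km.fit(B_df1)
    sum_of_squared_distances.append(km.inertia_)  # within-cluster sum of squared distances

plt.plot(K, sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()

(Figure: elbow plot of the sum of squared distances for k = 1 to 14)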
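`km.inertia_` is the within-cluster sum of squared distances: each point's squared distance to its nearest centroid, summed over all points. Below is a minimal pure-Python sketch of Lloyd's algorithm that returns this quantity. It assumes a deterministic farthest-first initialization, swapped in for scikit-learn's default k-means++ so the output is reproducible; `kmeans_inertia` is a hypothetical helper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's k-means and return the final inertia (within-cluster SSE)."""
    pts = [tuple(map(float, p)) for p in points]
    # farthest-first init: start from the first point, then repeatedly add
    # the point farthest from all centroids chosen so far
    centroids = [pts[0]]
    while len(centroids) < k:
        centroids.append(max(pts, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in pts]
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in pts)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print([kmeans_inertia(pts, k) for k in (1, 2, 4)])  # [101.0, 1.0, 0.0]
```

The drop from 101.0 at k=1 to 1.0 at k=2 is the kind of sharp bend the elbow plot looks for. Inertia always reaches 0 once k equals the number of points, so the smallest value is never the answer.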

It's a bit ambiguous, but the sum of squared distances drops sharply going from k=1 to k=2, which points to k=2 as the elbow, consistent with the guess from the dendrogram.
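Because the bend can be ambiguous by eye, one common heuristic (my choice here, not something scikit-learn provides) is to rescale both axes to [0, 1] and pick the k whose point lies farthest from the straight line joining the first and last points of the curve. `elbow_by_chord_distance` is a hypothetical helper, and the inertia values in the example are made up for illustration, not the results plotted above.

```python
import math


def elbow_by_chord_distance(ks, values):
    """Return the k farthest from the chord joining the curve's endpoints.

    Assumes ks is sorted ascending and values is the inertia for each k.
    Both axes are rescaled to [0, 1] so the result doesn't depend on units.
    """
    x0, x1 = min(ks), max(ks)
    y0, y1 = min(values), max(values)
    if x1 == x0 or y1 == y0:
        return ks[0]  # a single k or a flat curve has no elbow
    xs = [(k - x0) / (x1 - x0) for k in ks]
    ys = [(v - y0) / (y1 - y0) for v in values]
    ax, ay, bx, by = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(bx - ax, by - ay)
    best_k, best_d = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        # perpendicular distance from (x, y) to the line through A and B
        d = abs((bx - ax) * (ay - y) - (ax - x) * (by - ay)) / length
        if d > best_d:
            best_k, best_d = k, d
    return best_k


print(elbow_by_chord_distance([1, 2, 3, 4], [101.0, 1.0, 0.5, 0.0]))  # 2
```

In practice the list passed in would be `list(K)` and `sum_of_squared_distances` from the loop above. The heuristic only formalizes "where the curve bends most"; when the curve has no clear bend, its answer is as uncertain as the eye's.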
