💿 Data/Miscellaneous

    Vector Dot Product and Projection

    For given data (x, y), write a function that computes its projection onto the vector y = x. Assume (x, y) is the vector from (0, 0) to (x, y). Then plot the input data as a blue line, the vector y = x as a red line, and finally the projected vector as a green dashed line. I set an arbitrary vector on y = x ([10, 10]) and computed the dot product and projection, as sketched below. import numpy as np v = [7, 4] a = [10, 10] # arbitrary vector chosen on y = x # u is v projected onto y = x def myProjection(v, a): v = np.array(v) a = np..
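
    A minimal sketch of how the cut-off function might be completed, assuming the names myProjection, v, and a from the excerpt; it uses the standard formula proj_a(v) = (v·a / a·a)·a and the plot colors requested above.

        import numpy as np
        import matplotlib.pyplot as plt

        def myProjection(v, a):
            # Project v onto a: proj_a(v) = (v . a / a . a) * a
            v = np.array(v)
            a = np.array(a)
            return (np.dot(v, a) / np.dot(a, a)) * a

        v = [7, 4]    # input data, treated as the vector from (0, 0) to (7, 4)
        a = [10, 10]  # arbitrary vector lying on the line y = x
        u = myProjection(v, a)  # u is v projected onto y = x

        plt.plot([0, v[0]], [0, v[1]], 'b-', label='data')        # blue line: input data
        plt.plot([0, a[0]], [0, a[1]], 'r-', label='y = x')       # red line: y = x
        plt.plot([0, u[0]], [0, u[1]], 'g--', label='projection') # green dashed line: projection
        plt.legend()
        plt.show()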

    How to Use a Scree Plot

    "Scree Plot" ์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด๊ณ , ์œ„์—์„œ PCA๋กœ ๋งŒ๋“  ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•˜์—ฌ ๋งŒ๋“ค์–ด๋ณด์„ธ์š”. 90%์˜ ๋‚ด์šฉ์„ ์„ค๋ช…ํ•˜๊ธฐ ์œ„ํ•ด์„œ, ๋ช‡๊ฐœ์˜ PC๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•˜๋‚˜์š”? ์œ„์˜ ์—ฌ๋Ÿฌ ๊ณผ์ •์€ ์ƒ๋žตํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. :) ๋จผ์ € ๊ฐ ์ฃผ์„ฑ๋ถ„์— ๋Œ€ํ•œ ์•„์ด๊ฒ๋ฒจ๋ฅ˜๊ฐ’์„ ๋ชจ๋‘ ๋”ํ•˜๊ณ  ๋‚˜๋ˆ , ๊ฐ๊ฐ์˜ proportion์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. values = values / np.sum(values) # ์œ„ ์˜ ๊ฐ’์„ ์‹œ๊ฐํ™” plt.title('Scree plot') plt.xlabel('numberofcomp') plt.ylabel('proposion') plt.plot(values); ๊ฐ๊ฐ์˜ ๊ณ ์œ ๊ฐ’์˜ ๋น„์ค‘์„ ๊ณ„์‚ฐํ•ด๋ด…๋‹ˆ๋‹ค. print(values[:2].sum()) print(values[:3].sum..

    Visualizing Clustering with a Dendrogram, and the Elbow Method

    1. Start with standardization! (Each variable was standardized so they are all on the same scale.) from sklearn.preprocessing import StandardScaler scaler = StandardScaler() Z = scaler.fit_transform(df) Z 2-1. Hierarchical Clustering and visualization with a Dendrogram import numpy as np from matplotlib import pyplot as plt from scipy.cluster.hierarchy import linkage, dendrogram from sklearn.cluster import AgglomerativeClustering Z = linkage(Z, method='ward'..
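
    A runnable sketch of the two steps in the excerpt plus the elbow method from the title, assuming a stand-in random DataFrame in place of the post's df; the linkage result gets its own name here so it does not overwrite the scaled data Z.

        import numpy as np
        import pandas as pd
        from matplotlib import pyplot as plt
        from sklearn.preprocessing import StandardScaler
        from scipy.cluster.hierarchy import linkage, dendrogram
        from sklearn.cluster import KMeans

        df = pd.DataFrame(np.random.rand(30, 4), columns=list('abcd'))  # stand-in for the post's df

        # 1. Standardize so every variable is on the same scale.
        scaler = StandardScaler()
        Z = scaler.fit_transform(df)

        # 2-1. Hierarchical clustering with Ward linkage, visualized as a dendrogram.
        linked = linkage(Z, method='ward')
        dendrogram(linked)
        plt.title('Dendrogram')
        plt.show()

        # Elbow method: plot the within-cluster sum of squares (inertia) for k = 1..9
        # and look for the bend ("elbow") in the curve.
        inertias = [KMeans(n_clusters=k, n_init=10).fit(Z).inertia_ for k in range(1, 10)]
        plt.plot(range(1, 10), inertias, marker='o')
        plt.xlabel('number of clusters k')
        plt.ylabel('inertia')
        plt.show()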

    Clustering

    What is the difference between the three kinds of Machine Learning: Supervised Learning, Unsupervised Learning, and Reinforcement Learning? (With examples!) First, Machine Learning is a subset of artificial intelligence in which computers are trained to learn from data and improve through experience. Machine learning algorithms focus on making optimal decisions and predictions based on the analysis of patterns, correlations, and so on in large-scale data. Supervised Learning: a method of training a model on data that comes with the answers. Each input is given together with its Label, and classification and regression belong to this category. Example) dog photos..
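
    A tiny illustration of the first two categories on made-up data (reinforcement learning is left out because it needs an interacting environment rather than a fixed dataset): the supervised model sees both the inputs X and the labels y, while the unsupervised one sees only X.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.cluster import KMeans

        X = np.random.rand(100, 2)               # inputs
        y = (X[:, 0] + X[:, 1] > 1).astype(int)  # labels, i.e. the "answers"

        # Supervised learning: trained on inputs together with their labels (classification).
        clf = LogisticRegression().fit(X, y)
        print(clf.predict(X[:5]))

        # Unsupervised learning: only the inputs are given; the model finds groups on its own.
        km = KMeans(n_clusters=2, n_init=10).fit(X)
        print(km.labels_[:5])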

    Dimension Reduction

    What does it mean for the 'dimension to grow'? The dimension increases -> the number of variables increases -> the number of columns in the matrix (or dataframe) increases. Why do dimensionality reduction: to drop variables that are inefficient (where the output is poor relative to the input, i.e. variables with low explanatory power). What problems arise when the dimension grows? Put simply, it becomes harder for humans to understand (or, in a related sense, to visualize); that is, the data is no longer intuitively understandable. Also, if there were only dimensions (variables, columns) with high explanatory power it might not be a big problem, but dimensions with low explanatory power just waste computation and memory..
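
    A short sketch of the "fewer columns" point using PCA on a made-up 10-column dataset; n_components=0.9 asks scikit-learn to keep just enough components to explain 90% of the variance.

        import numpy as np
        from sklearn.decomposition import PCA

        X = np.random.rand(100, 10)   # 100 rows, 10 columns: a 10-dimensional dataset

        # Keep only as many principal components as needed to explain 90% of the variance.
        pca = PCA(n_components=0.9)
        X_reduced = pca.fit_transform(X)

        print(X.shape)                               # (100, 10)
        print(X_reduced.shape)                       # (100, k) with k <= 10 columns
        print(pca.explained_variance_ratio_.sum())   # at least 0.9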