Jayden1116
Jayden's LifeTrip 🔆

💿 Data/Bootcamp

[TIL]13.High Dimensional Data

2021. 12. 5. 14:34

Goals

  • Understand vector transformations
  • Understand eigenvectors and eigenvalues
  • Understand the problems that arise when a dataset's number of features (dimensions) grows, and methods for handling them
  • Understand the basic principle and purpose of PCA

Vector transformation

  • A linear transformation of vectors in R^2 is a map that respects adding two arbitrary vectors and multiplying by a scalar:
    $$T(u+v)=T(u)+T(v)$$ $$T(cu)=cT(u)$$
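As a quick numerical sketch (the matrix and vectors here are arbitrary choices for illustration), any matrix map satisfies both properties:

```python
import numpy as np

# Arbitrary matrix and vectors, chosen only to illustrate linearity.
A = np.array([[2, 1], [1, -3]])
u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])
c = 4.0

# T(u + v) == T(u) + T(v)
print(np.allclose(A @ (u + v), A @ u + A @ v))  # True
# T(c * u) == c * T(u)
print(np.allclose(A @ (c * u), c * (A @ u)))    # True
```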

Matrix-vector multiplication as a vector transformation

  • f๋ผ๋Š” transformation์„ ์‚ฌ์šฉํ•˜์—ฌ ์ž„์˜์˜ ๋ฒกํ„ฐ [x1, x2]์— ๋Œ€ํ•ด [2x1 + x2, x1 - 3x2]๋กœ ๋ณ€ํ™˜์„ ํ•œ๋‹ค.

\begin{align}
f(\begin{bmatrix}x_1 \\ x_2 \end{bmatrix}) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 - 3x_2 \end{bmatrix}
\end{align}

$$\begin{align} T = \begin{bmatrix} 2 & 1 \\ 1 & -3 \end{bmatrix} \end{align}$$

์œ„์™€ ๊ฐ™์€ ๋งคํŠธ๋ฆญ์Šค T๋ฅผ ๊ณฑํ•˜๋Š” ๊ฒƒ๊ณผ ๊ฐ™์€ ์˜๋ฏธ์ด๋‹ค.

In other words, transforming an arbitrary R^2 vector into another vector is the same as multiplying it by some particular matrix T.
For example,
\begin{align}
\begin{bmatrix} 2 & 1 \\ 1 & -3 \end{bmatrix}\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 10 \\ -9 \end{bmatrix}
\end{align}
ํ•œ๋ฒˆ ๋” ์ƒ๊ฐํ•ด๋ณด๋ฉด, ์œ„ ์‹์—์„œ ๋งคํŠธ๋ฆญ์Šค T๋Š” [2 1], [1 -3]์˜ ๋‘ ๋ฒกํ„ฐ๋กœ ์ด๋ฃจ์–ด์ง„ ํ–‰๋ ฌ์ด๊ณ , [3 4]๋ผ๋Š” ๋ฒกํ„ฐ์˜ ๊ธฐ์ €๋ฒกํ„ฐ([1 0], [0 1])๋ฅผ ๋ฐ”๊พธ๋Š” ํ–‰์œ„๋กœ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.
๋ฒกํ„ฐ transformation์€ ์„ ํ˜•(๊ณฑํ•˜๊ณ  ๋”ํ•˜๋Š” ๊ฒƒ์œผ๋กœ๋งŒ ์ด๋ฃจ์–ด์ง„) ๋ณ€ํ™˜์ด๊ธฐ ๋•Œ๋ฌธ์— ๋งคํŠธ๋ฆญ์Šค์™€ ๋ฒกํ„ฐ์˜ ๊ณฑ์œผ๋กœ ํ‘œํ˜„์ด ๋œ๋‹ค.
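The worked example above can be checked in a couple of lines of NumPy:

```python
import numpy as np

# The transformation f([x1, x2]) = [2*x1 + x2, x1 - 3*x2] as a matrix T
T = np.array([[2, 1],
              [1, -3]])
v = np.array([3, 4])

print(T @ v)  # [10 -9], matching the worked example
```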

Eigenvector

  • ์œ„์—์„œ ๋ดค๋‹ค์‹œํ”ผ Transformation์€ matrix๋ฅผ ๊ณฑํ•จ์œผ๋กœ์จ ๋ฒกํ„ฐ(๋ฐ์ดํ„ฐ)๋ฅผ ๋‹ค๋ฅธ ์œ„์น˜๋กœ ์˜ฎ๊ธฐ๋Š” ๊ฐœ๋…์ด๋‹ค.
  • R^3 ๊ณต๊ฐ„์—์„œ ์˜ˆ์‹œ๋ฅผ ๋“ค์–ด๋ณด์ž๋ฉด

    R^3 ๊ณต๊ฐ„์ด ํšŒ์ „ํ•  ๋•Œ(์ž์ „ํ•  ๋•Œ), ์œ„๋„์— ๋”ฐ๋ผ ์œ„์น˜์˜ ๋ณ€ํ™” ์ •๋„๊ฐ€ ๋‹ค๋ฅด๋‹ค.
  • ํšŒ์ „์ถ•์— ์žˆ๋Š” ๊ฒฝ์šฐ transformation์„ ํ†ตํ•œ ์œ„์น˜๊ฐ€ ๋ณ€ํ•˜์ง€ ์•Š๋Š”๋‹ค.
  • ์ด๋ ‡๊ฒŒ transformation์— ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๋Š” ํšŒ์ „์ถ•(ํ˜น์€ ๋ฒกํ„ฐ)๋ฅผ ๊ทธ ๊ณต๊ฐ„์˜ ๊ณ ์œ ๋ฒกํ„ฐ(Eigenvector๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค.

Eigenvalue

์œ„์—์„œ ๊ณ ์œ ๋ฒกํ„ฐ๋Š” transformation ์‹œ, ๋ฐฉํ–ฅ์€ ๋ณ€ํ•˜์ง€ ์•Š๊ณ  ํฌ๊ธฐ๋งŒ ๋ฐ”๋€Œ๋Š” ๋ฒกํ„ฐ์˜€๋Š”๋ฐ, ์ด ๋•Œ ๋ณ€ํ•˜๋Š” ํฌ๊ธฐ์˜ ์ •๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฐ’์ด Eigenvalue๋กœ ๊ณ ์œ ๊ฐ’์ด๋ผ ๋ถ€๋ฅธ๋‹ค.(์–ผ๋งˆ๋‚˜ ๋ณ€ํ–ˆ๋А๋ƒ)

$$T \cdot v = v' = \lambda \cdot v $$
\begin{align} \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax+by \\ cx+dy \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \end{bmatrix} \end{align}

For example,

\begin{align} \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 3 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \\ -6 \end{bmatrix} = 2 \begin{bmatrix} 3 \\ -3 \end{bmatrix} \end{align}

Multiplying by the matrix has the same effect as multiplying by the constant 2. In other words, under this linear transformation the vector [3 -3] keeps its direction and only its magnitude changes.
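This can be confirmed with np.linalg.eig, which returns both the eigenvalues and eigenvectors of a matrix:

```python
import numpy as np

A = np.array([[4, 2], [2, 4]])
values, vectors = np.linalg.eig(A)
print(sorted(values))  # eigenvalues 2.0 and 6.0

# [3, -3] is an eigenvector for lambda = 2: A @ v equals 2 * v
v = np.array([3, -3])
print(A @ v)  # [ 6 -6]
```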

Computing eigenvalues

$$T \cdot v = \lambda \cdot v $$
Starting from this, move everything to the left-hand side and require det(T - λI) = 0 (the condition that T - λI has no inverse), then solve for λ.
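For the example matrix used above, the characteristic equation works out as:

$$\det(T - \lambda I) = \det\begin{bmatrix} 4-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix} = (4-\lambda)^2 - 4 = (\lambda - 2)(\lambda - 6) = 0$$

giving λ = 2 or λ = 6, consistent with the eigenvalue 2 found in the example.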

๊ณ ์œ ๊ฐ’์„ ๋ฐฐ์šฐ๋Š” ์ด์œ 

  • Vector transformation is ultimately one step toward the larger goal of transforming data.

์ฐจ์›์˜ ์ €์ฃผ(๊ณ ์ฐจ์›์˜ ๋ฌธ์ œ ; The Curse of Dimensionality)

  • The various problems that arise when modeling or analyzing a dataset grow with the number of features (dimensions).
  • As dimensionality increases, the data also becomes harder for people to understand intuitively (and to visualize).
  • Not every feature used to find insights in a dataset is equally important.
  • If restricting the data to a subset makes little difference to the meaning we extract (the insight we are after), too many inputs can actually get in the way.
  • Another problem is overfitting: a model that learns the training data too closely tends to perform poorly on test data.
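One symptom of the curse of dimensionality can be sketched numerically: in high dimensions, the nearest and farthest pairs of random points end up almost the same distance apart, so "closeness" loses meaning (the sample size and dimensions below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
ratios = {}

for dim in (2, 1000):
    points = rng.random((100, dim))              # 100 random points in [0,1]^dim
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))   # pairwise distance matrix
    upper = dists[np.triu_indices(100, k=1)]     # unique pairs only
    ratios[dim] = upper.min() / upper.max()
    print(dim, ratios[dim])                      # ratio approaches 1 as dim grows
```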

Dimension Reduction

  • ์œ„์˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ์•ˆ์œผ๋กœ, ์ ์ ˆํžˆ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜์—ฌ ์ถฉ๋ถ„ํ•œ ์˜๋ฏธ๋ฅผ ๋‹ด๊ฒŒ ํ•  ์ˆ˜๋Š” ์—†์„๊นŒ?

Feature Selection

  • With, say, 100 variables (dimensions), keep only the variables in the dataset with the largest variance (the widest spread, i.e. carrying the most diverse information) and drop the rest.
  • Pros: the selected variables are easy to interpret (they keep their intuitive meaning).
  • Cons: variables must be chosen with their mutual correlations in mind (to avoid redundancy).
  • e.g. LASSO, genetic algorithms
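LASSO itself works through regularization, but the variance-based selection described above can be sketched with scikit-learn's VarianceThreshold (the toy data and threshold here are hypothetical):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column 0 barely varies, column 1 varies a lot.
X = np.array([[0.0,  10.0],
              [0.1,  -5.0],
              [0.0,   7.0],
              [0.1, -12.0]])

selector = VarianceThreshold(threshold=1.0)  # drop columns with variance <= 1.0
X_selected = selector.fit_transform(X)

print(selector.get_support())  # [False  True] -> only column 1 survives
print(X_selected.shape)        # (4, 1)
```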

Feature Extraction

  • Combine the existing variables (dimensions) into new variables with better explanatory power.
  • Pros: correlations between variables are taken into account, and the number of variables can be reduced substantially.
  • Cons: the constructed variables are hard to interpret (it is difficult to say exactly what each one means).
  • e.g. PCA, autoencoders

PCA (Principal Component Analysis)

  • A technique for analyzing high-dimensional data effectively.
  • Reduces the data to a lower dimension.
  • Finds the vectors (variables) that preserve as much of the high-dimensional data's information (variance) as possible, then (linearly) projects the data onto them.
  • Remember: the data's variance == its information.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = [-2.2, -2, -2, -1, -1, 0, 0, 1, 1, 2, 2, 2.2]
y = [0, .5, -.5, .8, -.8, .9, -.9, .8, -.8, .5, -.5, 0]

df = pd.DataFrame({"x": x, "y": y})

print('variance of X : ' + str(np.var(x)))
print('variance of Y : ' + str(np.var(y)))

plt.scatter(df['x'], df['y'])
plt.arrow(-3, 0, 6, 0, head_width=.05, head_length=.05, color='#d63031')
plt.arrow(0, -1, 0, 6, head_width=.05, head_length=.05, color='#00b894');

variance of X : 2.473333333333333
variance of Y : 0.4316666666666668

Here, we should project first onto the red axis, which captures the most of the data's variance (information); that axis becomes the first principal component, pc1.

PCA process

  1. Prepare the data
import numpy as np

X = np.array([ 
              [0.2, 5.6, 3.56], 
              [0.45, 5.89, 2.4],
              [0.33, 6.37, 1.95],
              [0.54, 7.9, 1.32],
              [0.77, 7.87, 0.98]
])
print("Data: ", X)

Data: [[0.2 5.6 3.56]
[0.45 5.89 2.4 ]
[0.33 6.37 1.95]
[0.54 7.9 1.32]
[0.77 7.87 0.98]]

np.mean(X, axis=0)  # computes the column-wise (per-variable) means

  2. Normalize each column: subtract its mean and divide by its standard deviation, putting all variables on the same scale.

standardized_data = ( X - np.mean(X, axis = 0) ) / np.std(X, ddof = 1, axis = 0)
print("\n Standardized Data: \n", standardized_data)

Standardized Data:
[[-1.19298785 -1.0299848 1.5011907 ]
[-0.03699187 -0.76471341 0.35403575]
[-0.59186994 -0.32564351 -0.09098125]
[ 0.37916668 1.07389179 -0.71400506]
[ 1.44268298 1.04644992 -1.05024014]]

  3. Compute the variance-covariance matrix of the standardized values.

covariance_matrix = np.cov(standardized_data.T)
print("\n Covariance Matrix: \n", covariance_matrix)

Covariance Matrix:
[[ 1. 0.84166641 -0.88401004]
[ 0.84166641 1. -0.91327498]
[-0.88401004 -0.91327498 1. ]]

  4. Compute the eigenvalues and eigenvectors of the variance-covariance matrix (to find the components that carry the most variance, i.e. the most information).

values, vectors = np.linalg.eig(covariance_matrix)
print("\n Eigenvalues: \n", values)
print("\n Eigenvectors: \n", vectors)

Eigenvalues:
[2.75962684 0.1618075 0.07856566]

Eigenvectors:
[[ 0.56991376 0.77982119 0.25899269]
[ 0.57650106 -0.60406359 0.55023059]
[-0.58552953 0.16427443 0.7938319 ]]

  5. Project the data onto the eigenvectors (a matrix multiplication).

Z = np.matmul(standardized_data, vectors)
print("\n Projected Data: \n", Z)

Projected Data:
[[-2.15267901 -0.06153364 0.31598878]
[-0.66923865 0.4912475 -0.14930446]
[-0.47177644 -0.27978923 -0.40469283]
[ 1.25326312 -0.47030949 0.12228952]
[ 2.04043099 0.32038486 0.11571899]]

  • A question at this point: we did dimensionality reduction, so why are there still 3 variables??

$$X = $$

| $$x_1$$ | $$x_2$$ | $$x_3$$ |
| --- | --- | --- |
| 0.2 | 5.6 | 3.56 |
| 0.45 | 5.89 | 2.4 |
| 0.33 | 6.37 | 1.95 |
| 0.54 | 7.9 | 1.32 |
| 0.77 | 7.87 | 0.98 |

turned into

$$Z = $$

| $$pc_1$$ | $$pc_2$$ | $$pc_3$$ |
| --- | --- | --- |
| -2.1527 | -0.0615 | 0.3160 |
| -0.6692 | 0.4912 | -0.1493 |
| -0.4718 | -0.2798 | -0.4047 |
| 1.2533 | -0.4703 | 0.1223 |
| 2.0404 | 0.3204 | 0.1157 |

In other words, think of it as re-expressing the data in a coordinate system of 3 axes that describe the data well.

์ด์ œ ์ด๋ ‡๊ฒŒ ํ•œ ํ›„, ๊ณ ์œ ๊ฐ’์„ ๋น„๊ตํ•˜์—ฌ pc2๊นŒ์ง€ ๊ฐ€์ ธ๊ฐˆ์ง€ ๋“ฑ์˜ ๊ณ ๋ฏผ์„ ํ•˜๋Š” ๊ฒƒ!

๊ทผ๋ฐ ์ด์ œ ์—ฌ๊ธฐ์„œ sklearn(์‚ฌ์ดํ‚ท๋Ÿฐ)์ด๋ผ๋Š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋กœ ์œ„์˜ ๊ณผ์ •์„ ์•„์ฃผ ezํ•˜๊ฒŒ ๊ฐ€๋Šฅ(๋ถ€๋“ค๋ถ€๋“ค)

from sklearn.preprocessing import StandardScaler, Normalizer
from sklearn.decomposition import PCA

print("Data: \n", X)

scaler = StandardScaler()
Z = scaler.fit_transform(X)
print("\n Standardized Data: \n", Z)

pca = PCA(2)

pca.fit(Z)

print("\n Eigenvectors: \n", pca.components_)
print("\n Eigenvalues: \n",pca.explained_variance_)

B = pca.transform(Z)
print("\n Projected Data: \n", B)

Data:
[[0.2 5.6 3.56]
[0.45 5.89 2.4 ]
[0.33 6.37 1.95]
[0.54 7.9 1.32]
[0.77 7.87 0.98]]

Standardized Data:
[[-1.33380097 -1.15155802 1.67838223]
[-0.04135817 -0.85497558 0.395824 ]
[-0.66173071 -0.36408051 -0.10172014]
[ 0.42392124 1.20064752 -0.79828193]
[ 1.61296861 1.16996658 -1.17420417]]

Eigenvectors:
[[-0.13020816 -0.73000041 0.67092863]
[-0.08905388 0.68256517 0.72537866]]

Eigenvalues:
[2.15851707 0.09625196]

Projected Data:
[[ 1.87404384 0.35553233]
[ 0.85151446 -0.31022649]
[ 0.21482136 -0.29832914]
[-1.35210803 0.27030569]
[-1.58827163 -0.0172824 ]]

  • Why is the standardized data here different from before?
  • In the manual step, standardized_data = ( X - np.mean(X, axis = 0) ) / np.std(X, ddof = 1, axis = 0)

the standard deviation used ddof=1 (the sample convention), whereas StandardScaler uses ddof=0 (the population convention); that degree-of-freedom difference is the reason.
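The difference is easy to see on a small array (the values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# ddof=0 divides by n (population)   -- what sklearn's StandardScaler uses
# ddof=1 divides by n - 1 (sample)   -- what the manual steps above used
print(np.std(x, ddof=0))  # 1.118...
print(np.std(x, ddof=1))  # 1.290...
```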

PCA์˜ ํŠน์ง•

  • ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ๋…๋ฆฝ์ ์ธ ์ถ•์„ ์ฐพ๋Š”๋ฐ ์œ ์šฉํ•˜๋‹ค.
  • ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๊ฐ€ ์ •๊ทœ์„ฑ์„ ๋„์ง€ ์•Š๋Š” ๊ฒฝ์šฐ ์ ์šฉ์ด ์–ด๋ ต๋‹ค.
    • ๋‹น์—ฐํ•œ ๊ฒŒ, ๋ถ„์‚ฐ์„ ๊ฐ€์žฅ ํฌ๊ฒŒ ๊ฐ–๋Š” ์ƒˆ๋กœ์šด ์ถ•๋“ค์„ ์ฐพ๋Š”๊ฑด๋ฐ ๊ทธ ๋ถ„์‚ฐ ์กฐ์ฐจ ์ฐพ๊ธฐ ์–ด๋ ค์šด ๋ฐ์ดํ„ฐ๋ผ๋ฉด ๋‹น์—ฐํžˆ ์–ด๋ ต๋‹ค.
  • ๋ถ„๋ฅ˜/์˜ˆ์ธก ๋ฌธ์ œ์— ๋Œ€ํ•ด์„œ ๋ฐ์ดํ„ฐ์˜ ๋ผ๋ฒจ์„ ๊ณ ๋ คํ•˜์ง€ ์•Š๊ธฐ์— ํšจ๊ณผ์  ๋ถ„๋ฆฌ๊ฐ€ ์–ด๋ ต๋‹ค.
    • ์ด ๊ฒฝ์šฐ๋Š” PLS ์‚ฌ์šฉ ๊ฐ€๋Šฅ
