[TIL] 83. Image Segmentation, Object Detection/Recognition

Jayden1116 2022. 3. 12. 20:06

Keywords

  • Segmentation(Semantic / Instance)
  • Transpose Convolution
  • Object Detection/Recognition

Image Segmentation

  • A task that partitions a single image into regions, where each region groups the pixels of objects that share the same meaning

image

  • Image classification : predicts a single label for the whole image (e.g. a photo of a tree is predicted as "tree")
  • Image segmentation : separates the objects in an image into meaningful units -> predicts a label for every pixel
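
To make the difference in output shape concrete, here is a minimal sketch in plain Python. The score values, the three class ids, and the "sum the per-pixel scores" rule used for classification are all made-up assumptions for illustration; a real classifier does not work by summing pixel scores.

```python
# Toy class scores for a 2x3 image with 3 classes (0=background, 1=tree, 2=sky).
# scores[y][x] holds the per-class scores at that pixel (made-up numbers).
scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7], [0.1, 0.8, 0.1]],
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
]


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def classify_image(scores):
    """Classification: one label for the whole image (toy rule: sum all scores)."""
    totals = [0.0] * len(scores[0][0])
    for row in scores:
        for pixel in row:
            for class_id, score in enumerate(pixel):
                totals[class_id] += score
    return argmax(totals)


def segment_image(scores):
    """Segmentation: one label per pixel, so the output keeps the H x W layout."""
    return [[argmax(pixel) for pixel in row] for row in scores]


print(classify_image(scores))  # 1
print(segment_image(scores))   # [[2, 2, 1], [0, 1, 1]]
```

Classification collapses the image to one integer, while segmentation returns a label map with the same height and width as the input.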

[Segmentation] Semantic VS (semantic) Instance

  • Semantic : as above, separates objects by meaningful unit (class) ex) person -> person, dog -> dog
  • Semantic Instance : separates each individual object ex) person 1, person 2, person 3 (i.e. different people are told apart even though they are all "person")

image
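
The relationship between the two kinds of mask can be sketched as follows. This is only a toy: it derives instance ids from a semantic mask by labeling 4-connected blobs, which is not how instance segmentation models work (two people touching each other would be merged into one id). The function name `label_instances` and the example mask are hypothetical.

```python
def label_instances(semantic_mask, target_class):
    """Give each 4-connected blob of `target_class` its own id (1, 2, ...)."""
    height, width = len(semantic_mask), len(semantic_mask[0])
    instance_mask = [[0] * width for _ in range(height)]
    next_id = 0
    for y in range(height):
        for x in range(width):
            if semantic_mask[y][x] != target_class or instance_mask[y][x] != 0:
                continue
            # Found an unlabeled pixel of the class: flood-fill its blob.
            next_id += 1
            instance_mask[y][x] = next_id
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic_mask[ny][nx] == target_class
                            and instance_mask[ny][nx] == 0):
                        instance_mask[ny][nx] = next_id
                        stack.append((ny, nx))
    return instance_mask, next_id


PERSON = 1
# Semantic mask: every person pixel carries the same label 1.
semantic = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
instances, count = label_instances(semantic, PERSON)
print(count)  # 3
for row in instances:
    print(row)
```

In the semantic mask all three people share label 1; in the instance mask they become 1, 2, and 3.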

Segmentation Model

[Model] FCN(Fully Convolutional Networks)

  • A model that takes a CNN built for image classification and replaces its classifier (the fully connected part) with convolutional layers.
  • Because segmentation classifies every pixel, the positional information of the pixels has to be preserved all the way to the output.
  • In the figure below, the stage where the feature map grows back to roughly the input size, so that per-pixel positions are recovered, is called Upsampling.

image
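
One way to see why a fully connected classifier can be swapped for a convolutional layer: a dense layer applied independently at every spatial position is the same computation as a 1x1 convolution. The sketch below shows that simplest case in plain Python; the weights, bias, and feature values are made up, and the `[y][x][channel]` layout is just a convention chosen here.

```python
def dense(features, weights, bias):
    """Fully connected layer: features[C_in] -> scores[C_out]."""
    return [
        sum(w * f for w, f in zip(weight_row, features)) + b
        for weight_row, b in zip(weights, bias)
    ]


def conv1x1(feature_map, weights, bias):
    """1x1 convolution: the same dense weights applied at every (y, x).

    feature_map is indexed [y][x][channel]; the output keeps the H x W grid,
    so positional information is not flattened away.
    """
    return [[dense(pixel, weights, bias) for pixel in row] for row in feature_map]


# 3 input channels -> 2 class scores (made-up numbers).
weights = [[1.0, 0.0, -1.0],
           [0.5, 0.5, 0.5]]
bias = [0.0, 0.1]

feature_map = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[2.0, 2.0, 2.0], [3.0, 0.0, 1.0]],
]

score_map = conv1x1(feature_map, weights, bias)
print(len(score_map), len(score_map[0]), len(score_map[0][0]))  # 2 2 2
```

The output is a 2x2 grid of class scores instead of a single score vector, which is what lets the network be upsampled to a per-pixel prediction afterwards.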

[FCN] Upsampling

  • In a CNN, extracting image features through Convolution and Pooling, which shrinks the feature map along the way, is called Downsampling.
  • Conversely, in an FCN, restoring the feature map to the original image size after feature extraction is called Upsampling.
  • Upsampling methods include Transpose Convolution and Unpooling. They run the Convolution and Pooling steps in reverse: they restore the spatial size, not the exact original values.
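
Both upsampling methods can be sketched on a single-channel 2D input in plain Python. This is a minimal illustration under simplifying assumptions (no padding, one channel, input sizes divisible by the pooling window), not a framework implementation; all function names are hypothetical.

```python
def transpose_conv2d(x, kernel, stride=2):
    """Transposed convolution, single channel, no padding.

    Each input value scales the whole kernel, and the result is added into
    the output at offset (i * stride, j * stride).
    Output size per axis: (in - 1) * stride + kernel_size.
    """
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out = [[0.0] * ((in_w - 1) * stride + k_w)
           for _ in range((in_h - 1) * stride + k_h)]
    for i in range(in_h):
        for j in range(in_w):
            for ki in range(k_h):
                for kj in range(k_w):
                    out[i * stride + ki][j * stride + kj] += x[i][j] * kernel[ki][kj]
    return out


def max_pool_with_indices(x, size=2):
    """Max pooling that also records where each maximum came from."""
    pooled, indices = [], []
    for i in range(0, len(x), size):
        pooled_row, index_row = [], []
        for j in range(0, len(x[0]), size):
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(size) for dj in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Unpooling: put each pooled value back at its recorded position, zeros elsewhere."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (y, x) in zip(pooled_row, index_row):
            out[y][x] = value
    return out


small = [[1.0, 2.0], [3.0, 4.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
upsampled = transpose_conv2d(small, ones, stride=2)  # 2x2 -> 4x4

image = [[1.0, 5.0, 2.0, 0.0],
         [3.0, 4.0, 1.0, 1.0],
         [0.0, 0.0, 9.0, 8.0],
         [2.0, 1.0, 7.0, 6.0]]
pooled, indices = max_pool_with_indices(image)       # 4x4 -> 2x2
restored = max_unpool(pooled, indices, 4, 4)         # 2x2 -> 4x4
```

With an all-ones 2x2 kernel and stride 2, the transposed convolution simply repeats each value into a 2x2 block; in a trained network the kernel is learned. Unpooling recovers the size and the positions of the maxima, but the other values stay zero, which shows that the original input is not recovered exactly.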

[Model] U-net

  • One of the representative models for image segmentation; an FCN-based model trained End-to-End.
  • It is broadly divided into a Downsampling part and an Upsampling part.
  • Downsampling extracts image features through Convolution and Pooling.
  • Upsampling restores a size similar to the original through Convolution and Transpose Convolution.
  • In addition, the feature map output at each Downsampling level is brought to a matching size and concatenated (concat) onto the Upsampling input at the same level. (This acts as a skip connection and preserves some of the information that would otherwise be lost.)

image
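
The skip connection described above can be sketched as a center crop followed by a channel-wise concat. The sketch assumes that "bringing the feature map to a matching size" means cropping, as in the original U-Net, where unpadded convolutions make the encoder maps slightly larger; implementations that use padded convolutions can skip the crop. Feature maps are plain nested lists indexed `[channel][y][x]`, and all names and sizes are hypothetical.

```python
def center_crop(feature_map, target_h, target_w):
    """Crop every channel of feature_map[C][H][W] around its center."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    top = (height - target_h) // 2
    left = (width - target_w) // 2
    return [
        [row[left:left + target_w] for row in channel[top:top + target_h]]
        for channel in feature_map
    ]


def concat_channels(a, b):
    """Concatenate two feature maps with equal H x W along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b


def make_map(channels, height, width, fill):
    """Helper that builds a constant feature map for the demo."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]


encoder_out = make_map(channels=2, height=6, width=6, fill=1.0)  # Downsampling side
decoder_in = make_map(channels=3, height=4, width=4, fill=2.0)   # Upsampling side

cropped = center_crop(encoder_out, 4, 4)
merged = concat_channels(cropped, decoder_in)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 5 4 4
```

The merged map has 2 + 3 = 5 channels at the decoder's 4x4 resolution, so the following convolution sees both the upsampled features and the finer detail from the encoder.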

Object Detection/Recognition

  • A task that finds the objects matching the target labels within the whole image
  • A rectangle called a Bounding Box is drawn around each object, and the object inside the box is then classified into a class.

image

[Object Detection] IoU(Intersection over Union)

  • ๊ฐ์ฒด ํƒ์ง€์˜ ๊ฒฐ๊ณผ๋ฅผ ํ‰๊ฐ€ํ•˜๋Š” ์ง€ํ‘œ
  • $IoU = \frac{์˜ˆ์ธก ์˜์—ญ \cap ์‹ค์ œ ์˜์—ญ}{์˜ˆ์ธก ์˜์—ญ \cup ์‹ค์ œ ์˜์—ญ}$
  • ์ฆ‰, IoU๊ฐ€ 1์— ๊ฐ€๊นŒ์šธ์ˆ˜๋ก ์„ฑ๋Šฅ์ด ์ข‹์Œ์„ ๋œปํ•ฉ๋‹ˆ๋‹ค.

image
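
For axis-aligned bounding boxes the formula reduces to a few lines of arithmetic. The sketch below assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates with `x1 < x2` and `y1 < y2`; that format and the function name are choices made here, not something the post specifies.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])

    # Width/height are clamped at 0 so disjoint boxes give zero intersection.
    inter_w = max(0.0, inter_x2 - inter_x1)
    inter_h = max(0.0, inter_y2 - inter_y1)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
print(iou(predicted, ground_truth))  # 1/7, about 0.1429
```

Here the overlap is a 1x1 square and the union is 4 + 4 - 1 = 7, so IoU is 1/7. Identical boxes give 1.0 and non-overlapping boxes give 0.0.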

[Object Detection] Model mechanism

  • Two Stage Detector : first obtains candidate locations where an object is likely to be (Region Proposal), then classifies each proposed region (Region of Interest ; RoI). It is relatively slow but comparatively accurate.
  • One Stage Detector : instead of receiving region proposals, divides the input image into small cells such as a grid and classifies while scanning those cells. It is relatively fast but comparatively less accurate.

image
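
As a rough illustration of the grid idea behind one-stage detectors, the sketch below maps a bounding box to the grid cell that contains its center. Treating that cell as the one "responsible" for the object is a YOLO-style convention assumed here, not something stated in the post; the 448x448 image size, the 7x7 grid, and the function name are likewise hypothetical.

```python
def responsible_cell(box, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell containing the center of `box`.

    `box` is (x1, y1, x2, y2) in pixels; the image is split into
    grid_size x grid_size equal cells.
    """
    center_x = (box[0] + box[2]) / 2
    center_y = (box[1] + box[3]) / 2
    # Clamp so a center lying exactly on the right/bottom edge stays in range.
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row, col


# 448x448 image, 7x7 grid -> each cell covers 64x64 pixels.
print(responsible_cell((100, 200, 200, 300), 448, 448, 7))  # (3, 2)
print(responsible_cell((400, 400, 448, 448), 448, 448, 7))  # (6, 6)
```

Every cell makes its prediction in a single forward pass, which is why this family avoids the separate proposal step of two-stage detectors.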