Data Handling Example 1

Jayden1116 2021. 12. 7. 14:26
# Import Packages
import pandas as pd 
import numpy as np
import seaborn as sns

# dataset upload
df = sns.load_dataset("titanic")
df

1. Handling missing values

Q. How many missing values are in the 'deck' column?

df['deck'].isna().sum() # count the missing values in a single column

Q. Replace every missing value with the previous value in the same column; if the first row is missing, replace it with the value that follows.

df['deck'] = df['deck'].ffill() # forward-fill first: each gap takes the previous value
df['deck'] = df['deck'].bfill() # a missing first row has no previous value, so back-fill it
# df = df.ffill().bfill() would apply the same two steps to every column
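Why two passes are needed is easier to see on a tiny example. The sketch below uses a made-up Series (the values are hypothetical, not taken from the Titanic data) whose first entry is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the first and some later entries are missing
s = pd.Series([np.nan, 1.0, np.nan, 2.0, np.nan])

forward = s.ffill()       # the leading NaN has no previous value, so it stays NaN
filled = forward.bfill()  # the leading NaN now takes the next valid value

print(forward.tolist())
print(filled.tolist())
```

A forward fill alone leaves the first row empty, which is why the back fill follows it.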

2. Converting data types

Q. Check the data types.

df.dtypes

Q. Convert the 'fare' column to 'int64'.

df['fare'] = df['fare'].astype('int64') # truncates the fractional part; assign back, since astype returns a new Series
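Casting a float column to an integer type drops the fractional part rather than rounding it. A small sketch with made-up fare values (hypothetical, chosen only to show the difference):

```python
import pandas as pd

# Hypothetical sample values, not real Titanic fares
fares = pd.Series([7.25, 71.9, 8.6])

truncated = fares.astype("int64")        # drops the fractional part
rounded = fares.round().astype("int64")  # rounds to the nearest integer first

print(truncated.tolist())
print(rounded.tolist())
```

If rounding is what you want, call round() before the cast.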

3. Adding and dropping columns

Q. (Adding a column) Create a new column from existing columns.
Use the 'fare' and 'pclass' columns to create a column named fare_per_class.

df['fare_per_class'] = df['fare'] / df['pclass']

Q. (Dropping a column) Delete the column you created.
We are done with the 'fare_per_class' column, so let's drop it.

df.drop(['fare_per_class'], axis=1, inplace=True)

4. Working with loc and iloc

Q. How many rows in df have an embark_town value other than Southampton?

(df['embark_town'] != 'Southampton').sum() # a single number; rows with a missing embark_town also count, since NaN != 'Southampton' is True
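One thing to watch with a "not equal" filter is how missing values behave. The sketch below uses a made-up Series of town names (hypothetical data) to show that a NaN entry passes the != test, and how to leave such rows out if that is not what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing town
towns = pd.Series(["Southampton", "Cherbourg", np.nan, "Queenstown"])

not_southampton = towns != "Southampton"

count_incl_nan = int(not_southampton.sum())                    # NaN row is counted
count_excl_nan = int((not_southampton & towns.notna()).sum())  # NaN row is left out

print(count_incl_nan, count_excl_nan)
```

Which of the two counts is the right answer depends on whether an unknown town should count as "not Southampton".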

Q. What is the 3rd value of the 7th column?

df.iloc[2, 6]
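iloc selects by integer position (counting from 0), while loc selects by label. A minimal sketch on a made-up frame with string row labels (the names toy, a, b, x, y, z are all hypothetical):

```python
import pandas as pd

# Hypothetical toy frame with string row labels
toy = pd.DataFrame(
    {"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)

by_position = toy.iloc[2, 1]   # 3rd row, 2nd column, by position
by_label = toy.loc["z", "b"]   # same cell, by row and column label

pos_slice = toy.iloc[0:2]      # end position is excluded: rows x, y
label_slice = toy.loc["x":"z"] # end label is included: rows x, y, z

print(by_position, by_label, len(pos_slice), len(label_slice))
```

The slicing difference is the usual source of off-by-one surprises: iloc follows Python's half-open ranges, loc includes both endpoints.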

5. Filtering data

Q. Extract the rows where age is less than 30, reset the index to start from 0, and print the first 5 rows.

condition = (df['age'] < 30)
df1 = df[condition].reset_index(drop=True)
df1.head() # first 5 rows

Q. Extract the rows where pclass is 2 or less and alive is yes.

condition1 = (df['pclass'] <= 2)
condition2 = (df['alive'] == 'yes')
df[condition1 & condition2]
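When combining conditions inline, each comparison needs its own parentheses, because & binds more tightly than <= and ==. As an alternative, DataFrame.query takes the whole filter as a string. A sketch on a made-up frame (hypothetical data that only borrows the column names):

```python
import pandas as pd

# Hypothetical toy frame reusing the pclass / alive column names
toy = pd.DataFrame(
    {"pclass": [1, 2, 3, 1], "alive": ["yes", "no", "yes", "yes"]}
)

# Boolean masks: parentheses around each comparison are required
mask_result = toy[(toy["pclass"] <= 2) & (toy["alive"] == "yes")]

# The same filter written as a query string
query_result = toy.query("pclass <= 2 and alive == 'yes'")

print(mask_result.index.tolist())
```

Both forms select the same rows; named conditions, as used above, tend to read better once there are more than two of them.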