Clone-coding some pandas code I studied a while back!
Let's run through it quickly, as if I had known it all along 😉
In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
# Widen the notebook container so the post uploads cleanly to Tistory :-)
🍒 Clone-Coding "Data Science for Everyone", Part 1 🍒¶
Studying pandas
Loading the libraries¶
In [2]:
import pandas as pd
import seaborn as sns
In [3]:
pd.__version__
Out[3]:
'1.3.4'
In [4]:
sns.__version__
Out[4]:
'0.11.2'
Loading the dataset¶
In [5]:
# Load the car fuel-economy (mpg) dataset
df = sns.load_dataset("mpg")
In [6]:
df
Out[6]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
393 | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 82 | usa | ford mustang gl |
394 | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 82 | europe | vw pickup |
395 | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 82 | usa | dodge rampage |
396 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | usa | ford ranger |
397 | 31.0 | 4 | 119.0 | 82.0 | 2720 | 19.4 | 82 | usa | chevy s-10 |
398 rows × 9 columns
In [7]:
# 398 rows, 9 columns
df.shape
Out[7]:
(398, 9)
In [8]:
df.index
Out[8]:
RangeIndex(start=0, stop=398, step=1)
In [9]:
df.columns
Out[9]:
Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'model_year', 'origin', 'name'],
dtype='object')
In [10]:
df.values
Out[10]:
array([[18.0, 8, 307.0, ..., 70, 'usa', 'chevrolet chevelle malibu'],
[15.0, 8, 350.0, ..., 70, 'usa', 'buick skylark 320'],
[18.0, 8, 318.0, ..., 70, 'usa', 'plymouth satellite'],
...,
[32.0, 4, 135.0, ..., 82, 'usa', 'dodge rampage'],
[28.0, 4, 120.0, ..., 82, 'usa', 'ford ranger'],
[31.0, 4, 119.0, ..., 82, 'usa', 'chevy s-10']], dtype=object)
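A side note on `.values`: the pandas documentation recommends `DataFrame.to_numpy()` for getting the underlying array. The sketch below uses a small hand-made frame (`toy` and its values are made up for illustration, not loaded from the mpg dataset) to show why the array above comes back as `dtype=object`: numeric and string columns share no common NumPy type.

```python
import pandas as pd

# Hypothetical toy frame: two numeric columns and one string column
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "cylinders": [8, 8],
    "origin": ["usa", "usa"],
})

mixed = toy.to_numpy()                          # numbers + strings -> object array
numeric = toy[["mpg", "cylinders"]].to_numpy()  # float + int -> float64 array

print(mixed.dtype)    # object
print(numeric.dtype)  # float64
```

Selecting only the numeric columns first keeps the array in a numeric dtype, which is usually what you want before doing math on it.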
In [11]:
# Data type of each column in df
df.dtypes
Out[11]:
mpg float64
cylinders int64
displacement float64
horsepower float64
weight int64
acceleration float64
model_year int64
origin object
name object
dtype: object
Fetching part of the dataset¶
In [12]:
# First 5 rows (the default)
df.head()
Out[12]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
In [13]:
# First 3 rows
df.head(3)
Out[13]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
In [14]:
# Last 5 rows (the default)
df.tail()
Out[14]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
393 | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 82 | usa | ford mustang gl |
394 | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 82 | europe | vw pickup |
395 | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 82 | usa | dodge rampage |
396 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | usa | ford ranger |
397 | 31.0 | 4 | 119.0 | 82.0 | 2720 | 19.4 | 82 | usa | chevy s-10 |
In [15]:
# Last 3 rows
df.tail(3)
Out[15]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
395 | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 82 | usa | dodge rampage |
396 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | usa | ford ranger |
397 | 31.0 | 4 | 119.0 | 82.0 | 2720 | 19.4 | 82 | usa | chevy s-10 |
In [16]:
# Draw 1 random row (the default)
df.sample()
Out[16]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
60 | 20.0 | 4 | 140.0 | 90.0 | 2408 | 19.5 | 72 | usa | chevrolet vega |
In [17]:
# Draw 4 random rows -> the result changes on every run
df.sample(4)
Out[17]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
392 | 27.0 | 4 | 151.0 | 90.0 | 2950 | 17.3 | 82 | usa | chevrolet camaro |
278 | 31.5 | 4 | 89.0 | 71.0 | 1990 | 14.9 | 78 | europe | volkswagen scirocco |
185 | 26.0 | 4 | 98.0 | 79.0 | 2255 | 17.7 | 76 | usa | dodge colt |
38 | 14.0 | 8 | 350.0 | 165.0 | 4209 | 12.0 | 71 | usa | chevrolet impala |
In [18]:
# For comparison:
# draw 4 random rows with a fixed seed -> the same result on every run
df.sample(4, random_state=42)
Out[18]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
198 | 33.0 | 4 | 91.0 | 53.0 | 1795 | 17.4 | 76 | japan | honda civic |
396 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | usa | ford ranger |
33 | 19.0 | 6 | 232.0 | 100.0 | 2634 | 13.0 | 71 | usa | amc gremlin |
208 | 13.0 | 8 | 318.0 | 150.0 | 3940 | 13.2 | 76 | usa | plymouth volare premier v8 |
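To double-check the `random_state` behaviour without eyeballing two tables, here is a minimal sketch. The `toy` frame is a made-up stand-in for the mpg data, and `frac` is shown only as a related option of `DataFrame.sample`.

```python
import pandas as pd

# Hypothetical toy frame standing in for the mpg dataset
toy = pd.DataFrame({"value": range(100)})

first = toy.sample(4, random_state=42)
second = toy.sample(4, random_state=42)

# Same seed -> identical rows in identical order
print(first.equals(second))  # True

# frac draws a fraction of the rows instead of a fixed count
tenth = toy.sample(frac=0.1, random_state=42)
print(len(tenth))  # 10
```

Fixing the seed is handy when a sampled subset has to be reproducible, for example when sharing a notebook.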
Summarizing¶
In [19]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 mpg 398 non-null float64
1 cylinders 398 non-null int64
2 displacement 398 non-null float64
3 horsepower 392 non-null float64
4 weight 398 non-null int64
5 acceleration 398 non-null float64
6 model_year 398 non-null int64
7 origin 398 non-null object
8 name 398 non-null object
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB
Checking for missing values¶
In [20]:
True + False
Out[20]:
1
In [21]:
True + True
Out[21]:
2
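The arithmetic above works because Python's `bool` is a subclass of `int`, so `True` counts as 1 and `False` as 0. That is what lets `sum()` count missing values and `mean()` give their share. A small sketch with a made-up Series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical Series with two missing entries out of five
s = pd.Series([130.0, None, 150.0, None, 140.0])

mask = s.isna()  # boolean Series: True where the value is missing

print(isinstance(True, int))  # True: bool is a subclass of int
print(mask.sum())             # 2   (number of True values)
print(mask.mean())            # 0.4 (share of True values)
```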
In [22]:
# isna()
df.isna()
Out[22]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | False | False | False | False | False | False | False | False | False |
1 | False | False | False | False | False | False | False | False | False |
2 | False | False | False | False | False | False | False | False | False |
3 | False | False | False | False | False | False | False | False | False |
4 | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
393 | False | False | False | False | False | False | False | False | False |
394 | False | False | False | False | False | False | False | False | False |
395 | False | False | False | False | False | False | False | False | False |
396 | False | False | False | False | False | False | False | False | False |
397 | False | False | False | False | False | False | False | False | False |
398 rows × 9 columns
In [23]:
# isnull() is an alias of isna(), so the result is identical
df.isnull()
Out[23]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | False | False | False | False | False | False | False | False | False |
1 | False | False | False | False | False | False | False | False | False |
2 | False | False | False | False | False | False | False | False | False |
3 | False | False | False | False | False | False | False | False | False |
4 | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
393 | False | False | False | False | False | False | False | False | False |
394 | False | False | False | False | False | False | False | False | False |
395 | False | False | False | False | False | False | False | False | False |
396 | False | False | False | False | False | False | False | False | False |
397 | False | False | False | False | False | False | False | False | False |
398 rows × 9 columns
In [24]:
# Count the missing values in each column
df.isnull().sum()
Out[24]:
mpg 0
cylinders 0
displacement 0
horsepower 6
weight 0
acceleration 0
model_year 0
origin 0
name 0
dtype: int64
In [25]:
# Ratio of missing values: the missing count divided by the total number of rows
df.isnull().mean()
Out[25]:
mpg 0.000000
cylinders 0.000000
displacement 0.000000
horsepower 0.015075
weight 0.000000
acceleration 0.000000
model_year 0.000000
origin 0.000000
name 0.000000
dtype: float64
In [26]:
# The same ratio expressed as a percentage
df.isnull().mean() * 100
Out[26]:
mpg 0.000000
cylinders 0.000000
displacement 0.000000
horsepower 1.507538
weight 0.000000
acceleration 0.000000
model_year 0.000000
origin 0.000000
name 0.000000
dtype: float64
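The count and the percentage can also be put side by side in one table, and the rows that actually contain the gaps can be pulled out with a boolean mask. A minimal sketch, assuming a small made-up frame (`toy`, `missing`, and `rows_missing_hp` are illustrative names, not from the original notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "horsepower": [130.0, np.nan, np.nan, 90.0],
    "origin": ["usa", "usa", "japan", None],
})

# One row per column: how many values are missing, and what percentage
missing = pd.DataFrame({
    "count": toy.isna().sum(),
    "percent": toy.isna().mean() * 100,
})
print(missing)

# Rows where horsepower is missing
rows_missing_hp = toy[toy["horsepower"].isna()]
print(rows_missing_hp)
```

On the mpg data the same mask on `horsepower` should return the 6 rows counted above.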
Descriptive statistics¶
In [27]:
# Numeric columns are summarized by default ---- see the dtypes checked earlier with df.dtypes and df.info()
df.describe()
Out[27]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year |
---|---|---|---|---|---|---|---|
count | 398.000000 | 398.000000 | 398.000000 | 392.000000 | 398.000000 | 398.000000 | 398.000000 |
mean | 23.514573 | 5.454774 | 193.425879 | 104.469388 | 2970.424623 | 15.568090 | 76.010050 |
std | 7.815984 | 1.701004 | 104.269838 | 38.491160 | 846.841774 | 2.757689 | 3.697627 |
min | 9.000000 | 3.000000 | 68.000000 | 46.000000 | 1613.000000 | 8.000000 | 70.000000 |
25% | 17.500000 | 4.000000 | 104.250000 | 75.000000 | 2223.750000 | 13.825000 | 73.000000 |
50% | 23.000000 | 4.000000 | 148.500000 | 93.500000 | 2803.500000 | 15.500000 | 76.000000 |
75% | 29.000000 | 8.000000 | 262.000000 | 126.000000 | 3608.000000 | 17.175000 | 79.000000 |
max | 46.600000 | 8.000000 | 455.000000 | 230.000000 | 5140.000000 | 24.800000 | 82.000000 |
In [28]:
# For string (object) columns, pass include="object" ---- see the dtypes checked earlier with df.dtypes and df.info()
df.describe(include="object")
Out[28]:
| | origin | name |
---|---|---|
count | 398 | 398 |
unique | 3 | 305 |
top | usa | ford pinto |
freq | 249 | 6 |
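Two related options are worth trying here: `describe(include="all")` summarizes every column in one table, and `select_dtypes` lists which columns count as numeric. The sketch below uses a made-up `toy` frame, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy frame mixing numeric and string columns
toy = pd.DataFrame({
    "mpg": [18.0, 44.0, 33.0],
    "cylinders": [8, 4, 4],
    "origin": ["usa", "europe", "usa"],
})

# Which columns describe() treats as numeric by default
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['mpg', 'cylinders']

# One table with both numeric stats and unique/top/freq
summary = toy.describe(include="all")
print(summary)
```

In the combined table, statistics that do not apply to a column (for example `mean` for `origin`) show up as NaN.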
Series¶
In [29]:
df["mpg"]
Out[29]:
0 18.0
1 15.0
2 18.0
3 16.0
4 17.0
...
393 27.0
394 44.0
395 32.0
396 28.0
397 31.0
Name: mpg, Length: 398, dtype: float64
In [30]:
# A Series is a one-dimensional vector
type(df["mpg"])
Out[30]:
pandas.core.series.Series
DataFrame¶
In [31]:
df
Out[31]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
393 | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 82 | usa | ford mustang gl |
394 | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 82 | europe | vw pickup |
395 | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 82 | usa | dodge rampage |
396 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | usa | ford ranger |
397 | 31.0 | 4 | 119.0 | 82.0 | 2720 | 19.4 | 82 | usa | chevy s-10 |
398 rows × 9 columns
In [32]:
type(df)
Out[32]:
pandas.core.frame.DataFrame
Indexing¶
Column indexing¶
In [33]:
# Single brackets return a Series
df["name"]
Out[33]:
0 chevrolet chevelle malibu
1 buick skylark 320
2 plymouth satellite
3 amc rebel sst
4 ford torino
...
393 ford mustang gl
394 vw pickup
395 dodge rampage
396 ford ranger
397 chevy s-10
Name: name, Length: 398, dtype: object
In [34]:
# Double brackets return a two-dimensional DataFrame
df[["name"]]
Out[34]:
| | name |
---|---|
0 | chevrolet chevelle malibu |
1 | buick skylark 320 |
2 | plymouth satellite |
3 | amc rebel sst |
4 | ford torino |
... | ... |
393 | ford mustang gl |
394 | vw pickup |
395 | dodge rampage |
396 | ford ranger |
397 | chevy s-10 |
398 rows × 1 columns
In [35]:
df[["origin", "name"]]
Out[35]:
| | origin | name |
---|---|---|
0 | usa | chevrolet chevelle malibu |
1 | usa | buick skylark 320 |
2 | usa | plymouth satellite |
3 | usa | amc rebel sst |
4 | usa | ford torino |
... | ... | ... |
393 | usa | ford mustang gl |
394 | europe | vw pickup |
395 | usa | dodge rampage |
396 | usa | ford ranger |
397 | usa | chevy s-10 |
398 rows × 2 columns
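The difference between single and double brackets is easiest to see by comparing the type and shape of each result. A minimal sketch on a made-up frame (`toy` is illustrative, not the mpg data):

```python
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({
    "origin": ["usa", "europe", "japan"],
    "name": ["ford torino", "vw pickup", "honda civic"],
})

as_series = toy["name"]   # a single label -> Series
as_frame = toy[["name"]]  # a list of labels -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

The inner brackets are just a Python list of column labels, which is why the same syntax extends to several columns at once.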
Row indexing¶
- .loc[row]
- .loc[row, column]
- .loc[condition, column]
In [36]:
df.loc[0]
Out[36]:
mpg 18.0
cylinders 8
displacement 307.0
horsepower 130.0
weight 3504
acceleration 12.0
model_year 70
origin usa
name chevrolet chevelle malibu
Name: 0, dtype: object
In [37]:
df.loc[[0]]
Out[37]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
In [38]:
# Multiple rows, returned as a DataFrame
df.loc[[0, 1]]
Out[38]:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
In [39]:
# One row and one column -> a single value
df.loc[0, "name"]
Out[39]:
'chevrolet chevelle malibu'
In [40]:
# Multiple rows and a list of columns -> a DataFrame
df.loc[[0, 1], ["name"]]
Out[40]:
| | name |
---|---|
0 | chevrolet chevelle malibu |
1 | buick skylark 320 |
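The third form from the list, `.loc[condition, column]`, is not exercised in the cells above. A minimal sketch of how it could look, assuming a small made-up frame (`toy`, `efficient`, and `usa_low` are illustrative names, and the thresholds are arbitrary):

```python
import pandas as pd

# Hypothetical toy frame shaped like a slice of the mpg data
toy = pd.DataFrame({
    "mpg": [18.0, 44.0, 33.0, 15.0],
    "origin": ["usa", "europe", "japan", "usa"],
    "name": ["ford torino", "vw pickup", "honda civic", "buick skylark 320"],
})

# A boolean condition picks the rows; the second argument picks the columns
efficient = toy.loc[toy["mpg"] >= 30, ["name", "mpg"]]
print(efficient)

# Combine conditions with & or |, wrapping each one in parentheses
usa_low = toy.loc[(toy["origin"] == "usa") & (toy["mpg"] < 18), "name"]
print(usa_low.tolist())  # ['buick skylark 320']
```

The parentheses matter because `&` binds more tightly than comparison operators in Python.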