
[pandas] 🍒Clone-coding Data Science for Everyone, part 1

또방91 2022. 1. 25. 23:15


Let's clone-code the pandas code I studied a while back!

Time for a quick review, as if I had known it all along 😉

In [1]:

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
# Widen the notebook container so the export uploads cleanly to Tistory :-)

🍒Clone-coding Data Science for Everyone, part 1🍒

Studying pandas

Loading libraries

In [2]:

import pandas as pd
import seaborn as sns

In [3]:

pd.__version__

Out[3]:

'1.3.4'

In [4]:

sns.__version__

Out[4]:

'0.11.2'

Loading the dataset

In [5]:

# Load the car fuel-economy (mpg) dataset
df = sns.load_dataset("mpg")

In [6]:

df

Out[6]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite
3	16.0	8	304.0	150.0	3433	12.0	70	usa	amc rebel sst
4	17.0	8	302.0	140.0	3449	10.5	70	usa	ford torino
...	...	...	...	...	...	...	...	...	...
393	27.0	4	140.0	86.0	2790	15.6	82	usa	ford mustang gl
394	44.0	4	97.0	52.0	2130	24.6	82	europe	vw pickup
395	32.0	4	135.0	84.0	2295	11.6	82	usa	dodge rampage
396	28.0	4	120.0	79.0	2625	18.6	82	usa	ford ranger
397	31.0	4	119.0	82.0	2720	19.4	82	usa	chevy s-10

398 rows × 9 columns

In [7]:

# 398 rows, 9 columns
df.shape

Out[7]:

(398, 9)

In [8]:

df.index

Out[8]:

RangeIndex(start=0, stop=398, step=1)

In [9]:

df.columns

Out[9]:

Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'model_year', 'origin', 'name'],
      dtype='object')

In [10]:

df.values

Out[10]:

array([[18.0, 8, 307.0, ..., 70, 'usa', 'chevrolet chevelle malibu'],
       [15.0, 8, 350.0, ..., 70, 'usa', 'buick skylark 320'],
       [18.0, 8, 318.0, ..., 70, 'usa', 'plymouth satellite'],
       ...,
       [32.0, 4, 135.0, ..., 82, 'usa', 'dodge rampage'],
       [28.0, 4, 120.0, ..., 82, 'usa', 'ford ranger'],
       [31.0, 4, 119.0, ..., 82, 'usa', 'chevy s-10']], dtype=object)

In [11]:

# Data type of each column in df
df.dtypes

Out[11]:

mpg             float64
cylinders         int64
displacement    float64
horsepower      float64
weight            int64
acceleration    float64
model_year        int64
origin           object
name             object
dtype: object
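The dtype listing above is what decides which columns count as "numeric" later on. Here is a small sketch of splitting columns by dtype with `select_dtypes`. The three-row `toy` frame is a hypothetical stand-in for the mpg data so the snippet runs without downloading anything.

```python
import pandas as pd

# Hypothetical three-row frame that mimics a few columns of the mpg dataset.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 33.0],
    "cylinders": [8, 8, 4],
    "origin": ["usa", "usa", "japan"],
})

# Column names whose dtype is numeric (int or float).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
# Everything else (here, the text column).
text_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(text_cols)
```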

Viewing part of the dataset

In [12]:

# First 5 rows (the default)
df.head()

Out[12]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite
3	16.0	8	304.0	150.0	3433	12.0	70	usa	amc rebel sst
4	17.0	8	302.0	140.0	3449	10.5	70	usa	ford torino

In [13]:

# First 3 rows
df.head(3)

Out[13]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite

In [14]:

# Last 5 rows (the default)
df.tail()

Out[14]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
393	27.0	4	140.0	86.0	2790	15.6	82	usa	ford mustang gl
394	44.0	4	97.0	52.0	2130	24.6	82	europe	vw pickup
395	32.0	4	135.0	84.0	2295	11.6	82	usa	dodge rampage
396	28.0	4	120.0	79.0	2625	18.6	82	usa	ford ranger
397	31.0	4	119.0	82.0	2720	19.4	82	usa	chevy s-10

In [15]:

# Last 3 rows
df.tail(3)

Out[15]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
395	32.0	4	135.0	84.0	2295	11.6	82	usa	dodge rampage
396	28.0	4	120.0	79.0	2625	18.6	82	usa	ford ranger
397	31.0	4	119.0	82.0	2720	19.4	82	usa	chevy s-10

In [16]:

# 1 random row (the default)
df.sample()

Out[16]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
60	20.0	4	140.0	90.0	2408	19.5	72	usa	chevrolet vega

In [17]:

# 4 random rows -> different rows on every run
df.sample(4)

Out[17]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
392	27.0	4	151.0	90.0	2950	17.3	82	usa	chevrolet camaro
278	31.5	4	89.0	71.0	1990	14.9	78	europe	volkswagen scirocco
185	26.0	4	98.0	79.0	2255	17.7	76	usa	dodge colt
38	14.0	8	350.0	165.0	4209	12.0	71	usa	chevrolet impala

In [18]:

# For comparison
# 4 random rows -> the same rows on every run, because the seed is fixed
df.sample(4, random_state=42)

Out[18]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
198	33.0	4	91.0	53.0	1795	17.4	76	japan	honda civic
396	28.0	4	120.0	79.0	2625	18.6	82	usa	ford ranger
33	19.0	6	232.0	100.0	2634	13.0	71	usa	amc gremlin
208	13.0	8	318.0	150.0	3940	13.2	76	usa	plymouth volare premier v8
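To convince myself that `random_state` really pins the result, here is a minimal check. It uses a hypothetical ten-row `toy` frame instead of the mpg data and compares two draws made with the same seed.

```python
import pandas as pd

# Hypothetical frame; any DataFrame behaves the same way.
toy = pd.DataFrame({"value": range(10)})

# Two independent calls with the same seed pick the same rows in the same order.
first = toy.sample(4, random_state=42)
second = toy.sample(4, random_state=42)

print(first.equals(second))
```

Without `random_state`, two calls are free to return different rows, which is what happened in the `df.sample(4)` cell.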

Summarizing

In [19]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64  
 7   origin        398 non-null    object 
 8   name          398 non-null    object 
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB
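As far as I understand it, the `+` in `memory usage: 28.1+ KB` means the figure is a lower bound: for object (string) columns pandas does not measure the strings themselves unless asked. A rough sketch of the difference, on a hypothetical two-row `toy` frame:

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
toy = pd.DataFrame({
    "name": ["ford pinto", "amc gremlin"],
    "mpg": [20.0, 19.0],
})

# Shallow estimate: does not inspect the Python string objects.
shallow = toy.memory_usage().sum()
# Deep estimate: also measures the contents of object columns.
deep = toy.memory_usage(deep=True).sum()

print(shallow, deep)
```

The same deep measurement can be requested from the summary itself with `df.info(memory_usage="deep")`.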

Checking for missing values

In [20]:

True + False

Out[20]:

1

In [21]:

True + True

Out[21]:

2
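The two cells above matter because Python treats `True` as 1 and `False` as 0 in arithmetic, which is why summing a boolean mask counts the `True` values. A plain-Python sketch with a made-up list of flags:

```python
# Made-up flags: True marks a "missing" entry.
flags = [True, False, True, True]

# Summing booleans counts the True values.
n_true = sum(flags)
# Dividing by the length gives the share of True values.
share_true = n_true / len(flags)

print(n_true, share_true)
```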

In [22]:

# isna(): True where a value is missing
df.isna()

Out[22]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	False	False	False	False	False	False	False	False	False
1	False	False	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False	False	False
3	False	False	False	False	False	False	False	False	False
4	False	False	False	False	False	False	False	False	False
...	...	...	...	...	...	...	...	...	...
393	False	False	False	False	False	False	False	False	False
394	False	False	False	False	False	False	False	False	False
395	False	False	False	False	False	False	False	False	False
396	False	False	False	False	False	False	False	False	False
397	False	False	False	False	False	False	False	False	False

398 rows × 9 columns

In [23]:

# isnull(): an alias of isna()
df.isnull()

Out[23]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	False	False	False	False	False	False	False	False	False
1	False	False	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False	False	False
3	False	False	False	False	False	False	False	False	False
4	False	False	False	False	False	False	False	False	False
...	...	...	...	...	...	...	...	...	...
393	False	False	False	False	False	False	False	False	False
394	False	False	False	False	False	False	False	False	False
395	False	False	False	False	False	False	False	False	False
396	False	False	False	False	False	False	False	False	False
397	False	False	False	False	False	False	False	False	False

398 rows × 9 columns
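Both cells print the same table because `isnull()` is an alias of `isna()`. A quick sketch that checks this on a hypothetical `toy` frame with two missing values:

```python
import pandas as pd

# Hypothetical frame with one gap in each column.
toy = pd.DataFrame({
    "horsepower": [130.0, None, 150.0],
    "origin": ["usa", "usa", None],
})

# The two masks are identical.
same_mask = toy.isna().equals(toy.isnull())
# Total number of missing cells across the whole frame.
total_missing = int(toy.isna().sum().sum())

print(same_mask, total_missing)
```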

In [24]:

# Count the missing values in each column
df.isnull().sum()

Out[24]:

mpg             0
cylinders       0
displacement    0
horsepower      6
weight          0
acceleration    0
model_year      0
origin          0
name            0
dtype: int64

In [25]:

# Share of missing values per column (missing count / total count)
df.isnull().mean()

Out[25]:

mpg             0.000000
cylinders       0.000000
displacement    0.000000
horsepower      0.015075
weight          0.000000
acceleration    0.000000
model_year      0.000000
origin          0.000000
name            0.000000
dtype: float64

In [26]:

# The same ratio as a percentage
df.isnull().mean() * 100

Out[26]:

mpg             0.000000
cylinders       0.000000
displacement    0.000000
horsepower      1.507538
weight          0.000000
acceleration    0.000000
model_year      0.000000
origin          0.000000
name            0.000000
dtype: float64
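The percentage above is just the missing count divided by the row count: 6 / 398 is about 1.5075%. A small sketch that checks `mean()` against the manual division, again on a hypothetical `toy` frame:

```python
import pandas as pd

# Hypothetical frame: 2 of the 4 horsepower values are missing.
toy = pd.DataFrame({
    "horsepower": [130.0, None, 150.0, None],
    "weight": [3504, 3693, 3436, 3433],
})

counts = toy.isnull().sum()                # missing values per column
ratio_pct = toy.isnull().mean() * 100      # percentage via the mean of the mask
manual_pct = counts / len(toy) * 100       # the same thing computed by hand

print(ratio_pct)
```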

Descriptive statistics

In [27]:

# Numeric columns (the default) ---- [18,34] check the data types
df.describe()

Out[27]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year
count	398.000000	398.000000	398.000000	392.000000	398.000000	398.000000	398.000000
mean	23.514573	5.454774	193.425879	104.469388	2970.424623	15.568090	76.010050
std	7.815984	1.701004	104.269838	38.491160	846.841774	2.757689	3.697627
min	9.000000	3.000000	68.000000	46.000000	1613.000000	8.000000	70.000000
25%	17.500000	4.000000	104.250000	75.000000	2223.750000	13.825000	73.000000
50%	23.000000	4.000000	148.500000	93.500000	2803.500000	15.500000	76.000000
75%	29.000000	8.000000	262.000000	126.000000	3608.000000	17.175000	79.000000
max	46.600000	8.000000	455.000000	230.000000	5140.000000	24.800000	82.000000

In [28]:

# For string columns, pass include="object" ---- [18,34] check the data types
df.describe(include="object")

Out[28]:

	origin	name
count	398	398
unique	3	305
top	usa	ford pinto
freq	249	6
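The four rows in that table (`count`, `unique`, `top`, `freq`) can also be recomputed by hand, which helped me remember what each one means. A sketch on a hypothetical five-row `origin` column:

```python
import pandas as pd

# Hypothetical column of origins.
origin = pd.Series(["usa", "usa", "japan", "europe", "usa"], name="origin")

count = origin.count()            # number of non-missing values
unique = origin.nunique()         # number of distinct values
value_counts = origin.value_counts()
top = value_counts.idxmax()       # the most frequent value
freq = value_counts.max()         # how often it appears

print(count, unique, top, freq)
```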

Series

In [29]:

df["mpg"]

Out[29]:

0      18.0
1      15.0
2      18.0
3      16.0
4      17.0
       ... 
393    27.0
394    44.0
395    32.0
396    28.0
397    31.0
Name: mpg, Length: 398, dtype: float64

In [30]:

# A one-dimensional vector
type(df["mpg"])

Out[30]:

pandas.core.series.Series

DataFrame

In [31]:

df

Out[31]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite
3	16.0	8	304.0	150.0	3433	12.0	70	usa	amc rebel sst
4	17.0	8	302.0	140.0	3449	10.5	70	usa	ford torino
...	...	...	...	...	...	...	...	...	...
393	27.0	4	140.0	86.0	2790	15.6	82	usa	ford mustang gl
394	44.0	4	97.0	52.0	2130	24.6	82	europe	vw pickup
395	32.0	4	135.0	84.0	2295	11.6	82	usa	dodge rampage
396	28.0	4	120.0	79.0	2625	18.6	82	usa	ford ranger
397	31.0	4	119.0	82.0	2720	19.4	82	usa	chevy s-10

398 rows × 9 columns

In [32]:

type(df)

Out[32]:

pandas.core.frame.DataFrame

Indexing

Column indexing

In [33]:

# Single brackets return a Series
df["name"]

Out[33]:

0      chevrolet chevelle malibu
1              buick skylark 320
2             plymouth satellite
3                  amc rebel sst
4                    ford torino
                 ...            
393              ford mustang gl
394                    vw pickup
395                dodge rampage
396                  ford ranger
397                   chevy s-10
Name: name, Length: 398, dtype: object

In [34]:

# Double brackets (a list of columns) return a two-dimensional DataFrame
df[["name"]]

Out[34]:

	name
0	chevrolet chevelle malibu
1	buick skylark 320
2	plymouth satellite
3	amc rebel sst
4	ford torino
...	...
393	ford mustang gl
394	vw pickup
395	dodge rampage
396	ford ranger
397	chevy s-10

398 rows × 1 columns

In [35]:

df[["origin","name"]]

Out[35]:

	origin	name
0	usa	chevrolet chevelle malibu
1	usa	buick skylark 320
2	usa	plymouth satellite
3	usa	amc rebel sst
4	usa	ford torino
...	...	...
393	usa	ford mustang gl
394	europe	vw pickup
395	usa	dodge rampage
396	usa	ford ranger
397	usa	chevy s-10

398 rows × 2 columns
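The difference between `df["name"]` and `df[["name"]]` is easy to forget, so here is a tiny side-by-side check. The two-row `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame.
toy = pd.DataFrame({
    "origin": ["usa", "japan"],
    "name": ["ford torino", "honda civic"],
})

as_series = toy["name"]     # a column label -> 1-D Series
as_frame = toy[["name"]]    # a list of labels -> 2-D DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```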

Row indexing

.loc[row]
.loc[row, column]
.loc[condition, column]

In [36]:

df.loc[0]

Out[36]:

mpg                                  18.0
cylinders                               8
displacement                        307.0
horsepower                          130.0
weight                               3504
acceleration                         12.0
model_year                             70
origin                                usa
name            chevrolet chevelle malibu
Name: 0, dtype: object

In [37]:

df.loc[[0]]

Out[37]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu

In [38]:

# Multiple rows, returned as a DataFrame
df.loc[[0,1]]

Out[38]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320

In [39]:

# One row and one column -> a single value
df.loc[0,"name"]

Out[39]:

'chevrolet chevelle malibu'

In [40]:

# Multiple rows and a list of columns -> a DataFrame
df.loc[[0,1],["name"]]

Out[40]:

	name
0	chevrolet chevelle malibu
1	buick skylark 320
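The `.loc[condition, column]` form listed at the top of this section is not run in the cells above, so here is a minimal sketch of it. The `toy` frame and the `mpg >= 30` threshold are made up for illustration.

```python
import pandas as pd

# Hypothetical three-row frame.
toy = pd.DataFrame({
    "mpg": [18.0, 33.0, 44.0],
    "origin": ["usa", "japan", "europe"],
    "name": ["ford torino", "honda civic", "vw pickup"],
})

# A boolean mask picks the rows; the list picks the columns.
efficient = toy.loc[toy["mpg"] >= 30, ["name", "mpg"]]

# Label slices in .loc include both endpoints, so 0:1 returns two rows.
first_two_names = toy.loc[0:1, "name"]

print(efficient)
print(first_two_names)
```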
