
Day 5: Data Visualization 2



Seaborn

  • Seaborn grew out of dissatisfaction with the Matplotlib API
  • Matplotlib had poor compatibility with pandas DataFrames (Matplotlib dates from 2003, pandas from 2008)
    • Much of this has since been resolved
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = np.linspace(0, 10, 20)
y = x ** 2
plt.plot(x, y)
plt.show()
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
x = np.linspace(0, 10, 20)
y = x ** 2
sns.set()
plt.plot(x, y)
plt.show()
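The two snippets above differ only in the call to sns.set(), which works by overwriting Matplotlib's global rcParams. The sketch below makes that visible by reading one setting before and after the call. It assumes a non-interactive Agg backend so that it also runs outside a notebook; in seaborn 0.11 and later, sns.set_theme() is the preferred name for the same call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import seaborn as sns

sns.reset_defaults()                       # start from Matplotlib's defaults
before = plt.rcParams["axes.facecolor"]    # background color of the plot area

sns.set()                                  # apply seaborn's default theme
after = plt.rcParams["axes.facecolor"]

print("before:", before)
print("after: ", after)

sns.reset_defaults()                       # undo the theme again
restored = plt.rcParams["axes.facecolor"]
```

Because the change is global, every Matplotlib figure drawn after sns.set() picks up the seaborn look until the defaults are restored.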

relplot

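  • By default, draws a scatter plot
    • hue : colors the points by category
    • style : varies the marker style by category
    • size : varies the marker size by a variable; sizes sets the (min, max) range of marker sizes
    • col_wrap : wraps the facets onto a new row after n columns
    • height : height of each facet
    • kind : switches from the default scatter plot to another kind, such as "line"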
tips = sns.load_dataset("tips")  # sns.load_dataset: loads an example dataset provided by seaborn
sns.relplot(x="total_bill", y="tip", data=tips)
plt.show()
tips.head(10)
Out[-]
   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4
5       25.29  4.71    Male      No  Sun  Dinner     4
6        8.77  2.00    Male      No  Sun  Dinner     2
7       26.88  3.12    Male      No  Sun  Dinner     4
8       15.04  1.96    Male      No  Sun  Dinner     2
9       14.78  3.23    Male      No  Sun  Dinner     2
type(tips)
Out[-] pandas.core.frame.DataFrame
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips)
Out[-]
<seaborn.axisgrid.FacetGrid at 0x1b613781860>
sns.relplot(x="total_bill", y="tip", hue="smoker", style='smoker', data=tips)
Out[-]
<seaborn.axisgrid.FacetGrid at 0x1b613833710>
sns.relplot(x="total_bill", y="tip", hue="size", data=tips)
Out[-]
<seaborn.axisgrid.FacetGrid at 0x1b6115a1e10>
sns.relplot(x="total_bill", y="tip", hue="size", size='size', data=tips)
Out[-]
<seaborn.axisgrid.FacetGrid at 0x1b6133d9ef0>
sns.relplot(x="total_bill", y="tip", size="size", sizes=(15, 200), data=tips)
Out[-]
<seaborn.axisgrid.FacetGrid at 0x1b6139fbba8>
fmri = sns.load_dataset("fmri")
sns.relplot(
    x="timepoint", y="signal",
    hue="event", style="event",
    col="subject", col_wrap=3,
    height=10, aspect=.75, linewidth=2.5,
    kind="line",
    data=fmri.query("region == 'frontal'"),
);

countplot()

  • Plots how many observations fall into each category of the given column
titanic = sns.load_dataset("titanic")
f, ax = plt.subplots(figsize=(7, 3))
# Passing y= draws horizontal bars; use x= instead for vertical bars
sns.countplot(y="deck", data=titanic, color="c")
Out[-]
<matplotlib.axes._subplots.AxesSubplot at 0x1b614314c88>
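Since countplot draws one bar per category, its bar lengths should match what pandas value_counts reports for the same column. A minimal sketch of that check, assuming a tiny made-up deck column and an Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data standing in for the titanic "deck" column
df = pd.DataFrame({"deck": ["C", "C", "E", "B", "C", "E"]})
counts = df["deck"].value_counts()   # occurrences of each category

fig, ax = plt.subplots(figsize=(7, 3))
sns.countplot(y="deck", data=df, color="c", ax=ax)

# With y=, the bars are horizontal, so each bar's width is its count
bar_widths = sorted(round(p.get_width()) for p in ax.patches if p.get_width() > 0)
print(bar_widths)
print(sorted(counts.tolist()))
```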

pairplot()

  • For data with three or more dimensions, plots every pair of variables against each other
titanic.head(10)
Out[-]
   survived  pclass     sex   age  sibsp  parch     fare  embarked   class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S   Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C   First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S   Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S   First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S   Third    man        True   NaN  Southampton     no   True
5         0       3    male   NaN      0      0   8.4583         Q   Third    man        True   NaN   Queenstown     no   True
6         0       1    male  54.0      0      0  51.8625         S   First    man        True     E  Southampton     no   True
7         0       3    male   2.0      3      1  21.0750         S   Third  child       False   NaN  Southampton     no  False
8         1       3  female  27.0      0      2  11.1333         S   Third  woman       False   NaN  Southampton    yes  False
9         1       2  female  14.0      1      0  30.0708         C  Second  child       False   NaN    Cherbourg    yes  False
sns.pairplot(titanic, hue='class')
plt.show()

Plotly Python Open Source Graphing Library

Plotly

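  • A package well suited to drawing interactive graphs
  • Graphs are rendered in the browser by plotly.js, which builds on the JavaScript visualization library D3, so they draw quickly on the web.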

Line

import plotly.express as px
df = px.data.gapminder().query("country=='Canada'")
fig = px.line(df, x="year", y="lifeExp", title='Life expectancy in Canada')
fig.show()

Scatter

fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])
fig.show()

Bar

data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()
data_canada
Out[-]
     country  continent  year  lifeExp       pop    gdpPercap  iso_alpha  iso_num
240   Canada   Americas  1952   68.750  14785584  11367.16112        CAN      124
241   Canada   Americas  1957   69.960  17010154  12489.95006        CAN      124
242   Canada   Americas  1962   71.300  18985849  13462.48555        CAN      124
243   Canada   Americas  1967   72.130  20819767  16076.58803        CAN      124
244   Canada   Americas  1972   72.880  22284500  18970.57086        CAN      124
245   Canada   Americas  1977   74.210  23796400  22090.88306        CAN      124
246   Canada   Americas  1982   75.760  25201900  22898.79214        CAN      124
247   Canada   Americas  1987   76.860  26549700  26626.51503        CAN      124
248   Canada   Americas  1992   77.950  28523502  26342.88426        CAN      124
249   Canada   Americas  1997   78.610  30305843  28954.92589        CAN      124
250   Canada   Americas  2002   79.770  31902268  33328.96507        CAN      124
251   Canada   Americas  2007   80.653  33390141  36319.23501        CAN      124

Pie

  • pull : pulls a slice away from the center of the pie chart (one value per slice, as a fraction of the radius)
import plotly
print(plotly.__version__)  # check the installed version
Out[-] 4.5.4
# Pie charts do not work if the version is too old; upgrade if needed
# !pip3 install --upgrade plotly
import plotly.express as px
fig = px.pie(values=[1, 2, 3])
fig.show()
import plotly.graph_objects as go
labels = ['Oxygen', 'Hydrogen', 'Carbon_Dioxide', 'Nitrogen']
values = [4500, 2500, 1053, 500]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, pull=[0, 0, 0.2, 0])])
fig.show()

Sunburst

  • Shows the hierarchical structure of the data in an easy-to-read form
import plotly.graph_objects as go
fig = go.Figure(go.Sunburst(
    labels=["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
    parents=["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve"],
    values=[10, 14, 12, 10, 2, 6, 6, 4, 4],
))
fig.update_layout(margin=dict(t=0, l=0, r=0, b=0))
fig.show()
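go.Sunburst receives the hierarchy as two parallel lists: each entry in parents names the parent of the label at the same position, and an empty string marks the root. The sketch below uses only the standard library and the same two lists as the example above to rebuild the tree from that encoding and print it with indentation.

```python
from collections import defaultdict

labels = ["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"]
parents = ["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve"]

# Map each parent to the list of its children, in input order
children = defaultdict(list)
for label, parent in zip(labels, parents):
    children[parent].append(label)

def show(node, depth=0):
    """Print the subtree under `node`, indented by depth."""
    print("  " * depth + node)
    for child in children.get(node, []):
        show(child, depth + 1)

# Entries whose parent is "" are the roots of the hierarchy
for root in children[""]:
    show(root)
```

Each level of indentation in the printed tree corresponds to one ring of the sunburst, counted outward from the center.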

Gantt

  • A chart that shows tasks as bars along a time axis, used for schedule management
import plotly.figure_factory as ff
df = [
    dict(Task='Morning Sleep', Start='2016-01-01', Finish='2016-01-01 6:00:00', Resource='Sleep'),
    dict(Task='Breakfast', Start='2016-01-01 7:00:00', Finish='2016-01-01 7:30:00', Resource='Food'),
    dict(Task='Work', Start='2016-01-01 9:00:00', Finish='2016-01-01 11:25:00', Resource='Brain'),
    dict(Task='Break', Start='2016-01-01 11:30:00', Finish='2016-01-01 12:00:00', Resource='Rest'),
    dict(Task='Lunch', Start='2016-01-01 12:00:00', Finish='2016-01-01 13:00:00', Resource='Food'),
    dict(Task='Work', Start='2016-01-01 13:00:00', Finish='2016-01-01 17:00:00', Resource='Brain'),
    dict(Task='Exercise', Start='2016-01-01 17:30:00', Finish='2016-01-01 18:30:00', Resource='Cardio'),
    dict(Task='Post Workout Rest', Start='2016-01-01 18:30:00', Finish='2016-01-01 19:00:00', Resource='Rest'),
    dict(Task='Dinner', Start='2016-01-01 19:00:00', Finish='2016-01-01 20:00:00', Resource='Food'),
    dict(Task='Evening Sleep', Start='2016-01-01 21:00:00', Finish='2016-01-01 23:59:00', Resource='Sleep'),
]
colors = dict(
    Cardio='rgb(46, 137, 205)',
    Food='rgb(114, 44, 121)',
    Sleep='rgb(198, 47, 105)',
    Brain='rgb(58, 149, 136)',
    Rest='rgb(107, 127, 135)',
)
fig = ff.create_gantt(df, colors=colors, index_col='Resource', title='Daily Schedule',
                      show_colorbar=True, bar_width=0.8, showgrid_x=True, showgrid_y=True)
fig.show()
import plotly.express as px
fig = px.scatter(x=range(10), y=range(10))
fig.show()
# fig.write_html("path/to/file.html")
import plotly.express as px
fig = px.scatter(x=range(10), y=range(10))
fig.show()
# save as HTML
fig.write_html("file.html")