Exercise source: https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb#Anscombe's-quartet
Part 1
For each of the four datasets...
- Compute the mean and variance of both x and y
- Compute the correlation coefficient between x and y
- Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)
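Every quantity in Part 1 has a closed form. The least-squares coefficients, for example, are β1 = cov(x, y) / var(x) and β0 = mean(y) − β1·mean(x). The minimal sketch below uses NumPy only. It assumes the published values of Anscombe's dataset I, typed in by hand rather than read from anscombe.csv, and computes each statistic directly from its formula:

```python
import numpy as np

# Anscombe's dataset I, entered by hand (assumed to match the "I" rows of anscombe.csv)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

mean_x, mean_y = x.mean(), y.mean()
# ddof=1 gives the sample variance, matching pandas' default .var()
var_x, var_y = x.var(ddof=1), y.var(ddof=1)
# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]

# Ordinary least squares for y = beta0 + beta1 * x + eps, in closed form
beta1 = np.cov(x, y, ddof=1)[0, 1] / var_x
beta0 = mean_y - beta1 * mean_x

print(f"mean x = {mean_x:.2f}, mean y = {mean_y:.2f}")
print(f"var x = {var_x:.2f}, var y = {var_y:.2f}")
print(f"r = {r:.3f}, beta0 = {beta0:.2f}, beta1 = {beta1:.3f}")
```

Repeating the calculation on datasets II, III and IV gives nearly the same numbers. That is the point of the quartet: the summary statistics agree, but the plots in Part 2 do not.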
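In short: for each of the four datasets, compute the mean, variance and correlation coefficient of x and y, and the linear regression equation (its two coefficients β0 and β1).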
Part 2
Using Seaborn, visualize all four datasets.
hint: use sns.FacetGrid combined with plt.scatter
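The hint builds the grid by hand with sns.FacetGrid and plt.scatter. A variation uses sns.lmplot, which returns a FacetGrid and also draws each panel's least-squares line, so the four identical fits become visible. This sketch is self-contained: it types in the quartet's published values instead of reading anscombe.csv, and it writes the figure to a hypothetical file, anscombe_quartet.png, rather than opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Anscombe's quartet, entered by hand (datasets I to III share the same x values)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
# Long format, assumed to mirror anscombe.csv: columns dataset, x, y
anscombe = pd.concat(
    [pd.DataFrame({"dataset": name, "x": xs, "y": ys}) for name, (xs, ys) in data.items()],
    ignore_index=True,
)

# One panel per dataset wrapped into a 2x2 grid; ci=None skips the confidence band
g = sns.lmplot(data=anscombe, x="x", y="y", col="dataset", col_wrap=2, ci=None)
g.savefig("anscombe_quartet.png")
plt.close("all")
```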
In short: visualize the four datasets.
Code:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

sns.set_context("talk")
# assumed layout of anscombe.csv: one row per point, columns dataset (I to IV), x, y
anscombe = pd.read_csv('anscombe.csv')

# Part 1: per-dataset summary statistics
grouped = anscombe.groupby('dataset')
mx = grouped.mean().x
my = grouped.mean().y
vx = grouped.var().x  # sample variance (ddof=1)
vy = grouped.var().y
print('x mean : \n', mx, '\n')
print('y mean : \n', my, '\n')
print('x var : \n', vx, '\n')
print('y var : \n', vy, '\n')
# one 2x2 correlation matrix per dataset; the off-diagonal entry is corr(x, y)
cor = grouped.corr()
print('correlation : \n', cor, '\n')

# ordinary least squares fit of y = β0 + β1·x for each dataset
for name in ['I', 'II', 'III', 'IV']:
    a = anscombe[anscombe.dataset == name]
    s_x = sm.add_constant(np.array(a.x))  # prepend a column of ones for the intercept
    s_y = np.array(a.y)
    beta_pair = sm.OLS(s_y, s_x).fit()
    # the constant column comes first, so params is [β0, β1]
    print(name, 'β0, β1 = ', beta_pair.params)

# Part 2: one scatter panel per dataset, wrapped into a 2x2 grid
temp = sns.FacetGrid(data=anscombe, col='dataset', col_wrap=2)
temp.map(plt.scatter, 'x', 'y')
plt.show()
```
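The Part 1 results can also be gathered into a single table with one row per dataset, which makes the near-identical statistics easy to compare. This sketch swaps statsmodels for np.polyfit with degree 1 to obtain the two coefficients. It also types in the quartet's published values instead of reading anscombe.csv, so it runs on its own; the helper name summarize is hypothetical.

```python
import numpy as np
import pandas as pd

# Anscombe's quartet, entered by hand (datasets I to III share the same x values)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
anscombe = pd.concat(
    [pd.DataFrame({"dataset": name, "x": xs, "y": ys}) for name, (xs, ys) in data.items()],
    ignore_index=True,
)


def summarize(g: pd.DataFrame) -> pd.Series:
    """All Part 1 statistics for one dataset (hypothetical helper)."""
    # polyfit with deg=1 returns the coefficients highest power first: [slope, intercept]
    beta1, beta0 = np.polyfit(g["x"], g["y"], 1)
    return pd.Series({
        "mean_x": g["x"].mean(), "mean_y": g["y"].mean(),
        "var_x": g["x"].var(), "var_y": g["y"].var(),  # sample variances (ddof=1)
        "corr": g["x"].corr(g["y"]),
        "beta0": beta0, "beta1": beta1,
    })


# Selecting x and y keeps the grouping column out of each group passed to summarize
summary = anscombe.groupby("dataset")[["x", "y"]].apply(summarize)
print(summary.round(3))
```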
2018/6/11
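Copyright notice: this is an original article by ltc8600, licensed under CC 4.0 BY-SA. When reposting, include a link to the original article and this notice.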