Multiple Linear Regression Model Evaluation: Metrics for Evaluating Linear Regression Models

You've got a sample dataset and have just finished building a machine learning model with linear regression. Now you are wondering whether your analysis and predictions are accurate, statistically significant, and provide the insights needed to solve the problem.

A number of metrics are used to evaluate the performance of a linear regression model. They include:

  • R-Squared: used less often on its own for evaluating model fit

  • MSE (Mean Squared Error): used for evaluating model fit

  • RMSE (Root Mean Squared Error): very commonly used for evaluating model fit

Let us take a look at each of these metrics, shall we?

R-SQUARED:

  • is also known as the coefficient of determination

  • measures the percentage of variation in the response (dependent) variable that is explained by the predictor (independent) variables.
  • usually has values between 0 and 1 for a least-squares fit with an intercept, evaluated on the data it was fitted to (it can be negative on held-out data when the model predicts worse than the mean). As a rough rule of thumb, values between 0.3 and 0.5 refer to a weak r-squared, values between 0.5 and 0.7 to a moderate r-squared, and values > 0.7 to a strong r-squared.

  • a value of 0.7 means that 70% of the variation of the response around its mean is explained by the model
  • the higher the r-squared, the better the model fits your data (there is a caveat to this…) because it is possible to have a low r-squared value for a good model and vice versa

  • is a relative measure of model fit: it compares the model against simply predicting the mean, so on its own it does not tell you how large the prediction errors are.

  • is not in itself a test of statistical significance.
  • sklearn module: sklearn.metrics.r2_score

  • mathematical formula: [Figure: R-Squared Formula]
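
As a minimal sketch of the formula referred to above, assuming the standard definition R² = 1 − SS_res / SS_tot (residual sum of squares over total sum of squares around the mean), the metric can be computed by hand in plain Python. The helper name `r_squared` and the data are made up for illustration; in practice `sklearn.metrics.r2_score` would normally be used instead.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error left over after the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Squared variation of the response around its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and predictions (illustrative only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y_true, y_pred)

# Baseline that always predicts the mean of y_true.
r2_mean_baseline = r_squared(y_true, [6.0, 6.0, 6.0, 6.0])

# Predictions that are worse than the mean baseline.
r2_bad_model = r_squared(y_true, [9.0, 7.0, 5.0, 3.0])

print(r2, r2_mean_baseline, r2_bad_model)
```

With this toy data the first model scores close to 1, the mean baseline scores exactly 0, and the reversed predictions score below 0, which illustrates why r-squared is best read as a comparison against the mean rather than as an error size.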

Mean Squared Error (MSE):

  • measures the average of the squared differences between the predicted values and the actual values.

  • is an absolute measure of model fit, expressed in the squared units of the response variable.

  • a value of 0 indicates a perfect fit, meaning the model predicts every outcome exactly; in practice this almost never happens.
  • sklearn module: sklearn.metrics.mean_squared_error

  • mathematical formula: [Figure: MSE Formula]
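
A minimal sketch of the calculation, assuming the usual definition MSE = (1/n) · Σ (yᵢ − ŷᵢ)², where yᵢ is the actual value and ŷᵢ the predicted value. The helper name `mse` and the numbers are illustrative assumptions; `sklearn.metrics.mean_squared_error` is the library counterpart.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    squared_errors = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical observed values and predictions (illustrative only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

model_mse = mse(y_true, y_pred)

# Predicting every value exactly gives the perfect-fit value of 0.
perfect_mse = mse(y_true, y_true)

print(model_mse, perfect_mse)
```

Because the errors are squared before averaging, a single large miss raises the MSE much more than several small ones.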

It is important to understand residuals, because MSE and RMSE are both computed from them.

Residuals:

  • are the differences between the actual values and the predicted values

  • are used to check the validity of a model and whether its assumptions hold

  • should be random (i.e. have no pattern)

  • a good residual plot is a scatter plot in which the residuals are centered around 0 with no visible trend
  • statsmodels module: RegressionResults.resid
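
The idea can be sketched without any library, assuming a single predictor and an ordinary least-squares line fitted in closed form. The function `fit_line` and the data are hypothetical; a real multiple regression would typically be fitted with statsmodels, whose results object exposes the same values as `resid`.

```python
def fit_line(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
y_pred = [intercept + slope * xi for xi in x]

# Residual = actual value minus predicted value.
residuals = [yi - pi for yi, pi in zip(y, y_pred)]

# For a least-squares fit with an intercept, residuals average to about 0.
mean_residual = sum(residuals) / len(residuals)

print(residuals)
print(mean_residual)
```

A mean near 0 is only a first check: plotting `residuals` against `y_pred` is what reveals whether any pattern remains.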

Root Mean Squared Error (RMSE):

  • measures the typical distance between the actual values and the predicted values

  • the lower the RMSE, the better the fit. A low value means the data points lie close to the regression line

  • is a good measure of how accurately the model predicts the target

  • is often considered the most useful single statistic for judging how well the model predicts the response variable

  • can be read as the standard deviation of the residuals, expressed in the same units as the response variable

  • it measures the spread of the data points around the regression line.
  • can be computed using the sklearn and math modules [Code listing: rmse.py]
  • mathematical formula: [Figure: RMSE formula]
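
The original rmse.py listing is not available here, so the following is only a sketch under the assumption that RMSE = √MSE. It uses the standard `math` module and a hand-written MSE so that it runs without extra packages; with scikit-learn installed, `sklearn.metrics.mean_squared_error(y_true, y_pred)` could replace the hypothetical `mse` helper.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical observed values and predictions (illustrative only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

model_rmse = rmse(y_true, y_pred)
print(model_rmse)
```

Taking the square root brings the error back to the units of the response, so an RMSE of 0.5 here means predictions are typically about half a unit away from the actual values.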

It is advisable to have an in-depth knowledge of statistics in order to familiarize yourself with the concepts and models used in Data Science. Not sure where to start? This article should give you a head start in the field of statistics.

It is important to note that these metrics apply only to regression models, not to classification models, and that other performance measures can also be employed. I recently worked on a project (red wine quality dataset) and used some of the above metrics to evaluate the performance of my model. Can you tell whether the model performed well or poorly on the dataset according to these metrics, and why?

Now you know how to evaluate a linear regression model on your dataset. Thank you for taking the time to read.

Translated from: https://medium.com/dev-genius/metrics-for-evaluating-linear-regression-models-36df305510d9
