How Is R Squared Calculated in R?

3 years ago

文, 翔


Hello, readers! In this article, we will look at an important concept in machine learning: R squared (R2) in R programming.

So, let's get started!

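Importance of the R squared error metric

Let us first understand why error metrics matter in data science and machine learning.

Error metrics let us evaluate how well a machine learning model performs on a particular dataset.

Different error metrics apply depending on the class of algorithm.

We use the confusion matrix to evaluate classification algorithms, whereas R squared is an important error metric for evaluating the predictions of regression algorithms.

R squared (R2) is a regression error metric used to evaluate model performance. It represents the proportion of the variation in the dependent variable that the independent variables explain.

In other words, R squared describes how well the target variable is explained by the independent variables taken together.

For a linear regression with an intercept, evaluated on the data it was fitted to, R squared lies between 0 and 1 (on new data the formula can yield negative values). It is given by: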

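R2 = 1 - (SSres / SStot)

where

SSres: the residual sum of squares, i.e. the sum of squared differences between the actual and predicted values.
SStot: the total sum of squares, i.e. the sum of squared differences between the actual values and their mean.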

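As a rule of thumb, the higher the R squared value, the better the model fits the data. Keep in mind that adding predictors never lowers R squared on the training data, so a high value alone does not guarantee good predictions.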
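To make the formula concrete, here is a minimal sketch in base R that computes SSres and SStot by hand for a tiny made-up dataset and compares the result with the value reported by summary() for an lm() fit. The vectors x and y and the helper name r_squared_formula are illustrative assumptions, not part of the original example.

```r
# Hypothetical helper: R squared from the definition 1 - SSres / SStot
r_squared_formula <- function(y_actual, y_predict) {
  ss_res <- sum((y_actual - y_predict)^2)        # residual sum of squares
  ss_tot <- sum((y_actual - mean(y_actual))^2)   # total sum of squares
  1 - ss_res / ss_tot
}

# Made-up illustrative data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)
r2_manual  <- r_squared_formula(y, fitted(fit))
r2_summary <- summary(fit)$r.squared

print(r2_manual)
print(r2_summary)
# For an in-sample linear fit with an intercept, the two values agree
print(isTRUE(all.equal(r2_manual, r2_summary)))
```

Predicting every value perfectly gives 1, while always predicting the mean of y gives 0, which is why the mean is the usual baseline for this metric.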

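I. R squared with linear regression in R

In this example, we apply the R squared error metric to a linear regression model.

First, we load the dataset with the read.csv() function and one-hot encode the categorical columns with dummy.data.frame(). Next, we split the data into training and test sets with the createDataPartition() function from the caret package. Before modelling, we define custom functions for the error metrics, as shown in the code below. Finally, we fit a linear regression model with lm() and call the user-defined R squared function to evaluate the model's performance.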


#Remove all existing objects from the workspace
rm(list = ls())
#Setting the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

#Load the dataset
bike_data = read.csv("day.csv",header=TRUE)

### SAMPLING OF DATA -- Splitting of Data columns into Training and Test dataset ###
categorical_col_updated = c('season','yr','mnth','weathersit','holiday')
library(dummies)
bike = bike_data
bike = dummy.data.frame(bike,categorical_col_updated)
dim(bike)

#Splitting the data into training (80%) and test (20%) sets
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE) 
train_data = bike[split_val,]
test_data = bike[-split_val,]

### MODELLING OF DATA USING MACHINE LEARNING ALGORITHMS ###
#Defining error metrics to check the error rate and accuracy of the Regression ML algorithms

#1. MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)
MAPE = function(y_actual,y_predict){
  mean(abs((y_actual-y_predict)/y_actual))*100
}

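#2. R SQUARED error metric -- Coefficient of Determination
#   Computed here as the squared Pearson correlation between actual and predicted values.
#   This equals 1 - SSres/SStot for a linear model on its training data, but the two can differ on test data.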
RSQUARE = function(y_actual,y_predict){
  cor(y_actual,y_predict)^2
}

##MODEL 1: LINEAR REGRESSION
linear_model = lm(cnt~., train_data) #Building the Linear Regression Model on our dataset
summary(linear_model)
linear_predict=predict(linear_model,test_data[-27]) #Predictions on test data; column 27 holds the target cnt and is dropped from the predictors

LR_MAPE = MAPE(test_data[,27],linear_predict) # Using MAPE error metrics to check for the error rate and accuracy level
LR_R = RSQUARE(test_data[,27],linear_predict) # Using R-SQUARE error metrics to check for the error rate and accuracy level
Accuracy_Linear = 100 - LR_MAPE

print("MAPE: ")
print(LR_MAPE)
print("R-Square: ")
print(LR_R)
print('Accuracy of Linear Regression: ')
print(Accuracy_Linear)

Output:

As shown below, the R squared value is about 0.83, which indicates that the model fits our data well.

> print("MAPE: ")
[1] "MAPE: "
> print(LR_MAPE)
[1] 17.61674
> print("R-Square: ")
[1] "R-Square: "
> print(LR_R)
[1] 0.8278258
> print('Accuracy of Linear Regression: ')
[1] "Accuracy of Linear Regression: "
> print(Accuracy_Linear)
[1] 82.38326
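The RSQUARE() helper above uses the squared Pearson correlation between actual and predicted values. On the training data of a linear model with an intercept this equals 1 - SSres/SStot, but on held-out data the two can differ, and the formula-based value can even be negative for a poor model. Below is a minimal sketch of that comparison. It assumes the built-in mtcars dataset as a stand-in for day.csv, and the helper RSQUARE_SS and the car_* variable names are hypothetical.

```r
# Squared-correlation version, as defined in the article
RSQUARE <- function(y_actual, y_predict) {
  cor(y_actual, y_predict)^2
}

# Hypothetical formula-based version: 1 - SSres / SStot
RSQUARE_SS <- function(y_actual, y_predict) {
  1 - sum((y_actual - y_predict)^2) / sum((y_actual - mean(y_actual))^2)
}

# mtcars stands in for the bike rental data (assumption for illustration)
set.seed(101)
car_idx   <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
car_train <- mtcars[car_idx, ]
car_test  <- mtcars[-car_idx, ]

car_model <- lm(mpg ~ wt + hp, data = car_train)
pred      <- predict(car_model, newdata = car_test)

print(RSQUARE(car_test$mpg, pred))     # squared correlation on test data
print(RSQUARE_SS(car_test$mpg, pred))  # formula-based value on test data

# A constant bias keeps the correlation perfect but ruins the formula-based score
y <- c(1, 2, 3, 4, 5)
print(RSQUARE(y, y + 10))     # 1
print(RSQUARE_SS(y, y + 10))  # -49
```

The squared correlation ignores any shift or rescaling of the predictions, so it is never lower than the formula-based value. Reporting the formula-based value on test data is the stricter choice when systematic bias matters.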

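II. Computing R squared with the summary() function

We can also use the summary() function in R to extract the R squared value after fitting a model.

In the example below, we fit a linear regression model to our data frame and then use summary()$r.squared to obtain the R squared value.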


rm(list = ls())
 
A <- c(1,2,3,4,2,3,4,1) 
B <- c(1,2,3,4,2,3,4,1) 
a <- c(10,20,30,40,50,60,70,80) 
b <- c(100,200,300,400,500,600,700,800) 
data <- data.frame(A,B,a,b) 

print("Original data frame:\n") 
print(data) 

ml = lm(A~a, data = data) 

# Extracting R-squared parameter from summary 
summary(ml)$r.squared

Output:

[1] "Original data frame:\n"
  A B  a   b
1 1 1 10 100
2 2 2 20 200
3 3 3 30 300
4 4 4 40 400
5 2 2 50 500
6 3 3 60 600
7 4 4 70 700
8 1 1 80 800

[1] 0.03809524
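Besides r.squared, the summary object also carries adj.r.squared, which penalizes the number of predictors. The sketch below rebuilds the same A and a vectors and checks the adjusted value against its textbook formula. The variable names n, p, and adj_manual are illustrative.

```r
A <- c(1, 2, 3, 4, 2, 3, 4, 1)
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
data <- data.frame(A, a)

ml <- lm(A ~ a, data = data)
fit_summary <- summary(ml)

r2     <- fit_summary$r.squared      # same value as in the output above
adj_r2 <- fit_summary$adj.r.squared  # adjusted for the number of predictors

# Adjusted R squared by hand: 1 - (1 - R2) * (n - 1) / (n - p - 1)
n <- nrow(data)   # number of observations
p <- 1            # number of predictors (only `a`)
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(r2)
print(adj_r2)
print(isTRUE(all.equal(adj_r2, adj_manual)))
```

With a relationship this weak, the adjusted value comes out negative, which signals that the predictor explains essentially nothing beyond the mean of A.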

Conclusion

That brings us to the end of this topic. If you have any questions, feel free to leave a comment below.

Until then, happy learning! 🙂