What are R-Squared and Adjusted R-Squared?
Watching the stock market move makes me panic! Thankfully, statisticians have come up with simple tools for studying stock movements and predictions. If you want to make decisions about a market you are interested in, you need some fundamental knowledge of statistics and of error metrics, which is what this article covers.
Before getting into these concepts, you should be clear on one important topic, “Linear Regression”. I picked up that background from the content provided by InsideAIML, which makes it easy to understand.
Let’s begin with the most important concepts:
- What is R-Squared?
- How to calculate R-Squared?
- Limitations of R-Squared.
- What is Adjusted R-Squared?
- The Difference Between R-Squared and Adjusted R-Squared.
- Conclusion.
What is R-Squared (R2)?
R-squared, the coefficient of determination, is a statistical tool. In investing, it measures the degree to which the performance of a security can be attributed to the performance of a particular benchmark index.
More generally, it is a statistical measure that represents the proportion of the variance of a dependent variable that is explained by an independent variable or variables in a regression model.
Whereas correlation explains the strength of the relationship between an independent and a dependent variable, R-squared explains to what extent the variance of one variable explains the variance of the second variable. So, if the R2 of a model is 0.50, then approximately half of the observed variation can be explained by the model’s inputs.
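To make the link between correlation and R-squared concrete, here is a minimal sketch. It assumes a simple regression with a single input variable, where R-squared equals the square of the correlation coefficient. The data values are invented purely for illustration, and R-squared is computed as 1 minus the ratio of unexplained variation to total variation.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

# Fit y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept

# R-squared = 1 - (unexplained variation / total variation).
sse = np.sum((y - predicted) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = 1 - sse / sst

print(f"correlation r = {r:.4f}")
print(f"r squared     = {r ** 2:.4f}")
print(f"R-squared     = {r_squared:.4f}")
```

The last two printed values agree, which holds for a straight-line fit with one input variable. With several input variables there is no single correlation to square, so R-squared is the number to read.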
“The higher the R-squared, the more variation is explained by our input variables, and the better the model is generally considered to fit.”
Note:
Instead of working through many parameters, the easiest first check on a model is to read its R-squared directly from the statistical summary of the fitted model.
Let’s look at these points about R-squared using the example of an S&P portfolio’s performance.
- R-squared does not quantify the performance of the portfolio itself. Instead, it shows the correlation between the performance of the portfolio and that of the benchmark index.
- In this context R-squared is reported on a scale of 0 to 100. The higher the value, the more of the portfolio’s performance is explained by the relevant index.
- An R-squared value of 100 indicates that all movements of the security are explained by movements in the index.
- An R-squared between 70 and 100 indicates a strong correlation between the portfolio’s returns and the benchmark index. A value below 70 indicates a weaker correlation.
- A security with an R-squared of 90 means that 90% of the security’s price movement is explained by movement in the index. This lets investors evaluate the portfolio’s performance relative to the index and estimate how it may perform in the future when compared against the market.
- For example, Mr. X is looking for stocks in an “S&P” portfolio whose performance is strongly correlated with the index. Given stocks A, B, and C with R-squared values of 85, 95, and 45 respectively, Mr. X will eliminate stock C (45) right away. R-squared can also be plotted as a technical indicator on the price chart for easy evaluation.
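Mr. X’s screening step can be sketched in a few lines of Python. The stock names and R-squared values come from the example above. The function name and the choice of 70 as the cut-off are assumptions made for illustration, based on the "70 to 100 is strong" rule of thumb.

```python
# R-squared values (on the 0 to 100 scale) from the example above.
r_squared_by_stock = {"A": 85, "B": 95, "C": 45}

# Assumption: 70 is used as the cut-off for a strong correlation.
STRONG_THRESHOLD = 70


def strongly_correlated(scores, threshold=STRONG_THRESHOLD):
    """Return the stocks whose R-squared is at or above the threshold."""
    return [name for name, value in scores.items() if value >= threshold]


shortlist = strongly_correlated(r_squared_by_stock)
print(shortlist)  # ['A', 'B']
```

Stock C drops out because its R-squared of 45 is below the cut-off.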
The summary provides two R-squared values, namely Multiple R-squared, and Adjusted R-squared.
The Multiple R-squared is calculated as follows:
Multiple R-squared = 1 - SSE/SST
where:
1. SSE is the sum of squared residuals.
A residual is the difference between the actual value and the predicted value, and can be read from the fitted model’s residuals.
2. SST is the total sum of squares.
It is calculated by summing the squared differences between each actual value and the mean of the actual values.
For example,
let’s say the actual values are 5, 6, 7, and 8, and a model predicts the outcomes as 4.5, 6.3, 7.2, and 7.9. Then,
SSE can be calculated as:
SSE = (5 - 4.5)^2 + (6 - 6.3)^2 + (7 - 7.2)^2 + (8 - 7.9)^2
Each term is the squared difference between the actual value and the predicted value, which gives:
0.25,
0.09,
0.04,
0.01.
So SSE = 0.39.
SST can be calculated as:
mean = (5 + 6 + 7 + 8) / 4 = 6.5
SST = (5 - 6.5)^2 + (6 - 6.5)^2 + (7 - 6.5)^2 + (8 - 6.5)^2
Each term is the squared difference between the actual value and the mean, which gives:
2.25,
0.25,
0.25,
2.25.
So SST = 5.0, and Multiple R-squared = 1 - 0.39/5.0 = 0.922.
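The worked example can be checked with a short script. This is a minimal sketch in plain Python that recomputes SSE, SST, and R-squared from the same four actual and predicted values. The variable names are my own.

```python
actual = [5, 6, 7, 8]
predicted = [4.5, 6.3, 7.2, 7.9]

# SSE: sum of squared differences between actual and predicted values.
squared_residuals = [(a - p) ** 2 for a, p in zip(actual, predicted)]
sse = sum(squared_residuals)

# SST: sum of squared differences between actual values and their mean.
mean_actual = sum(actual) / len(actual)
squared_deviations = [(a - mean_actual) ** 2 for a in actual]
sst = sum(squared_deviations)

r_squared = 1 - sse / sst

print(f"SSE = {sse:.2f}")              # SSE = 0.39
print(f"SST = {sst:.2f}")              # SST = 5.00
print(f"R-squared = {r_squared:.3f}")  # R-squared = 0.922
```

An R-squared of about 0.92 says the predictions account for roughly 92% of the variation in the actual values.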
Those were the basic concepts behind R-squared. Next, let’s look at why Adjusted R-squared is needed.
Limitations of R-Squared
However, the problem with R-squared is that it will either stay the same or increase when more variables are added, even if they have no relationship with the output variable.
This is where Adjusted R-squared helps. Adjusted R-squared penalizes you for adding variables that do not improve your existing model.
Hence, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R-squared to judge the goodness of the model. If you have only one input variable, R-squared and Adjusted R-squared will be very close on a reasonably large dataset, though not exactly the same.
Typically, the more non-significant variables you add to the model, the wider the gap between R-squared and Adjusted R-squared becomes.
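The claim that R-squared never goes down when variables are added can be seen in a small experiment. The sketch below uses NumPy and synthetic data generated with a fixed seed, so nothing here is real market data. The output depends on one input only, and the extra columns are pure random noise. The helper name `r_squared` is my own.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least squares fit of y on X, with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sse = np.sum(residuals ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst


rng = np.random.default_rng(0)    # fixed seed so the run is repeatable
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # synthetic: y depends on x only

X = x.reshape(-1, 1)
scores = [r_squared(X, y)]
for _ in range(5):
    noise = rng.normal(size=(n, 1))  # a variable unrelated to y
    X = np.hstack([X, noise])
    scores.append(r_squared(X, y))

for k, score in enumerate(scores, start=1):
    print(f"{k} variable(s): R-squared = {score:.4f}")
```

Each added noise column leaves R-squared unchanged or nudges it upward, even though the column carries no information about the output.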
What is Adjusted R-Squared?
The Adjusted R-squared value is similar to the Multiple R-squared value, but it accounts for the number of variables. The Multiple R-squared never decreases when a new variable is added to the prediction model, but if the variable is a non-significant one, the Adjusted R-squared value will typically decrease.
It is calculated as follows:
Adjusted R-squared = 1 - (1 - R2) * (n - 1) / (n - k - 1)
Here,
- n represents the number of data points in our dataset,
- k represents the number of independent variables, and
- R2 represents the R-squared value determined by the model.
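As a minimal sketch, the adjustment can be written as a small Python function. It assumes the standard formula, Adjusted R-squared = 1 - (1 - R2) * (n - 1) / (n - k - 1), with n and k as defined above. The function name and the sample numbers (an R-squared of 0.80 on 50 data points) are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared from R-squared, n data points and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more data points than k + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical numbers: the same R-squared reported by models of different size.
for k in (1, 3, 10):
    value = adjusted_r_squared(0.80, n=50, k=k)
    print(f"k = {k:2d}: Adjusted R-squared = {value:.4f}")
```

For the same R-squared, the adjusted value drops as k grows, which is the penalty for extra variables. With a single variable and 50 data points the two values are already close but not identical.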
The Difference Between R-Squared and Adjusted R-Squared
R-Squared
- R-Squared shows how well the data points fit a curve or line.
- An R-squared value of 1 means that the model predicts the data perfectly.
Adjusted R-Squared
- Adjusted R-Squared is a modified form of R-Squared, the coefficient of determination.
- Adjusted R-squared is used to judge the goodness of a model.
- Adjusted R-Squared also indicates how well terms fit a curve or line, but adjusts for the number of terms in a model.
- Adjusted R-squared penalizes you for adding variables that do not improve your existing model.
Conclusion
We began with the concept of R-squared and its characteristics. We then found that it keeps increasing as variables are added, which is not a good sign when the goal is to predict movements. That led to Adjusted R-squared, which gives a fairer measure of the goodness of the model by penalizing additional variables that do not improve the existing model.
I hope this gave you a clear understanding of R-Squared and Adjusted R-Squared. To learn more, follow these related links:
https://saurabhmirgane007.medium.com/what-is-apriori-method-in-machine-learning-3ad4994d2ef00
https://insideaiml.com/article-details/Data-Science-is-dead.-Long-live-Business-Science-300
https://saurabhmirgane007.medium.com/what-is-unit-testing-in-python-c2d38d72e032
Thank you.