Linear Regression
- A simple yet useful supervised learning approach for predicting a quantitative (numeric) response
- Makes predictions by computing a weighted sum of the input features, plus a constant called the bias (intercept) term
- Finding Parameters
- Choose parameters so that the predictions are close to the actual values for the training samples
- Define a cost function and find the parameters that minimize it => a common choice is MSE (Mean Squared Error)
- Some methods: Least Squares Method, Gradient Descent
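Both methods above can be sketched on toy data; this is a minimal illustration (data, learning rate, and iteration count are all illustrative choices, not prescriptions):

```r
# Toy data: y = 3 + 2x + noise
set.seed(42)
x <- runif(100)
y <- 3 + 2 * x + rnorm(100, sd = 0.1)
X <- cbind(1, x)                          # design matrix with intercept column

# Least Squares Method: solve the normal equations (X'X) beta = X'y
beta_ls <- solve(t(X) %*% X, t(X) %*% y)

# Gradient Descent: repeatedly step against the gradient of the MSE cost
beta_gd <- c(0, 0)
for (i in 1:5000) {
  grad <- -2 * t(X) %*% (y - X %*% beta_gd) / length(y)
  beta_gd <- beta_gd - 0.1 * grad         # 0.1 is an illustrative learning rate
}
# Both approaches should recover parameters close to (3, 2)
```

For well-behaved problems both converge to the same minimizer of the MSE; the closed-form solution is exact, while gradient descent scales better to very large feature sets.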
- Main Steps:
- Use least-squares to fit a line to the data
- Calculate R-Squared
- Calculate p-value
- Terminology:
- R-Squared: a goodness-of-fit measure for linear regression models
- Null Hypothesis: an initial statement claiming that there is no relationship between two measured variables
- P-Value: Tests the null hypothesis
- Low p-value (< 0.05): the null hypothesis can be rejected; the predictor is likely a meaningful addition to your model
- Changes in the predictor’s values are related to changes in the response variable
- Large p-value: suggests that the predictor is not associated with changes in the response
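The main steps and terminology above can be seen directly in the output of a simple `lm` fit (the data here is illustrative):

```r
# Toy data with a real linear relationship between x and y
set.seed(1)
x <- rnorm(50)
y <- 1 + 0.5 * x + rnorm(50)

fit <- summary(lm(y ~ x))             # least-squares fit of a line
fit$r.squared                         # R-squared: share of variance explained
fit$coefficients[, "Pr(>|t|)"]        # p-values testing the null hypothesis
```

A small p-value on the `x` coefficient would lead us to reject the null hypothesis of no relationship, matching the interpretation rules above.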
Tidymodels steps
- Split data ({rsample})
- Prepare recipe ({recipes})
- Specify model ({parsnip})
- Tune hyperparameters ({tune})
- Fit model ({parsnip})
- Analyze model ({broom})
- Predict ({parsnip})
- Interpret the results ({yardstick})
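The steps above can be sketched end to end; this example uses `mtcars` with an illustrative formula and split proportion, and skips the {tune} step because plain linear regression has no hyperparameters to tune:

```r
library(tidymodels)

set.seed(123)
split <- initial_split(mtcars, prop = 0.8)              # {rsample}: split data
rec   <- recipe(mpg ~ wt + hp, data = training(split))  # {recipes}: prepare recipe
spec  <- linear_reg() |> set_engine("lm")               # {parsnip}: specify model

wflow     <- workflow() |> add_recipe(rec) |> add_model(spec)
model_fit <- fit(wflow, data = training(split))         # {parsnip}: fit model

broom::tidy(model_fit)                                  # {broom}: analyze model
preds <- predict(model_fit, testing(split))             # {parsnip}: predict
bind_cols(preds, testing(split)) |>
  metrics(truth = mpg, estimate = .pred)                # {yardstick}: interpret
```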
Linear Regression with the well-known diamonds dataset
- Let's build a price predictor
- First, find all the columns that have a high correlation with price
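One way to sketch this step, using the `diamonds` dataset shipped with {ggplot2}:

```r
library(ggplot2)  # provides the diamonds dataset

# Correlation of every numeric column with price, strongest first
num_cols <- diamonds[sapply(diamonds, is.numeric)]
cors <- sort(cor(num_cols)[, "price"], decreasing = TRUE)
cors  # carat, x, y, z show the strongest correlation with price
```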
- Next, split this data into training and testing sets
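A minimal split with {rsample} (the 80/20 proportion and seed are illustrative choices):

```r
library(rsample)
library(ggplot2)  # for the diamonds dataset

set.seed(42)
split <- initial_split(diamonds, prop = 0.8)
train <- training(split)
test  <- testing(split)
```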
- Then use `lm` to create the model from the training data
- The {broom} package has a method to summarize models in an easy-to-read way:
broom::tidy(model)
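Putting these two steps together (the base-R 80/20 split here is self-contained and illustrative; `carat` is used as the single strongest predictor found above):

```r
library(ggplot2)  # for the diamonds dataset
library(broom)

set.seed(42)
idx   <- sample(nrow(diamonds), 0.8 * nrow(diamonds))
train <- diamonds[idx, ]

model <- lm(price ~ carat, data = train)
broom::tidy(model)  # one row per coefficient: estimate, std.error, p.value
```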
- Based on the work so far, carat, x, y, and z can be used to predict the price of a diamond. Let's use all of these variables and inspect the results; y and z become insignificant when all variables are included together
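Fitting the full model shows this effect; because x, y, and z are all size dimensions and are highly correlated with one another, the y and z terms typically lose significance once the others are in the model:

```r
library(ggplot2)  # for the diamonds dataset
library(broom)

full <- lm(price ~ carat + x + y + z, data = diamonds)
broom::tidy(full)  # inspect the p.value column for y and z
```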