DataScience Classroomnotes 23/Feb/2022

Linear Regression

  • A simple yet useful supervised learning approach for predicting a quantitative (numeric) response
  • Makes a prediction by computing a weighted sum of the input features, plus a constant called the bias term (intercept)
  • Finding Parameters
    • Choose the parameters so that the predictions are as close as possible to the actual values for the training samples
    • Define a cost function that measures the prediction error, then find the parameters that minimize it; the usual choice is MSE (Mean Squared Error)
  • Some methods: Least Squares Method, Gradient Descent
  • Main Steps:
    • Use least-squares to fit a line to the data
    • Calculate R-Squared
    • Calculate p-value
  • Terminology:
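    • R-Squared: a goodness-of-fit measure for linear regression models; the proportion of the variance in the response that the model explains
    • Null Hypothesis: an initial statement claiming that there is no relationship between two measured variables
    • P-Value: used to test the null hypothesis; the probability of seeing a relationship at least as strong as the observed one if the null hypothesis were true
    • Low p-value (< 0.05): the null hypothesis can be rejected
      • The predictor is likely a meaningful addition to your model
      • Changes in the predictor’s values are related to changes in the response variable
    • Large p-value: suggests there is not enough evidence that the predictor is associated with changes in the response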
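
The ideas above (weighted sum plus bias, MSE as the cost, least squares, gradient descent, R-Squared) can be sketched in a few lines of plain Python. This is only an illustrative sketch for a single input feature: the function names and the five data points are made up for the example and do not come from the class material.

```python
# Simple (one-feature) linear regression in plain Python.
# All names and the toy data below are assumptions for illustration.

def predict(x, weight, bias):
    """Prediction = weighted input plus the bias term."""
    return weight * x + bias

def mse(xs, ys, weight, bias):
    """Mean squared error between predictions and actual values."""
    return sum((predict(x, weight, bias) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_least_squares(xs, ys):
    """Closed-form least-squares estimates of weight and bias."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    weight = sxy / sxx
    bias = y_mean - weight * x_mean
    return weight, bias

def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the MSE iteratively by following its negative gradient."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(x, weight, bias) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

def r_squared(xs, ys, weight, bias):
    """Share of the variance in ys explained by the fitted line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - predict(x, weight, bias)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Toy data (made up): roughly y = 2x + 1 with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

w_ls, b_ls = fit_least_squares(xs, ys)
w_gd, b_gd = fit_gradient_descent(xs, ys)
r2 = r_squared(xs, ys, w_ls, b_ls)

print("least squares:   ", w_ls, b_ls)
print("gradient descent:", w_gd, b_gd)
print("MSE:", mse(xs, ys, w_ls, b_ls), "R-Squared:", r2)
```

Both methods should land on essentially the same line here, since they minimize the same cost; the learning rate and step count are hand-picked for this toy data and would need tuning elsewhere.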

Tidymodels steps

  • Split data ({rsample})
  • Prepare recipe ({recipes})
  • Specify model ({parsnip})
  • Tune hyperparameters ({tune})
  • Fit model ({parsnip})
  • Analyze model ({broom})
  • Predict ({parsnip})
  • Interpret the results ({yardstick})
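
To make the order of the steps concrete, here is a rough, language-neutral sketch in plain Python of the split, fit, predict and evaluate stages. It is not the tidymodels API: the helper names (`train_test_split`, `fit_line`, `rmse`) and the synthetic data are assumptions made for this example, and the recipe and tuning stages are left out because a plain one-feature line has nothing to preprocess or tune.

```python
import math
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Split stage: shuffle reproducibly, then hold out a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Fit stage: least-squares slope and intercept for (x, y) rows."""
    n = len(rows)
    x_mean = sum(x for x, _ in rows) / n
    y_mean = sum(y for _, y in rows) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def rmse(rows, slope, intercept):
    """Evaluate stage: root mean squared error of the predictions."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(rows))

# Synthetic data (made up): y = 3x + 2 plus Gaussian noise.
data_rng = random.Random(0)
data = []
for _ in range(100):
    x = data_rng.uniform(0, 10)
    data.append((x, 3.0 * x + 2.0 + data_rng.gauss(0, 1.0)))

train, test = train_test_split(data)      # split
slope, intercept = fit_line(train)        # specify + fit
test_rmse = rmse(test, slope, intercept)  # predict + evaluate on held-out rows

print(len(train), len(test), slope, intercept, test_rmse)
```

The point of the sketch is the separation of concerns: the model only ever sees the training rows, and the error is reported on rows it has not seen.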

Linear Regression with the well-known diamonds dataset

  • Let's build the price predictor
  • Now let's find all the columns that are highly correlated with price
  • Now let's split this data into training and testing sets
  • Now let's use lm to fit the model on the training data
  • The broom package can summarize a fitted model as a tidy table that is easy to read: broom::tidy(model)
  • According to the work done so far, carat, x, y and z can be used to predict the price of a diamond. Let's use all the variables and look at the results: y and z become insignificant once all variables are considered
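
The correlation screening step can be sketched in plain Python as follows. The column names `carat`, `x` and `price` are borrowed from the diamonds dataset, but every value here is synthetic, the `unrelated` column and the 0.5 cutoff are invented for the example, and `pearson` is a hand-written helper rather than a library call.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((p - a_mean) * (q - b_mean) for p, q in zip(a, b))
    a_var = sum((p - a_mean) ** 2 for p in a)
    b_var = sum((q - b_mean) ** 2 for q in b)
    return cov / math.sqrt(a_var * b_var)

# Synthetic stand-ins (NOT the real diamonds data).
rng = random.Random(1)
n = 500
carat = [rng.uniform(0.2, 2.0) for _ in range(n)]
# A length-like column that grows with the cube root of carat.
x = [6.0 * c ** (1 / 3) + rng.gauss(0, 0.1) for c in carat]
# A column with no link to price at all.
unrelated = [rng.uniform(0, 1) for _ in range(n)]
price = [4000.0 * c + rng.gauss(0, 300.0) for c in carat]

columns = {"carat": carat, "x": x, "unrelated": unrelated}

# Correlation of every candidate predictor with the response.
correlations = {name: pearson(values, price) for name, values in columns.items()}

# Keep predictors whose absolute correlation clears the (arbitrary) cutoff.
selected = [name for name, r in correlations.items() if abs(r) >= 0.5]

print(correlations)
print("selected:", selected)
print("carat vs x:", pearson(carat, x))
```

In this synthetic setup `carat` and `x` are also strongly correlated with each other. That kind of overlap between predictors is one plausible reason, offered here as an assumption rather than a conclusion from the notes, why some size columns stop being significant once all variables enter the model together.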
