DataScience Classroomnotes 12/Feb/2022

What is a Model?

  • The goal of a model is to provide a simple low-dimensional summary of the dataset
  • There are two parts to a model
    • Define a family of models that express a precise, but general, pattern that you want to capture
    • Generate a fitted model by finding the model from the family that is closest to your data


  • A fitted model is just the closest model from the family of models
    • It is the “best” model according to some criterion
    • That does not imply that you have a good model
    • That does not imply that the model is true
  • The goal of a model is not to uncover truth, but to discover a simple approximation that is still useful
  • “All models are wrong, but some are useful” – George Box

Quantify Distance

  • We need a way to quantify the distance between the data and a model
  • One option: find the vertical distance between each data point and the model
  • Prediction: the y values given by the model
  • Response: the actual y values in the data
  • Distance: the difference between prediction and response
  • Overall distance: collapse all the individual distances into a single number
    • Commonly used method: root mean squared deviation (square each distance, take the mean, then take the square root)
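
The distance calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the class: the function name rmsd and the toy prediction and response values are assumptions made up for the example.

```python
import math

def rmsd(predictions, responses):
    """Collapse the per-point distances into one number (root mean squared deviation)."""
    if len(predictions) != len(responses):
        raise ValueError("predictions and responses must have the same length")
    # Distance for each point: prediction minus response
    distances = [p - r for p, r in zip(predictions, responses)]
    # Square each distance, average them, then take the square root
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Hypothetical toy values, chosen only for illustration
responses = [2.0, 4.0, 6.0]      # actual y values in the data
predictions = [2.0, 5.0, 4.0]    # y values given by some model
overall_distance = rmsd(predictions, responses)
print(overall_distance)
```

Squaring before averaging means positive and negative distances cannot cancel each other out, and larger misses count for more than small ones.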

Activity: Finding Best Fitted Model

  • Here we use a linear regression model; the basic idea behind this activity is to understand what is meant by building and evaluating models
  • In this activity we use simulated data
  • We create around 250 models with different slopes and intercepts
  • We find the best-fitting model by calculating the root mean squared deviation between each model and the actual data, and choosing the model with the smallest value
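
A rough sketch of how this activity might look in Python is given below. Everything specific in it is an assumption made for illustration, not the data or code used in class: the simulated line y = 4 + 2x, the noise level, the 30 data points, the ranges from which the 250 candidate intercepts and slopes are drawn, and the helper names predict and rmsd.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Simulated data: assumed underlying line y = 4 + 2x plus Gaussian noise
TRUE_INTERCEPT, TRUE_SLOPE = 4.0, 2.0
xs = [random.uniform(0, 10) for _ in range(30)]
ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0, 1.5) for x in xs]

def predict(intercept, slope, x_values):
    """Predictions of one member of the linear family y = intercept + slope * x."""
    return [intercept + slope * x for x in x_values]

def rmsd(predictions, responses):
    """Root mean squared deviation between predictions and responses."""
    squared = [(p - r) ** 2 for p, r in zip(predictions, responses)]
    return math.sqrt(sum(squared) / len(squared))

# 250 candidate models with randomly drawn intercepts and slopes
candidates = [(random.uniform(-20, 40), random.uniform(-5, 5)) for _ in range(250)]

# Score each candidate by its distance from the data, then keep the closest one
scored = [(rmsd(predict(a, b, xs), ys), a, b) for a, b in candidates]
best_distance, best_intercept, best_slope = min(scored)

print(f"best of 250: intercept={best_intercept:.2f}, "
      f"slope={best_slope:.2f}, RMSD={best_distance:.2f}")
```

The winner here is only the closest of the 250 models that were tried. A finer search, or a least-squares fit, would usually get closer still, which is the point made earlier: the fitted model is the best in the family by this criterion, not necessarily a good or true model.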

