What Is a Model?
- The goal of a model is to provide a simple, low-dimensional summary of the dataset
- There are two parts to building a model:
- Define a family of models that express a precise, but general, pattern that you want to capture
- Generate a fitted model by finding the model from the family that is closest to your data.
- A fitted model is just the closest model from the family of models
- “Best” model according to some criteria
- Does not imply that you have a good model
- Does not imply that the model is true
- The goal of a model is not to uncover truth, but to discover a simple approximation that is still useful
- “All models are wrong, but some are useful” – George Box
- We need a way to quantify the distance between the data and a model
- One option: find the vertical distance between each data point and the model
- Prediction: y values given by the model
- Response: actual y values in the data
- Distance: difference between the response and the prediction
- Overall distance: collapse all the individual distances into a single number
- Commonly used method: root-mean-squared deviation (RMSD), the square root of the mean of the squared distances
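The distance steps listed above can be sketched in Python. This is only a minimal illustration that assumes a straight-line model of the form y = intercept + slope * x; the function names `predict` and `rmsd` and the toy data are hypothetical and do not come from these notes.

```python
import math

def predict(intercept, slope, xs):
    """Prediction: the y values given by the model for each x."""
    return [intercept + slope * x for x in xs]

def rmsd(intercept, slope, xs, ys):
    """Root-mean-squared deviation between predictions and responses."""
    preds = predict(intercept, slope, xs)
    # Distance: difference between each response and its prediction
    diffs = [y - p for y, p in zip(ys, preds)]
    # Overall distance: square each difference, average, take the square root
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Toy data, made up for illustration (lies exactly on y = 1 + 2x)
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

print(rmsd(1, 2, xs, ys))  # 0.0, the model passes through every point
print(rmsd(0, 2, xs, ys))  # 1.0, every prediction is off by exactly 1
```

A smaller RMSD means the model sits closer to the data, so candidate models can be compared by this single number.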
Activity: Finding the Best Fitted Model
- Here we use a linear regression model; the aim of this activity is to understand what it means to build and evaluate models.
- In this activity we use simulated data
- We create around 250 models with different slopes and intercepts
- We find the best-fitting model by measuring the distance between each model and the actual data using the root-mean-squared deviation
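One possible way to run this activity is sketched below in Python, using only the standard library. Everything specific here is an assumption made for illustration: the "true" line y = 4 + 2x, the noise level, the sample size of 30, the ranges from which the 250 candidate intercepts and slopes are drawn, and the names `rmsd`, `candidates`, and `scored`.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulated data: an assumed true line y = 4 + 2x plus Gaussian noise
TRUE_INTERCEPT, TRUE_SLOPE = 4.0, 2.0
xs = [random.uniform(0, 10) for _ in range(30)]
ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0, 1.5) for x in xs]

def rmsd(intercept, slope):
    """Root-mean-squared deviation of one candidate line from the data."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))

# Family of models: 250 random (intercept, slope) pairs
candidates = [
    (random.uniform(-20, 40), random.uniform(-5, 5)) for _ in range(250)
]

# Fitted model: the candidate with the smallest distance to the data
scored = sorted(candidates, key=lambda model: rmsd(*model))
best_intercept, best_slope = scored[0]
best_distance = rmsd(best_intercept, best_slope)

print(f"best of 250: intercept={best_intercept:.2f}, "
      f"slope={best_slope:.2f}, RMSD={best_distance:.2f}")
```

The winner is only the best of the 250 candidates tried, not necessarily the best possible line; a finer search or a least-squares fit would usually get closer.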