## What Is a Model?

• The goal of a model is to provide a simple, low-dimensional summary of a dataset.
• Building a model has two parts:
• Define a family of models that expresses a precise but general pattern you want to capture.
• Generate a fitted model by finding the member of that family that is closest to your data.
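The two parts can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes a linear family `y = a0 + a1 * x`, uses hypothetical toy data, and takes "closest" to mean least squares (via NumPy's `np.polyfit`). The names `linear_model`, `a0`, and `a1` are invented for this example.

```python
import numpy as np

def linear_model(params, x):
    """One member of the assumed family y = a0 + a1 * x, selected by params."""
    a0, a1 = params
    return a0 + a1 * x

# Hypothetical toy data that roughly follows y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Part 1: the family. Each (a0, a1) pair gives a different model.
guess_a = linear_model((0.0, 1.0), x)
guess_b = linear_model((1.0, 2.0), x)

# Part 2: the fitted model. Least squares picks the closest member.
# np.polyfit returns coefficients from the highest degree down: slope first.
a1_hat, a0_hat = np.polyfit(x, y, deg=1)
fitted = linear_model((a0_hat, a1_hat), x)

print(f"fitted intercept = {a0_hat:.2f}, fitted slope = {a1_hat:.2f}")
```

Every choice of `(a0, a1)` is a model from the family; the fitted model is simply the pair that least squares selects for this particular dataset.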

## Caveat

• A fitted model is just the closest model from the family of models.
• It is the “best” model only according to some criterion.
• That does not imply that you have a good model.
• Nor does it imply that the model is true.
• The goal of a model is not to uncover truth, but to discover a simple approximation that is still useful.
• “All models are wrong, but some are useful” – George Box

## Quantify Distance

• We need a way to quantify the distance between the data and a model.
• One option: find the vertical distance between each data point and the model.
• Prediction: the y values given by the model
• Response: the actual y values in the data
• Distance: the difference between prediction and response
• Overall distance: collapse all the individual distances into a single number
• Commonly used method: root mean squared deviation (RMSD)
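Written out, the root mean squared deviation takes the following form. The symbols are assumptions introduced here for illustration: $n$ is the number of data points, $y_i$ is the response for point $i$, and $\hat{y}_i$ is the model's prediction for that point.

$$
\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}
$$

Each term $\hat{y}_i - y_i$ is one vertical distance. Squaring, averaging, and taking the square root collapses them into a single number in the same units as $y$.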

## Activity: Finding Best Fitted Model

• Here we use a linear regression model. The idea behind this activity is to understand what it means to build and evaluate a model.
• We start from simulated data.
• We create around 250 models with different slopes and intercepts.
• We measure the distance between each model and the actual data with the root mean squared deviation, and pick the model with the smallest distance as the best fit.
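The activity can be sketched in Python with NumPy. This is only one possible setup, under stated assumptions: the simulated data (a noisy line `y = 4 + 2x`), the sampling ranges for slopes and intercepts, the random seed, and the helper name `rmsd` are all choices made for this example, not details from the original activity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed): a noisy straight line y = 4 + 2x.
x = rng.uniform(0, 10, size=30)
y = 4.0 + 2.0 * x + rng.normal(0, 1.5, size=30)

def rmsd(intercept, slope, x, y):
    """Root mean squared deviation between a line's predictions and y."""
    predictions = intercept + slope * x
    return np.sqrt(np.mean((y - predictions) ** 2))

# 250 candidate models drawn at random from the linear family.
# The sampling ranges are arbitrary assumptions for this sketch.
n_models = 250
intercepts = rng.uniform(-20, 40, size=n_models)
slopes = rng.uniform(-5, 5, size=n_models)

# One distance per candidate model.
distances = np.array([rmsd(a, b, x, y) for a, b in zip(intercepts, slopes)])

# The best fitted model among the candidates has the smallest distance.
best = int(np.argmin(distances))
print(f"best intercept = {intercepts[best]:.2f}")
print(f"best slope     = {slopes[best]:.2f}")
print(f"best RMSD      = {distances[best]:.2f}")
```

The winner is only the best of the 250 candidates tried. A least-squares fit over the whole family can reach an RMSD at least as small, which ties back to the caveat that "best" always means best according to some criterion and search.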
