## Classification

• Predict a class for a given instance based on a set of features, e.g.:
  • Whether or not an email is spam
  • Which digit (0–9) a handwritten image shows
• Binary classification:
  • Classify an instance into one of two possible classes (e.g. email spam or not spam)
• Multiclass classification:
  • Classify an instance into more than two possible classes (e.g. which of the digits 0–9 a handwritten image shows)

## Accuracy

• Accuracy = number of correct predictions / total number of predictions
• The most commonly used metric for evaluating classification models
• It can be a misleading metric and doesn't tell the full story on its own
• Accuracy: not enough!
• Suppose we build a classifier to predict whether a patient has a rare, fatal disease such as cancer, and assume 0.1% of the population is affected
• A classifier that always answers "no cancer", irrespective of the data (tests, lab results, etc.), is 99.9% accurate
• Positive case: the patient has the disease
• Negative case: the patient doesn't have the disease
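To see this concretely, a minimal sketch in base R. The 1,000-patient population and the always-negative classifier are illustrative assumptions, not part of any real dataset:

```r
# Hypothetical population of 1,000 patients, 0.1% with the disease
actual <- c("Disease", rep("Healthy", 999))

# A useless classifier that always answers "no disease"
predicted <- rep("Healthy", 1000)

# Accuracy = correct predictions / total predictions
mean(predicted == actual)  # 0.999 -- yet not a single case is detected
```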

## Confusion Matrix

• A table where:
  • Rows represent actual classes
  • Columns represent predicted classes
  • Each entry is the number of instances with the corresponding actual and predicted classes
• Accuracy: (TP + TN) / (TP + TN + FP + FN)
  • How often is the classifier correct overall?
• Precision: TP / (TP + FP)
  • When it predicts positive, how often is the classifier correct?
• Recall: TP / (TP + FN)
  • How often are the actual positive instances classified as positive?
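Plugging some counts into these formulas (the numbers below are made up purely for illustration):

```r
# Hypothetical confusion-matrix counts
TP <- 40; FP <- 10; FN <- 20; TN <- 930

(TP + TN) / (TP + TN + FP + FN)  # accuracy  = 0.97
TP / (TP + FP)                   # precision = 0.8
TP / (TP + FN)                   # recall    ~ 0.667
```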

## F1 Score

• Combines precision and recall into a single metric
• It is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall)
• Its value is between 0 (worst) and 1 (best)
• High only if both precision and recall are high
• Examples:
  • precision 0.5, recall 0.5 → F1 = 0.5
  • precision 1.0, recall 0.2 → F1 ≈ 0.33
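These examples can be checked with the harmonic-mean formula; a small helper (the function name `f1` is just for illustration):

```r
# F1 is the harmonic mean of precision and recall
f1 <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

f1(0.5, 0.5)  # 0.5
f1(1.0, 0.2)  # ~0.333: dragged down by the low recall
```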

## Let's try classification using the titanic dataset

## Importing libraries

```r
library(tidyverse)
library(tidymodels)
library(skimr)
library(corrr)

# install the titanic package if not present
# install.packages('titanic')
library(titanic)
```

## Let's use the titanic dataset

```r
# Split titanic_train into training and test sets
set.seed(123)  # added for a reproducible split

data <- titanic_train
data_split <- initial_split(data)
train <- training(data_split)
test <- testing(data_split)

skimr::skim(train)
```

### Build a recipe

```r
data_rec <- recipe(Survived ~ ., train) %>%
  step_mutate(Survived = ifelse(Survived == 0, "Died", "Survived")) %>%
  step_string2factor(Survived) %>%
  step_rm(PassengerId, Name, Ticket, Cabin) %>%
  # step_meanimpute() was renamed step_impute_mean() in recipes >= 0.1.16
  step_impute_mean(Age) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_zv(all_predictors()) %>%
  step_center(all_predictors(), -all_nominal()) %>%
  step_scale(all_predictors(), -all_nominal())
```

## Prepping the recipe

```r
data_prep <- data_rec %>%
  prep()

data_prep
```

## Build a fitted model

```r
fitted_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification") %>%
  fit(Survived ~ ., data = bake(data_prep, train))
```

## Predict using the fitted model

```r
predictions <- fitted_model %>%
  predict(new_data = bake(data_prep, test)) %>%
  bind_cols(
    bake(data_prep, test) %>%
      select(Survived)
  )

predictions
```

## Create a confusion matrix

```r
predictions %>%
  conf_mat(Survived, .pred_class)
```

## Metrics

```r
predictions %>%
  metrics(Survived, .pred_class)
```

```r
predictions %>%
  precision(Survived, .pred_class)
```

```r
predictions %>%
  recall(Survived, .pred_class)
```

```r
predictions %>%
  f_meas(Survived, .pred_class)
```
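The four metric calls above can also be bundled into one; assuming yardstick's `metric_set()` helper, a sketch that reuses the `predictions` tibble built earlier:

```r
# Bundle several yardstick metrics into a single callable
class_metrics <- metric_set(accuracy, precision, recall, f_meas)

# One call computes all four metrics at once
predictions %>%
  class_metrics(truth = Survived, estimate = .pred_class)
```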

## About continuous learner

devops & cloud enthusiastic learner