Data Science Classroom Series – 14/Dec/2021

Factor Vectors

  • Factors are an important concept in R when building models. They store categorical data as a fixed set of levels.
  • Let's try to create height, weight, and gender vectors for some sample data.
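
The original screenshots are not available, so here is a minimal sketch of the idea. The values and the variable name gender_factor are made up for illustration.

```r
# Hypothetical sample data; the values are made up for illustration.
height <- c(160, 172, 181, 158)   # centimetres
weight <- c(55, 70, 82, 51)       # kilograms
gender <- c("F", "M", "M", "F")   # plain character vector

# Convert the character vector into a factor.
gender_factor <- factor(gender)

levels(gender_factor)       # distinct categories, sorted: "F" "M"
as.integer(gender_factor)   # underlying integer codes: 1 2 2 1
table(gender_factor)        # number of observations per level
```

A factor keeps one integer code per element plus the list of levels, which is the representation that modelling functions rely on for categorical variables.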

Missing Data

  • Missing data plays a critical role in both statistics and computing.
  • R has two types of missing data: NA and NULL.
  • While they are similar, they behave differently, and we need to understand the difference.
  • NA:
    • Often we will have data with missing values, for any number of reasons. Statistical programs use various techniques to represent missing data, such as a dash or a period.
    • R uses NA. An NA still occupies a position in a vector.
  • NULL:
    • NULL is the absence of anything. It does not occupy a position in a vector.
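
The difference is easiest to see side by side. This is a small sketch with made-up values; the names z and w are only for illustration.

```r
# NA marks a missing element but still occupies a slot in the vector.
z <- c(1, 2, NA, 8, 3, NA, 3)
is.na(z)                # FALSE FALSE TRUE FALSE FALSE TRUE FALSE
length(z)               # 7
mean(z)                 # NA, because missing values propagate
mean(z, na.rm = TRUE)   # 3.4, the mean of the five observed values

# NULL is the absence of anything, so it does not occupy a slot.
w <- c(1, NULL, 3)
length(w)               # 2
is.null(NULL)           # TRUE
```

Use is.na() to test for NA and is.null() to test for NULL.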

Pipes

  • A new paradigm for calling functions in R is the pipe.
  • The pipe from the magrittr package works by taking the value or object on the left-hand side of the pipe and inserting it as the first argument of the function on the right-hand side of the pipe.
  • The pipe is written with the %>% operator:
<expression|function> %>% <function>
  • Example
library(magrittr)
1:10 %>% mean

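
Pipes pay off when several calls are chained. The sketch below compares a nested call with its piped equivalent; it assumes the magrittr package is installed, and the names x, nested, and piped are only for illustration.

```r
library(magrittr)  # assumes the magrittr package is installed

x <- c(1, 4, 9, 16)

# Nested call: read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped call: read from left to right.
# Each result becomes the first argument of the next function.
piped <- x %>% sqrt %>% mean %>% round(1)

identical(nested, piped)  # TRUE
```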

Data frames

  • The data.frame is perhaps the most widely used feature of R.

  • A data.frame is a data structure much like an Excel spreadsheet, in that it has rows and columns.

  • On the surface, a data.frame is just like an Excel spreadsheet. In terms of how R organizes it, each column of a data.frame is actually a vector, and all columns have the same length.

  • nrow => number of rows

  • ncol => number of columns

  • dim => dimensions, which shows both rows and columns in one result

  • Changing column names and row names

  • Resetting the names to default names

  • To reset the column names, do not use NULL, as you will lose the column names.

  • Use head and tail to inspect the first and last rows of long datasets.

  • Consider a data frame that has a sports column.

  • See all the values of the sports column.
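
The original screenshots are not available, so the following is a sketch of the operations listed above. The data frame df, its column names, and its values are hypothetical, and "resetting the names" is assumed here to refer to row names.

```r
# Hypothetical data; column names and values are made up for illustration.
df <- data.frame(
  name  = c("Asha", "Ben", "Carla", "Dev"),
  age   = c(23, 31, 27, 35),
  sport = c("Cricket", "Tennis", "Cricket", "Football")
)

nrow(df)   # 4
ncol(df)   # 3
dim(df)    # 4 3 (rows first, then columns)

# Each column is a vector.
is.vector(df$age)   # TRUE

# Change a column name and the row names.
names(df)[3] <- "sports"
rownames(df) <- c("one", "two", "three", "four")

# Reset the row names to the default 1, 2, 3, ...
rownames(df) <- NULL
rownames(df)   # "1" "2" "3" "4"

# Inspect the first and last rows of a long data set.
head(df, n = 2)
tail(df, n = 1)

# See all the values of the sports column; these forms are equivalent.
df$sports
df[["sports"]]
df[, "sports"]
```

Assigning NULL to the row names restores the default numbering, which is why it is safe for rows but not for columns.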

