DataScience Classroomnotes 02/Jan/2022

Data Transformation with dplyr (contd.)

  • We can use transmute() to keep only the new variables
  • Grouped summaries with summarize():
  • summarize() collapses a data frame into a single row
  • summarize() is not very useful until we pair it with group_by()
  • Using the pipe operator (%>%) makes the code much cleaner
  • Using the pipe operator, try to show the count, average distance and average delay by destination
  • Using the pipe operator, try to create the object not_cancelled, keeping only the flights where the departure delay (dep_delay) is not NA and the arrival delay (arr_delay) is not NA. Then summarize the mean arrival and departure delays of the not-cancelled flights by month
  • Sample plotting
  • Some other useful statistical functions: sd(x), IQR(x), mad(x)
  • Let's try to summarize the flights data by number of flights per day
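library(dplyr)
library(nycflights13)  # provides the flights data frame

# Group by day; each summarize() call below peels off one grouping level
daily <- flights %>%
  group_by(year, month, day)

# Flights per day (result is still grouped by year and month)
per_day <- daily %>%
  summarize(flights = n())

# Flights per month (result is still grouped by year)
per_month <- per_day %>%
  summarize(flights = sum(flights))

# Flights per year
per_year <- per_month %>%
  summarize(flights = sum(flights))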
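
The transmute() note and the two pipe exercises from the list above could be sketched roughly as follows. This is one possible solution, not the class's reference answer. It assumes the dplyr and nycflights13 packages are installed, takes "average delay" to mean the arrival delay (arr_delay), and uses result names such as gains, by_dest and delays_by_month that are made up for illustration.

```r
library(dplyr)
library(nycflights13)  # assumed source of the flights data frame

# transmute() keeps only the newly created variables
gains <- flights %>%
  transmute(
    gain = dep_delay - arr_delay,
    hours = air_time / 60
  )

# Count, average distance and average arrival delay by destination
# (arr_delay is an assumption; the notes only say "average delay")
by_dest <- flights %>%
  group_by(dest) %>%
  summarize(
    count = n(),
    avg_distance = mean(distance, na.rm = TRUE),
    avg_delay = mean(arr_delay, na.rm = TRUE)
  )

# Treat flights with both delays recorded as not cancelled
not_cancelled <- flights %>%
  filter(!is.na(dep_delay), !is.na(arr_delay))

# Mean delays by month, plus the spread measures sd(), IQR() and mad()
delays_by_month <- not_cancelled %>%
  group_by(month) %>%
  summarize(
    avg_dep_delay = mean(dep_delay),
    avg_arr_delay = mean(arr_delay),
    sd_arr_delay = sd(arr_delay),
    iqr_arr_delay = IQR(arr_delay),
    mad_arr_delay = mad(arr_delay)
  )
```

Because not_cancelled has already dropped the NA values, the monthly summary does not need na.rm = TRUE, whereas the by-destination summary on the raw flights data does.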
  • Refer Here for the changes done in the R file
  • dplyr has its roots in an earlier package called plyr, which implements the split-apply-combine strategy for data analysis.
  • dplyr focuses on data frames or, in the tidyverse, tibbles.
  • Install and load gapminder Refer Here
  • Exercise:
  • Filter the data where lifeExp is less than 29: gapminder::gapminder %>% filter(lifeExp < 29)
  • Filter the data where country is Rwanda or Afghanistan
  • Refer Here for the and Refer Here for the dplyr cheat sheet
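
A minimal sketch of the two gapminder exercises above, assuming the dplyr and gapminder packages are installed. The result names low_life_exp and two_countries are hypothetical, and using %in% for the second exercise is one option among several (a pair of == comparisons joined with | would also work).

```r
library(dplyr)
library(gapminder)

# Rows where life expectancy is below 29 years
low_life_exp <- gapminder %>%
  filter(lifeExp < 29)

# Rows for Rwanda or Afghanistan; %in% matches against several values
two_countries <- gapminder %>%
  filter(country %in% c("Rwanda", "Afghanistan"))
```

The %in% form scales more easily than chained comparisons when the list of countries grows.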
