Data Transformation with dplyr (contd.)
- We can use transmute() to keep only the new variables
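A minimal sketch of transmute(), assuming the flights data comes from the nycflights13 package; the new column names gain, hours, and gain_per_hour are only illustrative:

```r
library(dplyr)
library(nycflights13)  # assumed source of the flights data

# transmute() returns only the columns created in the call
flights_gain <- flights %>%
  transmute(
    gain = dep_delay - arr_delay,  # delay made up (or lost) in the air
    hours = air_time / 60,         # air time in hours
    gain_per_hour = gain / hours   # can refer to columns created just above
  )

names(flights_gain)
```

The original columns are dropped; with mutate() the same call would keep them alongside the new ones.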
- Grouped summaries with summarize():
- summarize() collapses a data frame into a single row
- summarize() is not very useful until we pair it with group_by(), which changes the unit of analysis from the whole data set to individual groups
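To see the difference, here is a small sketch (assuming flights from nycflights13; the names overall, by_day, and daily_delay are illustrative) that summarizes the mean departure delay first without and then with grouping:

```r
library(dplyr)
library(nycflights13)  # assumed source of the flights data

# Without grouping: the whole data frame collapses to one row
overall <- summarize(flights, delay = mean(dep_delay, na.rm = TRUE))

# With grouping: one row per group (here, one row per day)
by_day <- group_by(flights, year, month, day)
daily_delay <- summarize(by_day, delay = mean(dep_delay, na.rm = TRUE))

nrow(overall)
nrow(daily_delay)
```

na.rm = TRUE is needed here because cancelled flights have a missing dep_delay, and mean() returns NA if any input is NA.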
- Using the pipe operator (%>%) makes the code much cleaner, because intermediate results do not need to be named
- Using the pipe operator, try to show the count, average distance, and average delay by destination
- Using the pipe operator, try to create the object not_cancelled by keeping only flights where neither dep_delay nor arr_delay is NA, then summarize the mean arrival and departure delays of the non-cancelled flights by month
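One possible solution to the two pipe exercises, as a sketch rather than the only answer. It assumes flights from nycflights13 and uses arr_delay as the "average delay"; the result names delays and monthly_delays are illustrative:

```r
library(dplyr)
library(nycflights13)  # assumed source of the flights data

# Count, average distance, and average arrival delay by destination
delays <- flights %>%
  group_by(dest) %>%
  summarize(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  )

# Keep only flights with both a departure and an arrival delay recorded
not_cancelled <- flights %>%
  filter(!is.na(dep_delay), !is.na(arr_delay))

# Mean arrival and departure delay of non-cancelled flights by month
monthly_delays <- not_cancelled %>%
  group_by(month) %>%
  summarize(
    mean_arr_delay = mean(arr_delay),
    mean_dep_delay = mean(dep_delay)
  )
```

Because not_cancelled has no missing delays, the monthly means do not need na.rm = TRUE.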
- Sample plotting
- Some other useful statistical functions for measuring spread: sd(x), IQR(x), mad(x)
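As a sketch of how these spread functions fit into a grouped summary (assuming flights from nycflights13; grouping by carrier and the column names are only for illustration):

```r
library(dplyr)
library(nycflights13)  # assumed source of the flights data

delay_spread <- flights %>%
  filter(!is.na(arr_delay)) %>%     # drop flights without an arrival delay
  group_by(carrier) %>%
  summarize(
    delay_sd = sd(arr_delay),       # standard deviation
    delay_iqr = IQR(arr_delay),     # interquartile range
    delay_mad = mad(arr_delay)      # median absolute deviation
  )

delay_spread
```

IQR() and mad() are less sensitive to outliers than sd(), which can matter for delay data with a few very late flights.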
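- Let's summarize the flights data by the number of flights per day. Each summarize() call peels off one level of grouping, so the daily counts can be rolled up to months and then to the whole year
library(dplyr)
library(nycflights13)  # assumed source of the flights data
daily <- flights %>%
  group_by(year, month, day)
# flights per day (result is still grouped by year and month)
per_day <- daily %>%
  summarize(flights = n())
print(per_day)
# flights per month (result is still grouped by year)
per_month <- per_day %>%
  summarize(flights = sum(flights))
# flights per year
per_year <- per_month %>%
  summarize(flights = sum(flights))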
- Refer Here for the changes done in the R file
- dplyr has its roots in an earlier package called plyr, which implements the split-apply-combine strategy for data analysis.
- dplyr focuses on data frames or, in the tidyverse, tibbles.
- Install and load gapminder Refer Here
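A minimal sketch of installing and loading the package, assuming it is installed from CRAN; the requireNamespace() guard simply avoids reinstalling when the package is already present:

```r
# Install once if the package is missing, then load it
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}
library(gapminder)

# Quick look at the data set that ships with the package
head(gapminder)
```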
- Exercise:
- filter the data where lifeExp is less than 29
gapminder::gapminder %>% filter(lifeExp < 29)
- filter the data where country is Rwanda or Afghanistan
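One way to write this filter, as a sketch: %in% tests membership in a set of values, which is shorter than joining two == comparisons with |. The name two_countries is illustrative:

```r
library(dplyr)
library(gapminder)

two_countries <- gapminder::gapminder %>%
  filter(country %in% c("Rwanda", "Afghanistan"))

two_countries
```

The equivalent long form is filter(country == "Rwanda" | country == "Afghanistan").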
- Refer Here for the and Refer Here cheat sheet with dplyr