DataScience Classroomnotes 07/Jan/2022

Filtering Joins

  • Filtering joins match observations in the same way as mutating joins, but they affect the observations, not the variables. There are two types:
  • semi_join(x, y): keeps all observations in x that have a match in y
  • anti_join(x, y): drops all observations in x that have a match in y
  • Let's find the top 10 destinations in flights
# find top 10 destinations
library(dplyr)
library(nycflights13)  # provides the flights and planes tables
top_dest <- flights %>%
  count(dest, sort = TRUE) %>%
  head(10)
  • Let's find each flight that went to one of those top 10 destinations
# using filter
flights %>%
  filter(dest %in% top_dest$dest)
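
To see how the two filtering joins defined above treat rows, here is a minimal sketch on made-up data. The `orders` and `customers` tables and their columns are hypothetical and are not part of the class files; only `semi_join`, `anti_join`, and `tribble` are real dplyr/tibble functions.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables, invented for illustration only
orders <- tribble(
  ~order_id, ~customer,
  1, "ann",
  2, "bob",
  3, "cy",
  4, "ann"
)
customers <- tribble(
  ~customer, ~city,
  "ann", "Pune",
  "bob", "Delhi"
)

# semi_join keeps the rows of x that have a match in y;
# no columns from y are added to the result
matched <- semi_join(orders, customers, by = "customer")

# anti_join keeps the rows of x that have no match in y
unmatched <- anti_join(orders, customers, by = "customer")

matched$order_id    # 1 2 4
unmatched$order_id  # 3
```

Every row of `orders` lands in exactly one of the two results, and both results have the same columns as `orders`.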

  • Instead of filter, we can use semi_join, which keeps only the flights whose dest appears in top_dest
  • Let's count the flights that don't have a matching plane in planes

flights %>%
  anti_join(planes, by= "tailnum") %>%
  count(tailnum, sort = TRUE)
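
The notes mention semi_join as an alternative to the filter call on the top 10 destinations but do not show it. A sketch of what that might look like follows, assuming the dplyr and nycflights13 packages are installed. The names `via_filter` and `via_semi` are made up for this comparison.

```r
library(dplyr)
library(nycflights13)

# Top 10 destinations by number of flights
top_dest <- flights %>%
  count(dest, sort = TRUE) %>%
  head(10)

# Version 1: filter with %in% on the single key column
via_filter <- flights %>%
  filter(dest %in% top_dest$dest)

# Version 2: semi_join keeps flights whose dest has a match in top_dest
via_semi <- flights %>%
  semi_join(top_dest, by = "dest")

# Compare the number of rows selected by each version
nrow(via_filter) == nrow(via_semi)
```

With a single key column the two versions select the same flights. The join form tends to be easier to extend when the match depends on several columns, since `by` accepts a vector of column names.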

Set Operations

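  • The final type of two-table verb is the set operation.
  • These expect the x and y inputs to have the same variables, and they treat the observations as sets, comparing whole rows
  • intersect(x, y): returns only the observations that are in both x and y
  • union(x, y): returns the unique observations in x and y
  • setdiff(x, y): returns the observations that are in x but not in y
  • Example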
df1 <- tribble(
  ~x, ~y,
  1, 1,
  2, 1
)

df2 <- tribble(
  ~x, ~y,
  1, 1
)

# rows present in both tables
intersect(df1, df2)
# unique rows from either table
union(df1, df2)
# rows in df1 that are not in df2
setdiff(df1, df2)
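
As a further illustration of the three set operations, the sketch below uses two small hypothetical tables, `a` and `b`, that share one row and each have one row of their own. The data is invented for this example and is not taken from the class files.

```r
library(dplyr)
library(tibble)

# Hypothetical tables with the same variables, x and y
a <- tribble(
  ~x, ~y,
  1, 1,
  2, 1
)
b <- tribble(
  ~x, ~y,
  1, 1,
  1, 2
)

both   <- intersect(a, b)  # rows in both: (1, 1)
either <- union(a, b)      # unique rows from either: (1, 1), (2, 1), (1, 2)
only_a <- setdiff(a, b)    # rows in a but not in b: (2, 1)
only_b <- setdiff(b, a)    # rows in b but not in a: (1, 2)
```

Note that setdiff is not symmetric: swapping the arguments returns the rows unique to the other table.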
  • Note: Refer Here for the R files committed to git

Strings with stringr
