## Utilities in R

• Consider the following functions
• rep()
• seq()
• For a basic example, work through the following exercise
• Exercise: Create a sequence `seq1` from 1 to 500 in steps of 3 and a second sequence `seq2` from 1500 down to 1000 in steps of -7, then calculate the sum of all elements of both sequences
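``````
# seq1: 1, 4, 7, ..., 499
seq1 <- seq(from=1, to=500, by=3)
# seq2: 1500, 1493, ..., 1003
seq2 <- seq(from=1500, to=1000, by=-7)
# sum() adds up every element of both vectors
sum(seq1, seq2)
``````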
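• The exercise above only uses `seq()`. As a complementary sketch, `rep()` repeats a vector either as a whole (`times`), element by element (`each`), or up to a given length (`length.out`). The variable names below are illustrative only.

```r
# repeat the whole vector three times: 1 2 1 2 1 2
rep_times <- rep(c(1, 2), times = 3)

# repeat each element twice before moving on: 1 1 2 2
rep_each <- rep(c(1, 2), each = 2)

# recycle the vector until the result has exactly 5 elements: 1 2 3 1 2
rep_len5 <- rep(1:3, length.out = 5)

print(rep_times)
print(rep_each)
print(rep_len5)
```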
• To search text data we use regular expressions. R provides the following functions
• grepl() => returns TRUE if the pattern is found in the corresponding character string
• grep() => returns a vector of indices of the strings that contain the pattern
• Exercise:
``````
emails <- c("qt@gmail.com", "qt@qt.com", "qt@live.in", "qt@qt.org", "qt@qt.edu")
``````
• Now let's search for email ids containing “com”
• Now print all the email ids that match the “com” pattern
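``````
# grep() returns the indices of the matches; use them to subset emails
print(emails[grep("com", emails)])

# using pipes
library(magrittr)
# helper that returns the elements of vec at positions n
myindex <- function(n, vec) {
  vec[n]
}

grep("com", emails) %>% myindex(vec=emails)
``````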
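• `grepl()` can do the same filtering with a logical vector instead of indices, and `grep()` can return the matching strings directly through its `value` argument. A small sketch, repeating the `emails` vector from above so that it runs on its own:

```r
emails <- c("qt@gmail.com", "qt@qt.com", "qt@live.in", "qt@qt.org", "qt@qt.edu")

# one TRUE/FALSE per element of emails
has_com <- grepl("com", emails)
print(has_com)

# a logical vector can be used for subsetting, just like indices
print(emails[has_com])

# value = TRUE makes grep() return the matching strings instead of indices
com_emails <- grep("com", emails, value = TRUE)
print(com_emails)
```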
• Regular expressions
• `^` => beginning of the string
• `$` => end of the string
• `.*` => matches any character, zero or more times
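• To see what these anchors do, the sketch below runs three patterns against a small made-up vector (`x` is hypothetical and not part of the exercises). Note that a literal dot has to be written as `\\.` inside an R string.

```r
x <- c("qt@qt.com", "admin@qt.org", "qt.com@example.net")

# begins with "qt"
starts_qt <- grepl("^qt", x)

# ends with a literal ".com"
ends_com <- grepl("\\.com$", x)

# begins with "qt", then anything, then ends with "net"
qt_to_net <- grepl("^qt.*net$", x)

print(starts_qt)
print(ends_com)
print(qt_to_net)
```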
• Let's write a regular-expression-based filter that keeps only valid email ids
``````
emails <- c("qt@gmail.com", "qt@qt.com", "qt@live.in", "qtqt.org", "qt@qtedu")
# print only valid email ids: local part, "@", domain, ".", alphabetic suffix
print(emails[grep("^[a-zA-Z0-9_.-]+@[a-zA-Z0-9]+\\.[a-zA-Z]+",emails)])
``````
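• The filter above can also be wrapped in a function. The name `valid_emails` and the trailing `$` anchor are assumptions added for this sketch. The pattern is a simplification and rejects some legitimate addresses, for example those with subdomains or hyphens in the domain.

```r
# hypothetical helper: keep only the strings that look like simple email ids
valid_emails <- function(x) {
  # local part, "@", domain, literal ".", alphabetic suffix, end of string
  pattern <- "^[a-zA-Z0-9_.-]+@[a-zA-Z0-9]+\\.[a-zA-Z]+$"
  x[grepl(pattern, x)]
}

emails <- c("qt@gmail.com", "qt@qt.com", "qt@live.in", "qtqt.org", "qt@qtedu")
result <- valid_emails(emails)
print(result)
```

With the `$` anchor, trailing junk after the suffix is rejected as well, which the unanchored pattern would let through.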
• Sometimes we need to replace text that matches a pattern
• Consider the emails below and try to replace `.edu` with `.education`
``````
emails <- c("qt@gmail.com", "qt@test.edu", "qt@live.in", "qt@qt.org", "qt@qt.edu")
``````
• For this kind of task, R provides two functions
• sub()
• gsub()
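• As a sketch of how the two differ: `sub()` replaces only the first match in each string, while `gsub()` replaces every match. The string `s` is made up for illustration; the last lines apply `sub()` to the emails from the exercise.

```r
s <- "a.edu and b.edu"

# only the first ".edu" is replaced
first_only <- sub("\\.edu", ".education", s)
# every ".edu" is replaced
all_matches <- gsub("\\.edu", ".education", s)
print(first_only)
print(all_matches)

emails <- c("qt@gmail.com", "qt@test.edu", "qt@live.in", "qt@qt.org", "qt@qt.edu")
# escape the dot and anchor with $ so only a trailing ".edu" is touched
fixed <- sub("\\.edu$", ".education", emails)
print(fixed)
```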
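• Solution: either function works here because `.edu` occurs at most once per address; escape the dot and anchor the pattern with `$` so that only a trailing `.edu` is replaced
• Date Time formats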
• %Y: 4 digit year
• %y: 2 digit year
• %m: 2-digit month
• %d: 2-digit day of month
• %A: weekday (Monday)
• %a: abbreviated weekday (Mon)
• %B: month (March)
• %b: abbreviated month (Mar)
• Samples
``````
str1 <- "August 15, 1947"

date1 <- as.Date(str1, format="%B %d, %Y")
print(date1)

str2 <- "2012-27-05"
date2 <- as.Date(str2, format="%Y-%d-%m")
print(date2)
``````
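• The same format codes also work in the other direction: `format()` turns a `Date` back into text. A minimal sketch that uses only numeric codes, since month and weekday names (`%B`, `%A` and their abbreviations) depend on the locale. The variable names are illustrative.

```r
# an ISO "YYYY-MM-DD" string needs no explicit format
d <- as.Date("1947-08-15")

# day/month/4-digit year
dmy <- format(d, "%d/%m/%Y")
# 2-digit year
yy <- format(d, "%y")
# Dates support arithmetic in days
d_plus_30 <- d + 30

print(dmy)
print(yy)
print(d_plus_30)
```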
• Time formats
• %H: hours as decimal number (00-23)
• %I: hours as decimal number (01-12)
• %M: minutes as decimal number
• %S: seconds as decimal number
• %T: shorthand notation for “%H:%M:%S”
• %p: AM/PM indicator
• Try `?strptime`
• Look at the following sample
``````
str3 <- "April 2, 11 hours:09 minutes:45 seconds:30 pm"
time3 <- as.POSIXct(str3, format="%B %d, %y hours:%I minutes:%M seconds:%S %p")
print(time3)
``````
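• The time codes can be used for output as well. In the sketch below, `tz = "UTC"` is an assumption added so that the result does not depend on the machine's time zone, and the input uses the 24-hour `%H` because parsing `%p` depends on the locale.

```r
t1 <- as.POSIXct("2011-04-02 21:45:30",
                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# 24-hour clock
hms <- format(t1, "%H:%M:%S")
# hour on the 12-hour clock (01-12)
hour12 <- format(t1, "%I")
# POSIXct arithmetic is in seconds: add one hour
later <- t1 + 3600

print(hms)
print(hour12)
print(format(later, "%H:%M"))
# %T is shorthand for %H:%M:%S on most platforms
print(format(t1, "%T"))
```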