## Tidy Data

• There are three interrelated rules which make a dataset tidy:
• Each variable must have its own column
• Each observation must have its own row
• Each value must have its own cell
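
As a small illustration of why these rules are useful, the sketch below uses `table1`, one of the example tables that ships with tidyr and that already follows all three rules. Using `table1` here and the per-10,000 scaling are assumptions made for illustration only. Because every variable sits in its own column, a derived variable can be computed with a single `mutate()` call.

```r
library(tidyr)
library(dplyr)

# table1 is tidy: one column per variable (country, year, cases, population),
# one row per observation (a country in a year), one value per cell.
rates <- table1 %>%
  mutate(rate = cases / population * 10000)  # cases per 10,000 people

print(rates)
```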

• The first step is always to figure out what the variables and observations are.
• Sometimes this is easy; in other cases it might be difficult
• Typically a dataset will suffer from one of the following problems
• One variable might be spread across multiple columns
• One observation might be scattered across multiple rows
• To fix these problems, tidyr has two important functions
• `gather()`
• `spread()`

#### Gathering

• A common problem is a dataset where some of the column names are not names of variables, but values of a variable
• In `table4a`, the column names `1999` and `2000` represent values of the `year` variable
• Each row represents two observations
• To tidy a dataset like this, we need to gather those columns into a new pair of variables
• The name of the variable whose values form the column names: let's call it the `key`, and here it is `year`
• The name of the variable whose values are spread over the cells: let's call it the `value`, and here it is `cases`
• Now let's apply the same approach to `table4b`, which holds the population

#### Spreading

• Spreading is the opposite of gathering. We use it when an observation is scattered across multiple rows
• View `table2`. An observation is a country in a year, but each observation is spread across two rows
• Consider the following simple data

``````
library(tidyverse)

# Half-yearly returns for two years
stocks <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  half   = c(1, 2, 1, 2),
  return = c(1.88, 0.59, 0.92, 0.17)
)
``````
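
The gathering and spreading steps described above can be sketched as follows. This is only an illustration, assuming the example tables `table4a`, `table4b` and `table2` that ship with tidyr; the object names (`tidy4a`, `tidy4b`, `tidy4`, `tidy2`, `roundtrip`) are made up for this sketch. In current tidyr releases `gather()` and `spread()` still work but are superseded by `pivot_longer()` and `pivot_wider()`.

```r
library(tidyr)
library(dplyr)

# Gathering: the columns `1999` and `2000` are values of `year`, not variables.
tidy4a <- table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")

tidy4b <- table4b %>%
  gather(`1999`, `2000`, key = "year", value = "population")

# Both gathered tables share country and year, so they can be joined.
tidy4 <- left_join(tidy4a, tidy4b, by = c("country", "year"))

# Spreading: in table2 each observation occupies two rows
# (one for cases, one for population), identified by the `type` column.
tidy2 <- table2 %>%
  spread(key = type, value = count)

# Round trip on the stocks data: spread, then gather back.
stocks <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  half   = c(1, 2, 1, 2),
  return = c(1.88, 0.59, 0.92, 0.17)
)

roundtrip <- stocks %>%
  spread(year, return) %>%
  gather("year", "return", `2015`:`2016`)

print(tidy4)
print(tidy2)
print(roundtrip)
```

Note that the round trip does not give back exactly the original `stocks`: `year` passes through column names on the way, so it comes back as a character column, and the column order changes. `gather()` has a `convert = TRUE` argument that can be used to turn such a column back into a number.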

#### Separate

• `separate()` pulls apart one column into multiple columns

#### Unite
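
A minimal sketch of `separate()` and `unite()` side by side, assuming the example tables `table3` and `table5` that ship with tidyr. The object names `split3` and `joined5` and the new column name `new` are made up for this illustration.

```r
library(tidyr)

# table3 stores cases and population together in one `rate` column ("745/19987071").
# separate() splits it into two columns; convert = TRUE turns the pieces into numbers.
split3 <- table3 %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)

# table5 stores the year in two pieces, `century` and `year`.
# unite() pastes them back into a single column; sep = "" avoids the default "_".
joined5 <- table5 %>%
  unite(new, century, year, sep = "")

print(split3)
print(joined5)
```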

• `unite()` is the inverse of `separate()`: it combines multiple columns into a single column

#### Sample Activity

• The `who` dataset represents tuberculosis (TB) cases broken down by year, country, age, gender and diagnosis method
``````
library(tidyverse)

# Print the dataset, then inspect its columns and types
who
glimpse(who)
``````
• Let's gather the columns from `new_sp_m014` to `newrel_f65` and ignore the NA values
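``````
library(tidyverse)

# Gather all the case-count columns into a key/value pair,
# dropping rows where the count is missing
who1 <- who %>%
  gather(
    new_sp_m014:newrel_f65,
    key = "variant",
    value = "cases",
    na.rm = TRUE
  )
print(who1)
``````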
• Now, let's try to understand what the dataset contains and carry out further analysis.
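
One possible next step, sketched under the assumption that the values in `variant` follow the pattern `new_<type>_<sex><age group>` (for example `new_sp_m014`), with the `newrel` names missing one underscore. The names `who_tidy`, `new`, `type`, `sexage`, `sex` and `age` are chosen here for illustration and do not come from the original notes.

```r
library(tidyr)
library(dplyr)
library(stringr)

who_tidy <- who %>%
  gather(new_sp_m014:newrel_f65, key = "variant", value = "cases", na.rm = TRUE) %>%
  # Make the "newrel" names consistent with the others ("new_rel")
  mutate(variant = str_replace(variant, "newrel", "new_rel")) %>%
  # Split "new_sp_m014" into "new", "sp" and "m014"
  separate(variant, into = c("new", "type", "sexage"), sep = "_") %>%
  # Split "m014" after the first character into sex ("m") and age group ("014")
  separate(sexage, into = c("sex", "age"), sep = 1) %>%
  # Drop the constant `new` column and the redundant country codes
  select(-new, -iso2, -iso3)

print(who_tidy)
```

After this step each row is one country, year, diagnosis type, sex and age group, with the case count in `cases`, which makes grouped summaries with `count()` or `group_by()` straightforward.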
