DataScience Classroomnotes 29/Mar/2022

Elements of Structured Data

  • The most common form of structured data is a table with rows and columns.
  • There are two basic types of structured data:
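    • Numerical: Comes in two forms:
      • Continuous: can take any value within an interval (e.g. a price or a duration)
      • Discrete: can take only integer values, such as counts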
    • Categorical: Takes only a fixed set of values:
      • Examples:
        • Types of TV Screens (plasma, LED, LCD etc)
        • State Names (Telangana, Andhra Pradesh, Tamil Nadu, Karnataka, Kerala)
      • Binary data is an important special case of categorical data which takes one of two values (0/1, yes/no, true/false)
      • Another form of categorical data is ordinal data, i.e. categories which have an explicit order
        • Example: Ratings (1/5, 2/5, 3/5, 4/5, 5/5)
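The data types listed above can be sketched in pandas as follows. This is only an illustration: the values and the variable names (screen, rating, in_stock, price, units_sold) are made up for this example and do not come from the class dataset.

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate each data type.

# Categorical: a fixed set of values with no inherent order.
screen = pd.Series(["plasma", "LED", "LCD", "LED"], dtype="category")

# Ordinal: categorical data whose categories are explicitly ordered.
rating = pd.Series(
    pd.Categorical(
        ["3/5", "5/5", "1/5", "4/5"],
        categories=["1/5", "2/5", "3/5", "4/5", "5/5"],
        ordered=True,
    )
)

# Binary: a categorical special case with exactly two possible values.
in_stock = pd.Series([True, False, True, True])

# Numerical, continuous: any value within an interval.
price = pd.Series([499.99, 329.50, 289.00, 349.75])

# Numerical, discrete: integer counts.
units_sold = pd.Series([12, 40, 7, 40])

print(list(screen.cat.categories))   # the fixed set of allowed values
print(rating.min(), rating.max())    # min/max are meaningful because the order is defined
print((rating > "3/5").sum())        # ordered categories also support comparisons
```

Marking the ratings as ordered is what allows comparisons such as `rating > "3/5"`; on an unordered categorical the same comparison raises an error.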

Rectangular Data

  • This is the general term for a two-dimensional matrix with rows indicating records (cases) and columns indicating features (variables).
  • A DataFrame (data frame) is the format generally used for rectangular data in Python (pandas) and R.
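A minimal sketch of rectangular data as a pandas DataFrame, where each row is a record and each column is a feature. The column names and values here are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical records; column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "state": ["Telangana", "Kerala", "Tamil Nadu"],
        "screen_type": ["LED", "LCD", "plasma"],
        "price": [329.50, 289.00, 499.99],
        "units_sold": [40, 7, 12],
    }
)

print(df.shape)      # (number of records, number of features)
print(df.dtypes)     # one data type per feature (column)
print(df.iloc[0])    # the first record (row)
print(df["price"])   # a single feature (column)
```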

Estimates of Location

  • Variables with measured or count data might have thousands of distinct values.
  • A basic step in exploring your data is getting a typical value for each feature: an estimate of where most of the data is located (i.e. its central tendency).
  • Refer Here
  • Exercise: Take a dataset from Kaggle (Refer Here) and calculate the mean, median, weighted mean of total with ratings as the weight, and trimmed mean with any trim percentage.
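One possible way to approach the exercise is sketched below. It does not use the Kaggle dataset: the small DataFrame, its values, and the helper trimmed_mean are assumptions made for illustration, with only the column names total and rating taken from the exercise wording. The trimmed mean is written by hand here (sort, drop the same fraction of values from each end, average the rest); it is a sketch rather than a reference implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the Kaggle dataset.
df = pd.DataFrame(
    {
        "total": [100, 120, 130, 150, 900, 110, 140, 125, 135, 90],
        "rating": [4, 5, 3, 4, 1, 5, 4, 3, 4, 2],
    }
)


def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from each end (hypothetical helper)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    k = int(trim_fraction * len(ordered))  # number of values dropped per end
    if 2 * k >= len(ordered):
        raise ValueError("trim_fraction removes every value")
    return float(ordered[k : len(ordered) - k].mean())


mean_total = df["total"].mean()
median_total = df["total"].median()
# Weighted mean: sum(total * rating) / sum(rating)
weighted_mean_total = np.average(df["total"], weights=df["rating"])
trimmed_mean_total = trimmed_mean(df["total"], 0.10)

print("mean:", mean_total)
print("median:", median_total)
print("weighted mean:", weighted_mean_total)
print("trimmed mean (10%):", trimmed_mean_total)
```

In this made-up data the single large value (900) pulls the mean well above the median and the trimmed mean, which is why the more robust estimates are worth computing alongside the plain mean.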
