Visualizations Contd
- Refer here for the Jupyter notebook, where we explored Matplotlib and seaborn visualizations.
An Overview of Statistics
- Statistics is all about working with data, be it processing, analyzing or drawing a conclusion from the data we have.
- Statistics has two main goals:
- describing the data
- drawing conclusions from it
- These two goals correspond to the two main categories of statistics:
- descriptive statistics:
- We ask questions about the general characteristics of a dataset (What is the average price? What are the minimum and maximum values?).
- The answers to these questions give us an idea of what the dataset contains.
- inferential statistics:
- The goal is to go a step further: after gaining approximate insights from a given dataset, we'd like to use that information to make inferences about unknown data (for example, predicting future values from observed data).
- This is typically done with statistical and machine learning models.
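The difference between the two categories can be sketched with a small example. The data below (house sizes and prices) is hypothetical and chosen only for illustration. The least-squares line is one simple way to make an inference, not the only one, and the sketch uses only the Python standard library.

```python
import statistics

# Hypothetical observed data: house sizes (sq. m) and prices (in thousands).
sizes = [50, 60, 70, 80, 90]
prices = [150, 180, 210, 240, 270]

# Descriptive statistics: summarize the data we already have.
average_price = statistics.mean(prices)
min_price, max_price = min(prices), max(prices)
print("average:", average_price, "min:", min_price, "max:", max_price)

# Inferential step: fit a least-squares line price = slope * size + intercept,
# then use it to estimate the price for a size we have not observed.
mean_size = statistics.mean(sizes)
covariance_sum = sum((x - mean_size) * (y - average_price) for x, y in zip(sizes, prices))
variance_sum = sum((x - mean_size) ** 2 for x in sizes)
slope = covariance_sum / variance_sum
intercept = average_price - slope * mean_size

unseen_size = 100  # not present in the observed data
predicted_price = slope * unseen_size + intercept
print("predicted price for size", unseen_size, "is", predicted_price)
```

The first half only describes the observed prices, while the second half uses them to say something about a house that is not in the dataset.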
Types of Data in Statistics
There are two main types of data:
- Categorical data
- Numerical data
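As a rough illustration of the two types, the sketch below labels each column of a small, made-up dataset by checking whether all of its values are numbers. The records and the helper `column_kind` are hypothetical. This is only a heuristic: numeric codes such as postal codes are stored as numbers but are really categorical.

```python
from numbers import Number

# Hypothetical records; the column names and values are made up for illustration.
records = [
    {"city": "Pune", "rooms": 2, "price": 150.0},
    {"city": "Delhi", "rooms": 3, "price": 210.5},
    {"city": "Pune", "rooms": 4, "price": 270.0},
]

def column_kind(values):
    """Label a column 'numerical' if every value is a number, else 'categorical'."""
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "numerical" if is_numeric else "categorical"

# Collect the values of each column, then label each column.
columns = {name: [row[name] for row in records] for name in records[0]}
kinds = {name: column_kind(values) for name, values in columns.items()}
print(kinds)
```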
Summary of Categorical and Numerical data
| Features | Categorical data | Numerical data |
| --- | --- | --- |
| Characteristic | Discrete values | Continuous values |
| Ordinality | No (except for ordinal categories) | Yes |
| Models | Categorical/discrete probability distributions | Continuous probability distributions |
| Data processing | One-hot encoding | Scaling and normalization |
| Descriptive statistics | Mode | Mean and standard deviation |
| Predictive modeling | Classification | Regression |
| Visualization techniques | Pie charts and bar graphs | Histograms, line graphs and scatter plots |
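A minimal sketch of the "Data processing" and "Descriptive statistics" rows, using only the Python standard library. The sample values are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption; the sample version (`statistics.stdev`) would work equally well here.

```python
import statistics

# Hypothetical sample data, made up for illustration.
colors = ["red", "blue", "red", "green"]   # categorical
heights = [150.0, 160.0, 170.0, 180.0]     # numerical

# Descriptive statistics: mode for categorical, mean and std for numerical.
most_common_color = statistics.mode(colors)
mean_height = statistics.mean(heights)
std_height = statistics.pstdev(heights)  # population standard deviation

# One-hot encoding: one 0/1 column per category, in sorted category order.
categories = sorted(set(colors))
one_hot = [[1 if color == cat else 0 for cat in categories] for color in colors]

# Scaling (standardization): shift to mean 0 and rescale to std 1.
standardized = [(h - mean_height) / std_height for h in heights]

# Normalization (min-max): rescale values into the range [0, 1].
lowest, highest = min(heights), max(heights)
min_max_scaled = [(h - lowest) / (highest - lowest) for h in heights]

print(most_common_color, mean_height, std_height)
print(categories, one_hot)
print(standardized)
print(min_max_scaled)
```

In practice, libraries such as pandas and scikit-learn provide ready-made helpers for these steps; the hand-written versions are shown only to make the arithmetic visible.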