DataScience Classroomnotes 02/Apr/2022

Data & Sampling Distributions

  • Population
  • Sample
  • A sample is a subset of data from a larger data set, which is called the population.
  • Random sampling is a process in which each available member of the population being sampled has an equal chance of being chosen for the sample at each draw. The sample that results is called a simple random sample.
  • Sampling can be done with replacement, in which observations are put back in the population after each draw for possible future reselection, or without replacement, in which an observation, once selected, is unavailable for later draws.
  • Bias comes in different forms and may be observable or invisible. When a result does suggest bias, it is often an indicator that a statistical or machine learning model has been misspecified.
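
The difference between sampling with and without replacement can be sketched with Python's standard library. This is only an illustration: the population of integers 1 to 100, the sample size of 10, and the seed are assumptions made for the example, not part of the class notes.

```python
import random

# Hypothetical population: the integers 1..100 (illustrative only).
population = list(range(1, 101))

# Fixed seed (an arbitrary choice) so the draws are repeatable.
rng = random.Random(42)

# Without replacement: a selected member cannot be drawn again,
# so every value in the sample is distinct.
sample_without = rng.sample(population, k=10)

# With replacement: every draw is made from the full population,
# so the same member may appear more than once.
sample_with = rng.choices(population, k=10)

print("without replacement:", sample_without)
print("with replacement:   ", sample_with)
```

Both calls give every member the same chance of selection at each draw, which is what makes the result a simple random sample.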

  • Sampling Distribution: This term refers to the distribution of a sample statistic (for example, the mean) over many samples drawn from the same population.
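
A sampling distribution can be approximated by simulation: draw many samples, compute the statistic for each, and look at how those values are spread. The sketch below assumes a made-up, skewed population and uses the mean as the statistic; the helper name `sample_means`, the sample size of 30, and the 1,000 repetitions are all illustrative choices.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary fixed seed for repeatability

# Hypothetical skewed population of 10,000 values (exponential, mean near 1).
population = [rng.expovariate(1.0) for _ in range(10_000)]


def sample_means(population, sample_size, n_samples, rng):
    """Return the mean of each of n_samples simple random samples."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# The list of means approximates the sampling distribution of the mean.
means = sample_means(population, sample_size=30, n_samples=1_000, rng=rng)

pop_mean = statistics.mean(population)
print("population mean:         ", round(pop_mean, 3))
print("mean of sample means:    ", round(statistics.mean(means), 3))
print("std dev of population:   ", round(statistics.stdev(population), 3))
print("std dev of sample means: ", round(statistics.stdev(means), 3))
```

The sample means cluster around the population mean and vary much less than the individual values do.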

  • Standard Error:
    • This is a single metric that sums up the variability in the sampling distribution of a statistic.
    • Standard error = SE = s/√n
    • s = the standard deviation of the sample values
    • n = the sample size
    • The standard error indicates how different the population mean is likely to be from a sample mean.
    • A high standard error shows that the sample means are widely spread around the population mean, i.e. your sample may not closely represent your population.
    • A low standard error shows that the sample means are closely distributed around the population mean, i.e. your sample is likely to be representative of your population.
    • Refer Here
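
The formula SE = s/√n can be tried out directly. The sketch below is a minimal example under stated assumptions: `standard_error` is a hypothetical helper name, s is taken to be the sample standard deviation (n - 1 in the denominator), and the normally distributed population and the sample sizes of 25 and 400 are invented for illustration.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    n = len(sample)
    return s / math.sqrt(n)


rng = random.Random(1)  # arbitrary fixed seed for repeatability

# Hypothetical population: 10,000 values centred on 50 with spread 10.
population = [rng.gauss(50, 10) for _ in range(10_000)]

small_sample = rng.sample(population, 25)
large_sample = rng.sample(population, 400)

se_small = standard_error(small_sample)
se_large = standard_error(large_sample)

print("SE with n = 25: ", round(se_small, 3))
print("SE with n = 400:", round(se_large, 3))
```

Because n sits under a square root, a sample 16 times larger reduces the standard error by roughly a factor of 4, not 16.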

