March 16, 2012

z-statistics vs t-statistics

A z-statistic uses the z-score, which is defined as how many standard deviations a value lies away from the mean.

z-score = (x - µ)/σ
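
For example, here is a minimal sketch in Python of that calculation, using made-up values for µ, σ, and the observation x:

    mu = 100.0      # population mean (assumed value, for illustration)
    sigma = 15.0    # population standard deviation (assumed value)
    x = 130.0       # the observed value

    # z-score: how many standard deviations x lies away from the mean
    z = (x - mu) / sigma
    print(z)        # 2.0 -> x is two standard deviations above the mean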

But, since µ and σ are the mean and standard deviation of the population, not of the sample, they have to be estimated. If the sample size is large enough (usually at least 30), µ is estimated by the sample mean, while σ is estimated by the sample standard deviation, which uses (n - 1) instead of n in the denominator to avoid bias. As I cannot write mathematical equations here, here is the Wikipedia link: http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation.
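
To see the effect of the (n - 1) denominator, here is a small Python sketch with a made-up sample; dividing by n gives the biased estimate, dividing by (n - 1) gives the unbiased estimate of the variance:

    import math

    sample = [12.0, 15.0, 9.0, 14.0, 11.0]   # hypothetical sample
    n = len(sample)
    mean = sum(sample) / n                    # sample mean, estimates µ

    # Biased estimate: divide by n (tends to underestimate the variance)
    var_biased = sum((x - mean) ** 2 for x in sample) / n
    # Unbiased estimate: divide by n - 1 (Bessel's correction)
    var_unbiased = sum((x - mean) ** 2 for x in sample) / (n - 1)

    print(math.sqrt(var_biased))    # ~2.14
    print(math.sqrt(var_unbiased))  # ~2.39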

According to the central limit theorem, the standard deviation of the sampling distribution of the sample mean is estimated as s/√n, where s is the sample standard deviation. Using this as σ, the z-score is calculated, and the 68-95-99.7 rule is applied to find the desired probability.
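
As a sketch of this step, assuming a hypothetical sample of size 30, the standard error s/√n plays the role of σ for the sampling distribution of the mean, and the 68-95-99.7 rule then gives approximate probabilities:

    import math
    import random

    random.seed(0)
    # Hypothetical sample of size 30, drawn from an assumed distribution
    sample = [random.gauss(3.0, 0.5) for _ in range(30)]
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample std

    # Standard error: estimated standard deviation of the sampling
    # distribution of the sample mean
    se = s / math.sqrt(n)

    # 68-95-99.7 rule: about 95% of sample means fall within ~2 standard
    # errors of the population mean
    print(mean - 2 * se, mean + 2 * se)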

But when the sample size is not large enough (< 30), s tends to underestimate σ, so the z-score (or z-statistic) is not reliable. In this case, the t-statistic is used instead.
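
Here is a small sketch contrasting the two, using scipy (the sample size and confidence level are assumed for illustration): for small n, the t critical value is noticeably larger than the z critical value, which accounts for the extra uncertainty in estimating σ from s.

    from scipy import stats

    confidence = 0.95
    n = 10                                   # small sample, n < 30
    q = 1 - (1 - confidence) / 2             # upper quantile for a two-sided interval

    z_crit = stats.norm.ppf(q)               # ~1.96, standard normal
    t_crit = stats.t.ppf(q, df=n - 1)        # ~2.26, t with n - 1 degrees of freedom

    print(z_crit, t_crit)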

In summary, if the sample size is large enough (>= 30), use the z-statistic; otherwise use the t-statistic.

Here is the video from Khan Academy.

