
Sampling Distributions and Limit Theorems


Chapter 8, Sampling Distributions and Limit Theorems, delves into the crucial theoretical concepts that underpin almost all statistical inference. This chapter explains what happens to sample statistics, like the mean ($\bar{X}$), when the sample size ($n$) becomes very large, focusing on two foundational results: the Weak Law of Large Numbers (WLLN) and the Central Limit Theorem (CLT).


8.1 Setting the Stage: Multi-dimensional Continuous Variables

To study sample statistics, we first need a framework for discussing several random variables simultaneously. Since a sample $\{X_1, X_2, \dots, X_n\}$ consists of $n$ random variables, understanding their relationships is essential.

Concepts of Joint Distributions

  1. Joint Distribution Function ($F$): This function gives the probability that all variables fall at or below specified values: $F(x_1, x_2, \dots, x_n) = P(X_1 \leq x_1, X_2 \leq x_2, \dots, X_n \leq x_n)$

  2. Joint Density ($f$): For continuous variables, probability is found by integrating the joint density function $f(x_1, \dots, x_n)$ over the desired region.

  3. Independence is Key: If the variables are mutually independent (a fundamental assumption for most random samples), their joint density function is simply the product of their individual marginal densities: $f(x_1, x_2, \dots, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n)$
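
To make the product-of-marginals rule concrete, here is a minimal sketch (with illustrative numbers chosen here, not from the text) that evaluates the joint density of an observed i.i.d. Exponential(rate = 2) sample as the product of its marginal densities.

```python
# Minimal sketch (illustrative values, not from the text): for an i.i.d.
# Exponential(rate = 2) sample, the joint density is the product of the
# marginal densities f(x_i) = rate * exp(-rate * x_i).
import numpy as np

rate = 2.0
x = np.array([0.3, 1.1, 0.7])            # an observed sample (x_1, x_2, x_3)

marginals = rate * np.exp(-rate * x)     # f_{X_i}(x_i) for each observation
joint_density = np.prod(marginals)       # f(x_1, x_2, x_3) under independence

print(f"joint density = {joint_density:.6f}")
```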

Order Statistics

When we observe a sample, arranging the values from smallest to largest gives us the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.

Q1

CDF of the Maximum

Question: If $X_1, \dots, X_n$ is an i.i.d. sample with common CDF $F$, what is the Cumulative Distribution Function (CDF) of the maximum value, $X_{(n)}$?


Solution: The maximum $X_{(n)}$ is less than or equal to $x$ if and only if all $X_i$ are less than or equal to $x$. Due to independence: $F_{(n)}(x) = P(X_{(n)} \leq x) = \prod_{i=1}^{n} P(X_i \leq x) = (F(x))^n$
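
A quick way to sanity-check this result is simulation. The sketch below (the Uniform(0, 1) population, where $F(x) = x$, is an illustrative choice, not from the text) compares the empirical probability $P(X_{(n)} \leq x)$ with $(F(x))^n$.

```python
# Minimal sketch (illustrative values, not from the text): Monte Carlo check
# that P(max <= x) = F(x)^n, using Uniform(0, 1) samples where F(x) = x.
import numpy as np

rng = np.random.default_rng(0)
n, trials, x = 5, 100_000, 0.8

samples = rng.uniform(0.0, 1.0, size=(trials, n))
maxima = samples.max(axis=1)

empirical = np.mean(maxima <= x)   # estimated P(X_(n) <= x)
theoretical = x ** n               # (F(x))^n for Uniform(0, 1)

print(f"empirical   P(max <= {x}) = {empirical:.4f}")
print(f"theoretical (F(x))^n      = {theoretical:.4f}")
```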


8.2 The Weak Law of Large Numbers (WLLN)

The WLLN is the first major limit theorem, providing formal confirmation of the common intuition that as you gather more data, the sample average gets closer to the true population average.

Concept: Convergence in Probability

💡

Convergence in Probability

The WLLN describes convergence in probability ($\xrightarrow{p}$). A sequence of random variables $X_n$ converges to $X$ in probability if, for any distance $\epsilon > 0$, however small, the chance that $X_n$ is farther from $X$ than $\epsilon$ goes to zero as $n$ increases: $\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0$

The WLLN Theorem

If $X_1, X_2, \dots$ are i.i.d. random variables with finite mean $\mu$ and finite variance $\sigma^2$, then the sample mean $\bar{X}_n$ converges in probability to $\mu$: for every $\epsilon > 0$, $\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \epsilon) = 0$

Proof Insight: Since $\mathrm{Var}[\bar{X}_n] = \sigma^2/n$, applying Chebyshev's Inequality to $\bar{X}_n$ gives $P(|\bar{X}_n - \mu| > \epsilon) \leq \frac{\mathrm{Var}[\bar{X}_n]}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$. As $n \to \infty$, this upper bound goes to zero.

Application: Sample Proportion

The WLLN formally validates the intuitive link between theoretical probability and observed frequency: the sample proportion ($\hat{p}$) converges in probability to the true probability ($p$).
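
As an illustration (the values of $p$, $\epsilon$, and the sample sizes are chosen here, not from the text), the sketch below simulates repeated Bernoulli experiments and estimates $P(|\hat{p} - p| > \epsilon)$ for growing $n$, alongside the Chebyshev bound from the proof above.

```python
# Minimal sketch (illustrative values, not from the text): estimate
# P(|p_hat - p| > eps) by simulation for growing sample sizes and compare it
# with the Chebyshev bound p(1-p) / (n * eps^2); both shrink toward zero.
import numpy as np

rng = np.random.default_rng(1)
p, eps, trials = 0.3, 0.05, 20_000

for n in (10, 100, 1_000, 10_000):
    counts = rng.binomial(n, p, size=trials)      # successes in n Bernoulli(p) trials
    p_hat = counts / n                            # sample proportion per experiment
    prob_far = np.mean(np.abs(p_hat - p) > eps)   # estimated P(|p_hat - p| > eps)
    bound = p * (1 - p) / (n * eps ** 2)          # Chebyshev bound (may exceed 1 for small n)
    print(f"n={n:>6}: P(|p_hat - p| > {eps}) ~ {prob_far:.4f}  (Chebyshev bound {bound:.3f})")
```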


8.3 Convergence in Distribution

Convergence in distribution ($\xrightarrow{d}$) describes how the distribution of a sequence of random variables approaches a limiting distribution.

The Power of Moment Generating Functions (MGFs)

The most practical way to prove convergence in distribution is often through Moment Generating Functions (MGFs).

MGF Convergence Theorem: If the MGFs $M_n(t)$ of a sequence $X_n$ exist and converge to $M(t)$ for all $t$ in an open interval around zero, and $M(t)$ is the MGF of some random variable $X$, then $X_n$ converges in distribution to $X$.

Example: Proving that the Binomial distribution approaches the Poisson when $n \to \infty$ and $p \to 0$ (with $np = \lambda$ fixed) is done by showing that the limit of the Binomial MGF, $(1 - p + pe^t)^n$, equals the Poisson MGF $e^{\lambda(e^t - 1)}$.
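
As a numerical illustration (the values of $\lambda$ and $t$ below are chosen here, not from the text), this sketch evaluates the Binomial MGF with $p = \lambda/n$ for increasing $n$ and compares it with the Poisson MGF at a fixed $t$.

```python
# Minimal sketch (illustrative values, not from the text): the Binomial(n, p)
# MGF (1 - p + p*e^t)^n approaches the Poisson(lambda) MGF exp(lambda*(e^t - 1))
# as n grows with p = lambda / n.
import numpy as np

lam, t = 2.0, 0.5
poisson_mgf = np.exp(lam * (np.exp(t) - 1.0))

for n in (10, 100, 1_000, 10_000):
    p = lam / n
    binom_mgf = (1.0 - p + p * np.exp(t)) ** n
    print(f"n={n:>6}: Binomial MGF = {binom_mgf:.6f}, Poisson MGF = {poisson_mgf:.6f}")
```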


8.4 The Central Limit Theorem (CLT)

The Central Limit Theorem is arguably the most fundamental result in statistics, explaining why the Normal distribution appears so frequently.

The CLT Theorem

💡

Central Limit Theorem

Let $X_1, X_2, \dots$ be i.i.d. random variables with finite mean $\mu$ and finite variance $\sigma^2$. If we standardize the sample mean $\bar{X}_n$:

$$Y_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{S_n - n\mu}{\sigma \sqrt{n}}, \qquad \text{where } S_n = X_1 + \cdots + X_n$$

Then, as $n \to \infty$, $Y_n$ converges in distribution to the Standard Normal distribution, $Z \sim \text{Normal}(0, 1)$.

$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} Z$$

Insight: The sampling distribution of the sample mean approaches a Normal distribution, even if the original population distribution is highly non-normal.
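
The sketch below (an illustrative setup chosen here, not from the text) makes this concrete: it standardizes the mean of $n$ i.i.d. Exponential(1) draws, a strongly skewed population, and checks that the resulting $Y_n$ values behave like a Standard Normal.

```python
# Minimal sketch (illustrative values, not from the text): standardized means
# of a skewed population (Exponential(1), so mu = sigma = 1) behave like a
# Standard Normal for moderately large n.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.0
n, trials = 200, 20_000

samples = rng.exponential(scale=1.0, size=(trials, n))
y = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma   # Y_n for each simulated sample

# For a Standard Normal: mean 0, SD 1, and P(-2 < Z <= 2) is about 0.9545.
print(f"mean(Y_n) ~ {y.mean():.3f}, sd(Y_n) ~ {y.std():.3f}")
print(f"P(-2 < Y_n <= 2) ~ {np.mean((y > -2) & (y <= 2)):.4f}")
```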

Application: Normal Approximation

Q2

Normal Approximation

Question: $Y \sim \text{Gamma}(100, 4)$ (shape $100$, rate $4$) is the sum of $n = 100$ independent $\text{Exponential}(4)$ random variables. Approximate $P(20 < Y \leq 30)$.


Solution:

  1. Identify Parameters: For $\text{Exponential}(4)$, $\mu = 0.25$ and $\sigma = 0.25$.
  2. Sum Parameters: $E[S_{100}] = 100(0.25) = 25$ and $SD[S_{100}] = 0.25\sqrt{100} = 2.5$.
  3. Standardize: $P(20 < S_{100} \leq 30) \approx P\left( \frac{20 - 25}{2.5} < Z \leq \frac{30 - 25}{2.5} \right) = P(-2 < Z \leq 2)$
  4. Calculate: $P(-2 < Z \leq 2) \approx \mathbf{0.9544}$.
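
Since the exact distribution of $S_{100}$ is the stated Gamma, the quality of the approximation can be checked directly. The sketch below uses SciPy (an implementation choice, not from the text); note that SciPy parameterizes the Gamma by shape and scale $= 1/\text{rate}$.

```python
# Minimal sketch (not from the text): exact Gamma(shape=100, rate=4) probability
# versus the CLT approximation. SciPy's gamma uses shape and scale = 1 / rate.
from scipy import stats

shape, rate = 100, 4
exact = stats.gamma.cdf(30, shape, scale=1 / rate) - stats.gamma.cdf(20, shape, scale=1 / rate)
approx = stats.norm.cdf(2) - stats.norm.cdf(-2)   # P(-2 < Z <= 2)

print(f"exact  P(20 < Y <= 30) = {exact:.4f}")
print(f"normal approximation   = {approx:.4f}")
```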

Continuity Correction

When approximating discrete, integer-valued RVs (like the Binomial), move each boundary to the nearest half-integer: since $a < S_n \leq b$ is the event $a + 1 \leq S_n \leq b$, use $P(a < S_n \leq b) \approx P\left( \frac{a + 0.5 - n\mu}{\sigma \sqrt{n}} < Z \leq \frac{b + 0.5 - n\mu}{\sigma \sqrt{n}} \right)$
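
As a worked illustration (the Binomial(100, 0.5) example and the endpoints are chosen here, not from the text), the sketch below applies the continuity correction to approximate $P(45 < S_n \leq 55)$ and compares it with the exact Binomial probability.

```python
# Minimal sketch (illustrative values, not from the text): continuity-corrected
# CLT approximation of P(45 < S_n <= 55) for S_n ~ Binomial(100, 0.5).
from scipy import stats

n, p = 100, 0.5
a, b = 45, 55
mu, sigma = p, (p * (1 - p)) ** 0.5               # per-trial mean and SD

z_lo = (a + 0.5 - n * mu) / (sigma * n ** 0.5)
z_hi = (b + 0.5 - n * mu) / (sigma * n ** 0.5)
approx = stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo)

exact = stats.binom.cdf(b, n, p) - stats.binom.cdf(a, n, p)   # P(45 < S_n <= 55)
print(f"CLT with continuity correction ~ {approx:.4f}, exact = {exact:.4f}")
```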

Race Car Analogy: The Weak Law of Large Numbers tells you the race car (the sample mean) will eventually finish the race at the true mean ($\mu$). The Central Limit Theorem describes how the car travels to that finish line, showing that its distribution around the finish line (when standardized) always follows the same predictable, bell-shaped path (the Normal distribution), regardless of how the race started.