Sampling and Repeated Trials

Chapter 2 of the “Stats 2 book” introduces the crucial concepts of repeated trials and sampling, providing the mathematical framework for analysing sequences of experiments, whether the trials are independent of one another or are draws without replacement from a limited pool. This framework builds upon the basic concepts of probability space and events discussed in Chapter 1.

The chapter primarily focuses on three topics that arise from repeating experiments: Bernoulli trials (leading to the Binomial and Geometric distributions), the Poisson approximation to the Binomial, and the Hypergeometric distribution (for sampling without replacement).

[Figure: Probability Models: Bernoulli, Binomial & Poisson]

2.1 Bernoulli Trials: Success, Failure, and Repetition

Bernoulli trials form the foundation for modelling repetitive experiments that yield a binary outcome.

Concept Explanation: Bernoulli Trial

A single Bernoulli trial is an experiment where the outcome is classified strictly as either a “Success” or a “Failure”.

  • Let $p$ be the probability of success on any given trial.
  • The trials in a sequence are always assumed to be independent.

| Experiment | Success Event Example | Probability ($p$) |
| --- | --- | --- |
| Toss a fair coin | Head appears | $1/2$ |
| Roll a fair die | Six appears | $1/6$ |

If we denote a single trial with parameter $p$ as $\text{Bernoulli}(p)$, we define the sample space $S = \{\text{success}, \text{failure}\}$, where $P(\{\text{success}\}) = p$ and therefore $P(\{\text{failure}\}) = 1 - p$.

The Binomial Distribution (Counting Successes)

When you perform $n$ independent Bernoulli trials and you are interested in the total number of successes ($k$) observed, the resulting distribution is the Binomial Distribution ($\text{Binomial}(n, p)$).

Binomial Probability


$$P(B_k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } 0 \le k \le n$$

Interpretation

Probability of exactly k successes in n trials.

$\binom{n}{k}$ counts the arrangements of $k$ successes among $n$ trials.

$p^k (1-p)^{n-k}$ is the probability of one specific sequence with $k$ successes and $n-k$ failures.
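The Binomial formula above can be sketched in a few lines of Python. The function name `binomial_pmf` is an illustrative choice, not something from the book; the sketch simply multiplies the number of arrangements by the probability of one specific sequence.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    if not 0 <= k <= n:
        return 0.0  # outside the support of the distribution
    # comb(n, k) counts arrangements; p**k * (1-p)**(n-k) is one sequence.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Sanity check: the probabilities over k = 0..n add up to 1 (up to rounding).
total = sum(binomial_pmf(k, 10, 0.7) for k in range(11))
print(round(total, 12))
```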

Example Question and Concept Application (Mode)

Q1

Free Throw Mode

Question: If a basketball player, Mark, is a p=0.7p=0.7 free-throw shooter and attempts n=10n=10 independent free throws, what is the most likely number of shots he will make?

Concept Explanation (Mode): The “mode” is the value of $k$ that makes the probability $P(B_k)$ largest. For a $\text{Binomial}(n, p)$ distribution with $0 < p < 1$, the most likely number of successes is the integer $k = \lfloor p(n+1) \rfloor$. When $p(n+1)$ is itself an integer, $k$ and $k-1$ are equally likely, so there are two modes.

Solution: Using $p = 0.7$ and $n = 10$: $k = \lfloor 0.7(10 + 1) \rfloor = \lfloor 0.7 \times 11 \rfloor = \lfloor 7.7 \rfloor = 7$. The most likely number of successful throws is 7.
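As a cross-check on the mode formula, the sketch below searches all values of $k$ for the largest Binomial probability and compares the result with $\lfloor p(n+1) \rfloor$. The helper `binomial_pmf` and the variable names are assumptions made for illustration.

```python
from math import comb, floor

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.7
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Brute-force search for the k with the largest probability.
mode_by_search = max(range(n + 1), key=lambda k: probs[k])
# Closed-form value from the formula in the text.
mode_by_formula = floor(p * (n + 1))

print(mode_by_search, mode_by_formula)  # 7 7
```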


The Geometric Distribution (Waiting for Success)

The Geometric Distribution ($\text{Geometric}(p)$) describes the probability that the first success occurs exactly on the $k^{\text{th}}$ trial.

Geometric property: the only discrete memoryless distribution.

Geometric Probability

For the first success to occur on the $k^{\text{th}}$ trial, the first $k-1$ trials must all be failures, and the $k^{\text{th}}$ trial must be a success.

$$P(C_k) = (1-p)^{k-1} p \quad \text{for } k \ge 1$$

Key Property: The Geometric distribution is memoryless. If the first success (like the first head in a run of coin tosses) has not occurred by trial $n$, the probability that it takes exactly $m$ more trials is the same as the probability that it would have taken exactly $m$ trials from the start: $P(X = n + m \mid X > n) = P(X = m)$, where $X$ is the trial on which the first success occurs.
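The memoryless property can be checked numerically. The sketch below uses two hypothetical helpers, `geometric_pmf` and `tail`, and the illustrative values $p = 0.5$, $n = 3$, $m = 2$; it relies on the fact that "no success in the first $n$ trials" has probability $(1-p)^n$.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs exactly on trial k), for k >= 1."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

def tail(n: int, p: float) -> float:
    """P(no success in the first n trials) = (1 - p)**n."""
    return (1 - p) ** n

p, n, m = 0.5, 3, 2
# Conditional probability of needing exactly m more trials after n failures.
conditional = geometric_pmf(n + m, p) / tail(n, p)
# Probability of needing exactly m trials from a fresh start.
unconditional = geometric_pmf(m, p)

print(conditional, unconditional)  # 0.25 0.25
```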

Example Question and Solution

Q2

Rolling a Six

Question: A fair die is rolled repeatedly. What is the probability that the first 6 appears exactly on the fifth roll?

Concept: Success is rolling a 6 ($p = 1/6$). We are looking for $k = 5$.

Solution: The probability of failure is $1 - 1/6 = 5/6$.

$$P(\text{first 6 on roll 5}) = \left(\frac{5}{6}\right)^{5-1} \cdot \frac{1}{6} = \frac{625}{7776} \approx 0.0804$$
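The arithmetic in this solution can be reproduced exactly with Python's `fractions` module, which avoids floating-point rounding. The variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 6)  # probability of rolling a six
k = 5               # trial on which the first six should appear

# (k - 1) failures followed by one success.
prob = (1 - p) ** (k - 1) * p

print(prob)                   # 625/7776
print(round(float(prob), 4))  # 0.0804
```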


2.2 Poisson Approximation: When $n$ is Large and $p$ is Small

Calculating Binomial probabilities directly becomes computationally challenging when the number of trials ($n$) is very large. The Poisson Distribution serves as an effective approximation when $n$ is large and the probability of success ($p$) is very small, with the product $\lambda = np$ held fixed at a moderate value.

Poisson Formula


$$P(\{k\}) = \frac{e^{-\lambda} \lambda^k}{k!} \quad \text{for } k = 0, 1, 2, \dots$$

where $\lambda = np$ (Average Rate)

Limit Case

Approximation for Binomial when:

  • $n \to \infty$
  • $p \to 0$
  • $\lambda = np$ constant
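The limit conditions above can be illustrated numerically: holding $\lambda = np$ fixed while $n$ grows, the Binomial probability of a given $k$ approaches the Poisson probability. The values $\lambda = 4$ and $k = 2$, and the helper names, are assumptions chosen for this sketch.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson distribution with rate lam."""
    return exp(-lam) * lam**k / factorial(k)

lam, k = 4.0, 2
gaps = []
for n in (10, 100, 1000, 10000):
    p = lam / n  # shrink p so that n * p stays equal to lam
    gap = abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
    gaps.append(gap)
    print(n, gap)  # the gap shrinks as n grows
```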

Example Question and Solution (Approximation)

Q3

Independence Day Births

Question: A college has $n = 1460$ students. Assuming each student was born on Independence Day with probability $p = 1/365$, independently of the others, what is the probability that five or more students were born on Independence Day?

Concept: Since $n$ is large (1460) and $p$ is small ($1/365$), we use the Poisson approximation.

  1. Calculate $\lambda$: $\lambda = np = 1460 \times (1/365) = 4$.
  2. The problem asks for $P(X \ge 5)$, where $X \sim \text{Poisson}(4)$.

Solution: It is easier to calculate the complement $1 - P(X \le 4)$.

$$P(X \ge 5) \approx 1 - \sum_{k=0}^{4} P(X=k) = 1 - \left[ \frac{e^{-4} 4^0}{0!} + \frac{e^{-4} 4^1}{1!} + \frac{e^{-4} 4^2}{2!} + \frac{e^{-4} 4^3}{3!} + \frac{e^{-4} 4^4}{4!} \right]$$

This calculation yields an approximate probability of 0.3711631.
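The complement calculation can be reproduced in Python. For comparison, the sketch also evaluates the same tail probability under the exact Binomial model; that second value is not quoted in the text and is included only to show how close the approximation is. Variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 1460, 1 / 365
lam = n * p  # equals 4 up to floating-point rounding

# Poisson approximation: 1 - P(X <= 4).
poisson_tail = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(5))

# Exact Binomial(n, p) value of the same tail, for comparison only.
binomial_tail = 1 - sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(5)
)

print(round(poisson_tail, 7))  # 0.3711631
print(binomial_tail)
```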


2.3 Sampling: With and Without Replacement

The methods used for Bernoulli trials (Binomial distribution) assume independence, which corresponds to “sampling with replacement” (i.e., selecting an item and then returning it to the pool so that it is available for future selection).

However, in many real-world scenarios, sampling is done without replacement, meaning that once an item is selected, it is removed from the population pool, and subsequent trials are therefore dependent on previous outcomes.
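The difference between the two sampling schemes can be seen with Python's standard `random` module: `random.choices` draws with replacement, while `random.sample` draws without replacement. The population of ten labelled items and the fixed seed are assumptions made for this sketch.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same draws
population = list(range(10))  # ten distinct items labelled 0..9

# With replacement: each draw sees the full pool, so items may repeat.
with_replacement = random.choices(population, k=5)

# Without replacement: a drawn item leaves the pool, so all are distinct.
without_replacement = random.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```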

The Hypergeometric Distribution (Sampling Without Replacement)

The Hypergeometric Distribution ($\text{HyperGeo}(N, r, m)$) models the number of successes ($k$) found in a sample when sampling without replacement from a finite population.

  • $N$: Total population size.
  • $r$: Number of individuals with the characteristic (successes) in the population of $N$.
  • $m$: Sample size chosen.
  • $k$: Number of successes found in the sample.

Concept: Hypergeometric Probability

Hypergeometric Formula

The probability $P(\{k\})$ is calculated by counting the number of ways to select $k$ successes from $r$ available items and $m-k$ failures from $N-r$ available items, divided by the total number of ways to choose $m$ items from $N$.

$$P(\{k\}) = \frac{\binom{r}{k} \binom{N-r}{m-k}}{\binom{N}{m}}$$
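The counting argument above translates directly into code. The function name `hypergeom_pmf` is an illustrative choice; the range check is added because `math.comb` rejects negative arguments, and the small population used in the sanity check is an assumption.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, r: int, m: int) -> float:
    """P(exactly k successes in a sample of m drawn without replacement
    from N items, of which r are successes)."""
    # k cannot exceed the successes available or the sample size, and the
    # m - k failures cannot exceed the N - r failures available.
    if k < max(0, m - (N - r)) or k > min(r, m):
        return 0.0
    return comb(r, k) * comb(N - r, m - k) / comb(N, m)

# Sanity check on a small population: the probabilities sum to 1.
total = sum(hypergeom_pmf(k, 10, 4, 3) for k in range(4))
print(round(total, 12))
```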

Example Question and Solution (Town Residents)

Q4

Town Demographics

Question: In a town of $N = 5000$ residents, $r = 1000$ are under age 18. If $m = 4$ residents are selected randomly without replacement, what is the probability that exactly $k = 2$ of them will be under 18?

Solution: We use the Hypergeometric formula with $N = 5000$, $r = 1000$, $m = 4$, and $k = 2$. The number of non-successes is $N - r = 4000$.

$$P(\{2\}) = \frac{\binom{1000}{2} \binom{4000}{2}}{\binom{5000}{4}}$$

This result is approximately 0.153592.

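Approximation Note: Because the sample ($m = 4$) is tiny relative to the population ($N = 5000$), removing a few residents barely changes the proportion of successes $r/N = 1/5$, so a Binomial distribution with $n = 4$ and $p = 1/5$ gives nearly the same answer.

  • Hypergeometric Result: $\approx 0.153592$
  • Binomial Approximation Result: $P(k=2) = \binom{4}{2} (1/5)^2 (4/5)^2 = 0.1536$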
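Both numbers in this comparison can be reproduced with a short sketch. The variable names are illustrative, and the Binomial success probability is taken to be $r/N$, the fraction of residents under 18.

```python
from math import comb

N, r, m, k = 5000, 1000, 4, 2

# Exact probability when sampling without replacement (Hypergeometric).
exact = comb(r, k) * comb(N - r, m - k) / comb(N, m)

# Binomial approximation: treat the m draws as independent with p = r / N.
p = r / N
approx = comb(m, k) * p**k * (1 - p) ** (m - k)

print(round(exact, 6), round(approx, 6))  # 0.153592 0.1536
```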