
Summarizing Discrete Random Variables


Chapter 4 of the “Stats 2 Book”, Summarizing Discrete Random Variables, focuses on deriving numerical characteristics, such as the average and the spread, that summarize the behaviour of a random variable (RV) without listing every possible outcome. These summaries allow you to understand the “typical” result and the likelihood of unusual results.

The chapter covers five main topics: expected value (the average), variance and standard deviation (spread), standard units and Chebyshev’s inequality, conditional expectation, and covariance and correlation (relationships between RVs).


4.1 Expected Value (E[X]): The Average Outcome

The expected value (E[X]), or average, of a discrete random variable X is essentially a long-run weighted average of all its possible outcomes. It tells you what value you should expect, on average, if you repeated the experiment many times.

Concept and Calculation

💡

Expected Value Formula

The calculation of the expected value relies on weighting each possible outcome (t) by its likelihood (P(X = t)) and summing these products: E[X] = \sum_{t} t \cdot P(X=t)

For beginner intuition, consider rolling a standard fair die. The possible outcomes are t \in \{1, 2, 3, 4, 5, 6\}, each with probability P(X=t) = 1/6: E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5
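The die calculation above can be reproduced directly from the definition; the short sketch below (plain Python, using exact fractions to avoid rounding) is illustrative:

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die: each face has probability 1/6.
pmf = {t: Fraction(1, 6) for t in range(1, 7)}

# E[X] = sum over outcomes t of t * P(X = t)
expected_value = sum(t * p for t, p in pmf.items())

print(expected_value)  # 7/2, i.e. 3.5
```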

Example: Expected Value of a Lottery Ticket

Q1

Lottery Valuation

Example (4.1.2): A lottery ticket can be worth £200, £20, or nothing (£0).

  • P(X=200) = 1/1000
  • P(X=20) = 27/1000
  • P(X=0) = 972/1000

Question: What is the average value of such a ticket?

📝 View Detailed Solution

Solution: Applying the definition of expected value: E[X] = 200\left(\frac{1}{1000}\right) + 20\left(\frac{27}{1000}\right) + 0\left(\frac{972}{1000}\right) = 0.20 + 0.54 + 0 = 0.74

Answer: The expected value of the ticket is £0.74 (or 74 pence).
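The same definition handles the lottery ticket; a minimal Python check of Example 4.1.2:

```python
from fractions import Fraction

# Prize values (in £) and their probabilities, from Example 4.1.2.
pmf = {200: Fraction(1, 1000), 20: Fraction(27, 1000), 0: Fraction(972, 1000)}

# E[X] = sum over outcomes t of t * P(X = t)
ticket_value = sum(t * p for t, p in pmf.items())

print(ticket_value)  # 37/50, i.e. £0.74
```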

Key Properties of Expected Value

The expectation of a sum of random variables is the sum of their individual expectations, regardless of whether they are independent: E[aX + bY] = aE[X] + bE[Y]

If X and Y are independent discrete random variables, the expected value of their product is the product of their expected values: E[XY] = E[X]E[Y] (Note: this property generally fails if X and Y are dependent.)

Application: Expected Values of Common Distributions

Using these rules simplifies calculating the expectations for fundamental distributions:

| Distribution | Parameter(s) | Expected Value (E[X]) |
| --- | --- | --- |
| Bernoulli(p) | success probability p (success/failure) | p |
| Binomial(n, p) | n trials, success probability p | np |
| Geometric(p) | trials until 1st success | 1/p |

The expected number of successes in n independent Bernoulli trials (Binomial) is simply the sum of the individual Bernoulli expected values: E[X_1 + \dots + X_n] = p + \dots + p = np.
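To see that the linearity shortcut agrees with the raw definition, one can sum the Binomial pmf directly. The function name and parameter values below are illustrative choices, not from the text:

```python
from math import comb

def binomial_mean_direct(n, p):
    # E[X] = sum_k k * C(n, k) * p^k * (1-p)^(n-k), straight from the definition.
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# The linearity shortcut says this should equal n * p.
n, p = 10, 0.3
print(binomial_mean_direct(n, p), n * p)
```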


4.2 Variance and Standard Deviation: Quantifying Spread

While the expected value tells you the central point of the distribution, it does not tell you how spread out the outcomes are.

Concept and Calculation

💡

Variance & Standard Deviation

The variance (Var[X]) is the average of the squared distances between each outcome X and the mean E[X]: Var[X] = E[(X - E[X])^2]

The standard deviation (SD[X] or σ) is the square root of the variance; it can be read as the typical distance of an outcome from the average.

Key Properties of Variance

  1. Alternate Formula: Variance is often easier to compute using the second moment (E[X^2]): Var[X] = E[X^2] - (E[X])^2
  2. Scaling: When scaling a random variable by a constant a, the variance scales by a^2: Var[aX] = a^2 \cdot Var[X]
  3. Shifting: Adding a constant a (a location shift) does not change the spread: Var[X + a] = Var[X]
  4. Independence (Sum Rule): If X and Y are independent random variables, their variances add: Var[X + Y] = Var[X] + Var[Y]
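The scaling and shifting properties can be checked numerically on any small pmf. The distribution, constants, and helper names below are illustrative, not from the text:

```python
def mean(pmf):
    # E[X] from a dict mapping outcome -> probability.
    return sum(t * p for t, p in pmf.items())

def var(pmf):
    # Var[X] = E[(X - E[X])^2]
    mu = mean(pmf)
    return sum((t - mu) ** 2 * p for t, p in pmf.items())

# An arbitrary small distribution (illustrative values).
pmf = {0: 0.2, 1: 0.5, 3: 0.3}
a, shift = 4, 7

scaled = {a * t: p for t, p in pmf.items()}       # pmf of aX
shifted = {t + shift: p for t, p in pmf.items()}  # pmf of X + a

print(var(scaled), a**2 * var(pmf))  # scaling: Var[aX] = a^2 * Var[X]
print(var(shifted), var(pmf))        # shifting leaves the variance unchanged
```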

Example: Variance of a Die Roll

Q2

Die Roll Spread

Question: What are the variance and standard deviation of a single roll of a fair die (E[X] = 3.5)?

📝 View Detailed Solution

Solution: The variance is the average squared distance from the mean 3.5: Var[X] = \frac{1}{6}\left( (1-3.5)^2 + (2-3.5)^2 + (3-3.5)^2 + (4-3.5)^2 + (5-3.5)^2 + (6-3.5)^2 \right) = \frac{35}{12} \approx 2.917

The standard deviation is SD[X] = \sqrt{35/12} \approx 1.71. This confirms that a typical deviation from the mean 3.5 is about 1.71.
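The same computation in Python, using exact fractions so that 35/12 appears exactly:

```python
from fractions import Fraction

# Fair die: each face has probability 1/6.
pmf = {t: Fraction(1, 6) for t in range(1, 7)}
mu = sum(t * p for t, p in pmf.items())  # 7/2

# Var[X] = E[(X - mu)^2]
variance = sum((t - mu) ** 2 * p for t, p in pmf.items())

print(variance)                # 35/12
print(float(variance) ** 0.5)  # standard deviation, about 1.71
```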

Variance of Common Distributions

| Distribution | Variance (Var[X]) |
| --- | --- |
| Bernoulli(p) | p(1-p) |
| Binomial(n, p) | np(1-p) |
| Geometric(p) | (1-p)/p^2 |

4.3 Standard Units and Chebyshev’s Inequality

When discussing variability, it is useful to express outcomes in standard units—the number of standard deviations (σ) an outcome is from the expected value (μ).

Chebyshev’s Inequality

Chebyshev’s inequality provides a universal upper bound on the probability that a random variable X will deviate significantly from its mean, regardless of the specific shape of the distribution.

💡

Chebyshev's Inequality

For any random variable X with finite variance σ^2, and for any value k > 0: P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}

Explanation: This inequality states that the chance of an outcome being k or more standard deviations away from the average is at most 1/k^2.

Example (4.3.5)

Q3

Chebyshev Bound

Example (4.3.5): Find an upper bound on the likelihood that X will be more than two standard deviations from its expected value.

📝 View Detailed Solution

Solution: Set k = 2: P(|X - \mu| \geq 2\sigma) \leq \frac{1}{2^2} = \frac{1}{4}

Conclusion: There is at most a 25% chance that a random variable will be more than two standard deviations from its expected value.
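Chebyshev’s bound is universal but often loose: for a specific distribution the exact tail probability can sit far below 1/k^2. The Binomial(20, 0.5) example below is an illustrative choice, not from the text:

```python
from math import comb, sqrt

# X ~ Binomial(20, 0.5): mu = 10, sigma = sqrt(5).
n, p = 20, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
k = 2

# Exact probability that X lands k or more standard deviations from mu.
tail = sum(comb(n, x) * p**x * (1 - p)**(n - x)
           for x in range(n + 1) if abs(x - mu) >= k * sigma)

print(tail, 1 / k**2)  # the exact tail is well below the Chebyshev bound of 0.25
```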


4.4 Conditional Expectation and Total Expectation

Conditional expected value (E[X|A]) is the expected value of X given that some event A has definitely occurred. The conditioning event A alters the underlying probabilities, thus potentially changing the average outcome.

Law of Total Expectation

This theorem relates the overall expected value E[X] to conditional expected values over a set of disjoint events {B_i} that cover the entire sample space (S).

💡

Law of Total Expectation

E[X] = \sum_{i} E[X|B_i]P(B_i)

Example (4.4.5)

Q4

Investment Return

Example (4.4.5): A venture capitalist estimates the expected return on an investment (X) conditional on economic outlooks A (stronger), B (same), or C (weaker):

  • E[X|A] = 3 million (P(A) = 0.1)
  • E[X|B] = 1 million (P(B) = 0.4)
  • E[X|C] = -1 million (P(C) = 0.5)

Question: What is the overall expected return (E[X])?

📝 View Detailed Solution

Solution: Using the Law of Total Expectation: E[X] = E[X|A]P(A) + E[X|B]P(B) + E[X|C]P(C) = 3(0.1) + 1(0.4) + (-1)(0.5) = 0.3 + 0.4 - 0.5 = 0.2

The expected return on investment is £0.2 million (£200,000).
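The total-expectation computation of Example 4.4.5 takes only a few lines; the dictionary layout below is an illustrative choice:

```python
# Example 4.4.5: (conditional expected return in millions, scenario probability).
scenarios = {"A": (3.0, 0.1), "B": (1.0, 0.4), "C": (-1.0, 0.5)}

# Law of Total Expectation: E[X] = sum_i E[X | B_i] * P(B_i)
overall = sum(cond_mean * prob for cond_mean, prob in scenarios.values())

print(overall)  # approximately 0.2 (million)
```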


4.5 Covariance and Correlation: Measuring Relationships

When analyzing two random variables, X and Y, the goal is often to determine how they relate to each other: whether knowing one changes your expectation of the other.

Covariance (Cov[X, Y])

The covariance measures the degree to which two random variables vary together: Cov[X,Y] = E[(X - E[X])(Y - E[Y])]

  • Positive covariance (> 0): X tends to be above average when Y is above average. They are positively correlated.
  • Negative covariance (< 0): X tends to be above average when Y is below average. They are negatively correlated.
  • Zero covariance (= 0): X and Y are uncorrelated.

Property: If X and Y are independent, then Cov[X,Y] = 0. (The converse is not necessarily true.)

The calculation is often simplified using the alternate formula: Cov[X,Y] = E[XY] - E[X]E[Y]
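The alternate formula applies to any joint pmf; the joint distribution below is an illustrative example, not from the text:

```python
# Illustrative joint pmf for (X, Y): maps (x, y) -> probability.
joint = {(0, 0): 0.4, (1, 1): 0.3, (1, 0): 0.1, (0, 1): 0.2}

e_x = sum(x * p for (x, _), p in joint.items())      # E[X]
e_y = sum(y * p for (_, y), p in joint.items())      # E[Y]
e_xy = sum(x * y * p for (x, y), p in joint.items()) # E[XY]

# Alternate formula: Cov[X,Y] = E[XY] - E[X]E[Y]
cov = e_xy - e_x * e_y
print(cov)  # positive (about 0.1): X and Y tend to move together here
```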

Correlation (ρ[X, Y])

The correlation coefficient (ρ) is the dimensionless, standardized version of the covariance: it is obtained by dividing the covariance by the product of the standard deviations (σ_X σ_Y).

💡

Correlation Coefficient

\rho[X,Y] = \frac{Cov[X,Y]}{\sigma_X \sigma_Y}

Range and Interpretation: The correlation coefficient is always bounded between -1 and 1.

  • +1: perfect positive linear relationship.
  • -1: perfect negative linear relationship.
  • 0: no linear relationship.
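Putting the pieces together, the correlation coefficient can be computed for a small joint pmf (the same kind of illustrative numbers one might tabulate by hand; the distribution below is hypothetical):

```python
from math import sqrt

# Illustrative joint pmf for (X, Y): maps (x, y) -> probability.
joint = {(0, 0): 0.4, (1, 1): 0.3, (1, 0): 0.1, (0, 1): 0.2}

def marginal_moments(idx):
    # Mean and variance of the idx-th coordinate of the pair (0 for X, 1 for Y).
    mean = sum(pair[idx] * p for pair, p in joint.items())
    var = sum((pair[idx] - mean) ** 2 * p for pair, p in joint.items())
    return mean, var

(mx, vx), (my, vy) = marginal_moments(0), marginal_moments(1)
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - mx * my  # Cov[X,Y] = E[XY] - E[X]E[Y]

# rho = Cov[X,Y] / (sigma_X * sigma_Y); dimensionless and always in [-1, 1].
rho = cov / (sqrt(vx) * sqrt(vy))
print(rho)
```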