
Association Between Two Variables


Chapter 5 of the sources is titled “Association between two variables” and explores how information about one variable can provide insights into another. The chapter is divided into three main sections based on the types of variables being compared: categorical-categorical, numerical-numerical, and categorical-numerical.


1. Association Between Two Categorical Variables

To determine if an association exists between two categorical variables, researchers use a contingency table.

💡

Criteria for Association

  • Not Associated: The row (or column) relative frequencies are the same, or nearly the same, for all rows (or columns).
  • Associated: The row (or column) relative frequencies differ noticeably across rows (or columns).

Example: Smartphone Ownership

Female Ownership: 77.27% | Male Ownership: 75%


Since these values are nearly identical, Gender and Ownership are not associated.
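
The check above can be sketched in a few lines of Python. The source quotes only the two percentages, so the counts below are hypothetical, chosen so that the row relative frequencies reproduce 77.27% and 75%; the function name `row_relative_frequencies` is also just an illustrative choice.

```python
# Hypothetical contingency table: the counts are assumptions picked to
# reproduce the percentages quoted in the example (17/22 and 15/20).
table = {
    "Female": {"Owns": 17, "Does not own": 5},
    "Male": {"Owns": 15, "Does not own": 5},
}


def row_relative_frequencies(table):
    """Divide every count in a row by that row's total."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {col: n / total for col, n in counts.items()}
    return result


rel = row_relative_frequencies(table)
for row, freqs in rel.items():
    # Print each row's relative frequencies as percentages.
    print(row, {col: f"{100 * p:.2f}%" for col, p in freqs.items()})
```

If the printed rows look alike, as they do here, the table gives little evidence of association; rows that differ markedly would suggest the opposite.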


2. Association Between Two Numerical Variables

The sources define several tools to examine the relationship between two quantitative variables.

  1. Scatter Plots: A visual test where pairs of values are displayed as points on a two-dimensional plane.
  2. Describing Patterns: Check for Direction (up/down), Curvature (linear/curve), Variation (clustering), and Outliers.
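  3. Covariance: Quantifies the direction of linear association, but its magnitude depends on the units of both variables, which makes its size difficult to interpret.
  4. Pearson Correlation Coefficient ($r$): A unitless measure between -1 and +1 that captures both the direction and the strength of linear association.
  5. Fitting a Line and $R^2$: $R^2$ measures goodness of fit; a value closer to 1 signifies a better fit.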
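
For reference, the usual textbook definitions of these quantities are given here. This is a sketch that assumes the sample versions (dividing by $n - 1$); some texts divide by $n$ instead, and the sources may use either convention. The symbols $\bar{x}$, $\bar{y}$, $s_x$ and $s_y$ (the means and standard deviations of the two variables) are notation introduced for this note.

$$\operatorname{cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}$$

Dividing by $s_x s_y$ cancels the units of the covariance, which is why $r$ is unitless. For a straight line fitted by least squares to two variables, $R^2 = r^2$.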
💡

Example: Car Age vs Price

A study of car ages and prices showed that as age increases, price decreases.

  • Trend: Negative Linear Association ($r \approx -1$).
Data (Age in years, Price in dollars): (1, 20000), (2, 18000), (3, 16000), (5, 12000), (8, 6000)
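
As a rough check, the five points listed for this example can be run through the standard formulas. The sketch below uses the sample covariance (dividing by n - 1) and an ordinary least-squares line; both are assumptions about the convention, since the sources do not spell it out here, and the variable names are illustrative.

```python
import math

# The five (age, price) pairs from the example.
ages = [1, 2, 3, 5, 8]
prices = [20000, 18000, 16000, 12000, 6000]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n

# Sums of squared deviations and of cross-products.
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in prices)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices))

cov = sxy / (n - 1)               # sample covariance, in year-dollars
r = sxy / math.sqrt(sxx * syy)    # Pearson correlation, unitless
slope = sxy / sxx                 # least-squares slope, dollars per year
intercept = mean_y - slope * mean_x
r_squared = r ** 2                # equals R^2 for a simple linear fit

print(f"cov = {cov:.1f}, r = {r:.4f}, R^2 = {r_squared:.4f}")
print(f"price = {intercept:.0f} + ({slope:.0f}) * age")
```

These points lie exactly on the line price = 22000 - 2000 × age, so r comes out at -1 and R^2 at 1 up to floating-point rounding; real data would normally scatter around the line.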

3. Association Between Categorical and Numerical Variables

When one variable is numerical and the other is categorical with exactly two categories (dichotomous), the Point-Biserial Correlation Coefficient ($r_{pb}$) is used.

💡

Procedure

You group the numerical data based on the two categories (often coded as 0 and 1) and compare their means relative to the overall population standard deviation.
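
One common way to write this procedure is $r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s} \sqrt{p_0 \, p_1}$, where $\bar{Y}_0$ and $\bar{Y}_1$ are the group means, $p_0$ and $p_1$ are the proportions of observations in each group, and $s$ is the population standard deviation of all values. The sketch below assumes that form; the function name and the six data points are hypothetical and serve only as illustration.

```python
import math
import statistics


def point_biserial(groups, values):
    """Point-biserial correlation for 0/1 group codes and numeric values."""
    g0 = [v for g, v in zip(groups, values) if g == 0]
    g1 = [v for g, v in zip(groups, values) if g == 1]
    n = len(values)
    p0 = len(g0) / n
    p1 = len(g1) / n
    s = statistics.pstdev(values)  # population SD of all values (divides by n)
    mean_gap = statistics.mean(g1) - statistics.mean(g0)
    return mean_gap / s * math.sqrt(p0 * p1)


# Hypothetical data: group code (0 or 1) and a numeric score per observation.
groups = [0, 0, 0, 1, 1, 1]
scores = [2, 4, 6, 6, 8, 10]

r_pb = point_biserial(groups, scores)
print(f"r_pb = {r_pb:.4f}")
```

With this form, the result matches the Pearson correlation computed between the 0/1 codes and the numeric values, so $r_{pb}$ can be read on the same -1 to +1 scale.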


Practice Questions

Q1

Categorical Association

In a study of 10,000 students, 80% of males passed an exam and 79.9% of females passed. Is there an association between gender and passing?

Solution

No. Because the row relative frequencies (80% and 79.9%) are effectively the same for both rows, the variables are not associated.

Q2

Covariance Units

If the covariance between weight (kg) and height (m) is calculated, what are its units?

Solution

The units are kg $\times$ m.

Q3

Numerical Trend

A scatter plot shows that as the size of a house increases, the price also increases in a straight line. How would you describe this association?

Solution

The direction is upward (positive) and the curvature is linear.

Q4

Correlation Fit

A dataset has a Pearson correlation coefficient ($r$) of -0.95. What does this tell you about the fit and the relationship?

Solution

It indicates a strong negative linear association. The $R^2$ value would be $(-0.95)^2 = 0.9025$, meaning the line is a good fit, capturing about 90% of the variance.


💡

The Dance Partners Analogy

Think of association as two people dancing together.

  • Categorical association: Checking if people in red shirts always choose partners in blue shirts.
  • Numerical association: Watching how their steps move; if one person takes a step forward and the other consistently takes a step forward too, they have a positive correlation.
  • The Correlation Coefficient ($r$) is the “synchronisation score”: a +1 means they are perfectly in sync, a -1 means they move in exactly opposite directions, and a 0 means one dancer's steps tell you nothing (in a straight-line sense) about the other's.

finding (solutions) x

A public notebook and learning hub.