Visual Anchors: Real World Analysis
In the final chapter of our Visual Anchors series, we tackle the tools used by Data Scientists every day to make decisions.
1. Salary Distributions (Box Plot)
A single summary statistic often misleads. “Average salary” can be skewed by one billionaire. A Box Plot (or Box-and-Whisker plot) reveals the true spread: the Median, the Interquartile Range (the middle 50% of the data), and the outliers.
Scenario: Tech Startup Salaries (in $1000s)
Insight: While the “max” salaries are similar, the median engineer earns markedly more ($90k) than the median of the comparison group. Also, notice the outlier in Engineering (the dot at $250k): that’s the CTO!
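To make the box plot's ingredients concrete, here is a minimal sketch that computes them by hand. The salary list is hypothetical (chosen only so that the median is 90 and one value sits at 250, matching the insight above), and the 1.5 × IQR fence is the common Tukey convention for flagging outliers, assumed here rather than taken from the original chart.

```python
import statistics

# Hypothetical Engineering salaries in $1000s (illustrative, not the chart's data).
engineering = [70, 78, 82, 85, 88, 90, 92, 95, 100, 110, 250]


def box_plot_summary(values):
    """Return the numbers a box plot draws: median, quartiles, IQR, outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey's rule (an assumed convention): points beyond 1.5 * IQR are outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,
    }


summary = box_plot_summary(engineering)
mean_salary = statistics.mean(engineering)

print(summary)
print("mean:", round(mean_salary, 1))
```

With this sample the mean lands well above the median, because the single 250 pulls it upward while the median stays put. That gap is exactly what the box plot makes visible.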
2. Joint Probability (Heatmap)
In Chapter 3, we discuss Joint Distributions, written P(X, Y). A joint distribution is effectively a 2D grid of probabilities, one cell per combination of outcomes. A Heatmap allows us to spot the “hot zones” of that grid instantly.
Scenario: Survey of Transport vs Commute Time.
- rows: Car, Bus, Train
- cols: <15m, 15-30m, >30m
Analysis: The dark blue cell (0.25) shows that the most common outcome is Bus + 15-30m. The empty cell (0.0) shows nobody takes a Train for <15m.
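The grid behind such a heatmap can be sketched as a nested list. Only two cells below come from the analysis above (Bus + 15-30m = 0.25 and Train + <15m = 0.0); every other value is a hypothetical filler chosen so that the table sums to 1.

```python
import math

modes = ["Car", "Bus", "Train"]      # rows
times = ["<15m", "15-30m", ">30m"]   # columns

# Joint probabilities P(mode, time). Hypothetical values, except
# Bus/15-30m = 0.25 and Train/<15m = 0.0, which are quoted in the text.
joint = [
    [0.15, 0.10, 0.05],  # Car
    [0.10, 0.25, 0.10],  # Bus
    [0.00, 0.05, 0.20],  # Train
]

# A valid joint distribution must sum to 1 over all cells.
total = sum(sum(row) for row in joint)
assert math.isclose(total, 1.0)

# Marginals: sum across a row for P(mode), down a column for P(time).
p_mode = {m: sum(row) for m, row in zip(modes, joint)}
p_time = {t: sum(row[j] for row in joint) for j, t in enumerate(times)}

# The "hottest" cell is the most probable (mode, time) combination.
p_max, mode_max, time_max = max(
    (p, m, t) for m, row in zip(modes, joint) for t, p in zip(times, row)
)

print("most common:", mode_max, time_max, p_max)
print("P(mode):", p_mode)
print("P(time):", p_time)
```

Summing a row or a column recovers the marginal distributions, which is the same operation as reading the totals along the edges of the heatmap.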
3. A/B Testing Significance (Z-Test)
When a business runs an A/B test, they calculate a Z-Score to see if the new design is truly better. If the score falls into the “critical region” (the tail), the result is Statistically Significant.
Scenario:
- Null Hypothesis: No difference between the two designs (H₀).
- Observation: The observed difference, expressed in standard errors, is the Z-Score.
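Conclusion: The shading represents the P-value (the probability of seeing a result at least this extreme by luck alone, if the Null Hypothesis were true). Since the area is tiny (smaller than the chosen significance level), we Reject the Null Hypothesis. The new design worked!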
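As a rough illustration of the calculation, the sketch below runs a two-proportion Z-test using only the standard library. The visitor and conversion counts are hypothetical, the 0.05 significance level is an assumed convention, and the test is one-sided (a single shaded tail, as in the description above). A two-sided test would double the P-value.

```python
import math
from statistics import NormalDist

# Hypothetical A/B results (assumed for illustration, not from the article).
control_visitors, control_conversions = 2000, 200
variant_visitors, variant_conversions = 2000, 260


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, one_sided_p) under H0: both designs convert at the same rate."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under H0 the two groups share one conversion rate, so pool them.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # One-sided P-value: area of the standard normal tail beyond z.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z_score, p_value = two_proportion_z_test(
    control_conversions, control_visitors,
    variant_conversions, variant_visitors,
)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(f"z = {z_score:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

With these made-up counts the Z-Score is just under 3, so the shaded tail is far smaller than 0.05 and the Null Hypothesis is rejected.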