Imagine you audit an AI hiring tool and find no evidence of adverse impact. Impact ratios are above 0.80 for all demographic groups. The tool passes the four-fifths rule. Case closed, right?
Not necessarily. There's a statistical phenomenon that can hide discrimination in plain sight: Simpson's Paradox.
## What Is Simpson's Paradox?
Simpson's Paradox occurs when a trend present in several groups of data reverses when the groups are combined. In hiring data, this means aggregate results can show no disparity even when significant disparity exists within individual departments or job categories.
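The four-fifths rule mentioned above reduces to a simple ratio check. Here's a minimal sketch; the function names are illustrative, not from any specific auditing library:

```python
# Minimal sketch of the four-fifths (80%) rule check.
# Names are illustrative, not from any particular library.

def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Selection rate of a group divided by the reference (highest-rate) group's rate."""
    return group_rate / reference_rate

def four_fifths_flag(ratio: float, threshold: float = 0.80) -> bool:
    """True when the ratio falls below the four-fifths threshold (adverse impact)."""
    return ratio < threshold

# Example: a group selected at 10% vs. a reference group at 20%
ratio = impact_ratio(0.10, 0.20)   # 0.50
print(four_fifths_flag(ratio))     # True: flags adverse impact
```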
## A Real-World Example
Let's say a company uses an AI tool across two departments: Engineering and Sales. Here's the hiring data:
### Engineering Department
| Group | Applicants | Hired | Selection Rate |
|---|---|---|---|
| White | 400 | 80 | 20% |
| Black | 100 | 10 | 10% |
Engineering impact ratio for Black applicants: 10% / 20% = 0.50 (FLAG)
### Sales Department
| Group | Applicants | Hired | Selection Rate |
|---|---|---|---|
| White | 100 | 60 | 60% |
| Black | 400 | 200 | 50% |
Sales impact ratio for Black applicants: 50% / 60% = 0.83 (MONITOR)
### Combined (Aggregate)
| Group | Applicants | Hired | Selection Rate |
|---|---|---|---|
| White | 500 | 140 | 28% |
| Black | 500 | 210 | 42% |
Aggregate impact ratio for Black applicants: 42% / 28% = 1.50 (PASS—Black applicants appear to be favored)
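The reversal is easy to verify by computing the ratios directly from the counts in the tables above. A quick sketch (the data structure is illustrative):

```python
# The department tables above, reproduced as plain dicts: (applicants, hired).
# Counts are taken directly from the tables; the structure is illustrative.

data = {
    "Engineering": {"White": (400, 80), "Black": (100, 10)},
    "Sales":       {"White": (100, 60), "Black": (400, 200)},
}

def selection_rate(applicants: int, hired: int) -> float:
    return hired / applicants

# Per-department impact ratios (Black rate / White rate)
for dept, groups in data.items():
    white = selection_rate(*groups["White"])
    black = selection_rate(*groups["Black"])
    print(dept, round(black / white, 2))   # Engineering 0.5, Sales 0.83

# Aggregate: sum counts across departments, then take the ratio
white_apps  = sum(g["White"][0] for g in data.values())   # 500
white_hired = sum(g["White"][1] for g in data.values())   # 140
black_apps  = sum(g["Black"][0] for g in data.values())   # 500
black_hired = sum(g["Black"][1] for g in data.values())   # 210
aggregate = (black_hired / black_apps) / (white_hired / white_apps)
print(round(aggregate, 2))   # 1.5: the disparity reverses in aggregate
```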
## The Paradox Revealed
In the aggregate data, Black applicants actually have a higher selection rate than White applicants. The tool appears to favor Black candidates!
But within each department, Black applicants have a lower selection rate. In Engineering, the impact ratio is 0.50—severe adverse impact.
What happened? Simpson's Paradox. Two confounding factors combined:
- Black applicants disproportionately applied to Sales (400 of 500) while White applicants disproportionately applied to Engineering (400 of 500)
- Sales has a much higher overall selection rate than Engineering
The higher baseline selection rate in Sales "masks" the within-department disparity when data is combined.
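The masking is visible in the arithmetic: each group's aggregate selection rate is a weighted average of its department rates, weighted by where that group's applicants went. Using the numbers from the tables:

```python
# Each group's aggregate selection rate is a weighted average of its
# department rates, weighted by that group's applicant mix.

# White: 400 of 500 applied to Engineering (20% rate), 100 to Sales (60%)
white_aggregate = (400 / 500) * 0.20 + (100 / 500) * 0.60   # ~0.28
# Black: 100 of 500 applied to Engineering (10% rate), 400 to Sales (50%)
black_aggregate = (100 / 500) * 0.10 + (400 / 500) * 0.50   # ~0.42

print(round(white_aggregate, 2), round(black_aggregate, 2))  # 0.28 0.42
```

Because most Black applicants are weighted toward the high-baseline Sales department, their aggregate rate ends up above the White aggregate rate despite being lower within every department.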
## Why This Matters for Bias Audits
An audit that only looks at aggregate data would conclude this tool has no adverse impact. But within Engineering, Black applicants are being selected at half the rate of White applicants. That's a serious problem.
This is why NYC LL144's requirement for intersectional analysis is important. But even intersectional stratification may miss Simpson's Paradox when the slicing is by demographic combinations alone, not by department or job category.
## How Paritas Handles It
Every Paritas audit includes Simpson's Paradox detection:
- We stratify results by job category and department (when data is available)
- We compare aggregate impact ratios to stratified impact ratios
- We flag cases where results reverse upon stratification
- We recommend deeper investigation when paradoxical patterns appear
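The aggregate-vs-stratified comparison in the steps above can be sketched as follows. This is an illustration of the general technique, not Paritas's actual implementation; `detect_reversal` and the data layout are hypothetical:

```python
# Hedged sketch of Simpson's Paradox detection: compare the aggregate impact
# ratio to per-stratum ratios and flag when they disagree.
# Illustrative only, not Paritas's actual implementation.

def impact_ratios(strata: dict) -> dict:
    """Per-stratum impact ratio: protected-group rate / reference-group rate.
    Each stratum maps to ((ref_applicants, ref_hired), (prot_applicants, prot_hired))."""
    out = {}
    for name, ((ref_apps, ref_hired), (prot_apps, prot_hired)) in strata.items():
        out[name] = (prot_hired / prot_apps) / (ref_hired / ref_apps)
    return out

def detect_reversal(strata: dict) -> bool:
    """Flag when the aggregate ratio passes (>= 0.80) but some stratum fails (< 0.80)."""
    ref_apps  = sum(s[0][0] for s in strata.values())
    ref_hired = sum(s[0][1] for s in strata.values())
    prot_apps  = sum(s[1][0] for s in strata.values())
    prot_hired = sum(s[1][1] for s in strata.values())
    aggregate = (prot_hired / prot_apps) / (ref_hired / ref_apps)
    return aggregate >= 0.80 and any(r < 0.80 for r in impact_ratios(strata).values())

# Data from the example: (reference counts, protected counts) per department
strata = {
    "Engineering": ((400, 80), (100, 10)),
    "Sales":       ((100, 60), (400, 200)),
}
print(detect_reversal(strata))   # True: aggregate passes but Engineering fails
```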
The aggregate numbers might look fine. The department-level numbers tell the real story.