Written in Livemark
(2022-06-22 04:52)

Data fallacies

"Statistics never lie, but lovers often do..." (J. Tinga, Antonio vs. Reyes, 484 SCRA 353 (2006))

With all due respect to Justice Tinga, statistics and data do lie, and they do it quite often.

Cherry picking

By cherry-picking data, such as the period from 1998 to 2012, the global warming trend can be made to appear, mistakenly, to stop. That period is actually a random contrary fluctuation.
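The effect described in the caption above can be reproduced with a small sketch. The series here is synthetic, not real temperature data: a steady upward trend plus a repeating wobble that stands in for natural variability. The helper `ols_slope`, the window length and all numbers are assumptions chosen for illustration.

```python
import math


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic series (NOT real measurements): upward trend of 0.02 per year
# plus a 10-year wobble standing in for natural variability.
years = list(range(1990, 2020))
values = [0.02 * t + 0.1 * math.sin(2 * math.pi * t / 10) for t in range(len(years))]

# Slope over the full record.
full_slope = ols_slope(years, values)

# Cherry-pick: scan every 5-year window and keep the one with the lowest slope.
window = 5
window_slopes = {
    years[i]: ols_slope(years[i:i + window], values[i:i + window])
    for i in range(len(years) - window + 1)
}
worst_start = min(window_slopes, key=window_slopes.get)
cherry_slope = window_slopes[worst_start]

print(f"Full record slope: {full_slope:+.4f} per year")
print(f"Cherry-picked {worst_start}-{worst_start + window - 1} slope: {cherry_slope:+.4f} per year")
```

The full record slopes upward, while the hand-picked window slopes downward, even though both come from the same underlying trend.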

What it is

How to avoid it

Data dredging

An example of data dredging: a chart produced by a bot operated by Tyler Vigen, apparently showing a close link between the number of letters in the winning word of a spelling bee competition and the number of people in the US killed by venomous spiders. It is obviously a coincidence: with so many possible comparisons among data series about things happening in the world, it is easy to find unrelated series that show similar trends.
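A rough sketch of why such coincidences are easy to find: generate a few dozen series of pure random noise, compare every pair, and report the strongest correlation. Nothing here reproduces Tyler Vigen's bot; the series names, counts and seed are hypothetical, and `pearson` is a hand-written helper.

```python
import itertools
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# 40 unrelated series of 10 "yearly" values each: pure noise by construction.
series = {
    f"series_{i}": [rng.gauss(0, 1) for _ in range(10)]
    for i in range(40)
}

# Dredge: test every possible pair and keep the strongest correlation.
pairs = list(itertools.combinations(series, 2))
best_pair = max(pairs, key=lambda p: abs(pearson(series[p[0]], series[p[1]])))
best_r = pearson(series[best_pair[0]], series[best_pair[1]])

print(f"Compared {len(pairs)} pairs of random series")
print(f"Strongest 'link': {best_pair[0]} vs {best_pair[1]}, r = {best_r:+.2f}")
```

With hundreds of comparisons and only ten points per series, a strong-looking correlation turns up even though every series is noise.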

What it is

How to avoid it

Survivorship bias

This hypothetical pattern of damage on returning aircraft shows locations where they can sustain damage and still return home. Reinforcing the aircraft in the most commonly hit areas would be a result of survivorship bias, because crucial data from fatally damaged planes is being ignored; those hit in other places presumably did not survive.
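The aircraft example can be simulated in a few lines. This is a sketch under invented assumptions: each plane takes one hit in a uniformly random section, and the survival odds per section in `SURVIVAL_ODDS` are made up for illustration, not historical figures.

```python
import random
from collections import Counter

# Hypothetical odds that a plane returns after a hit in each section.
SURVIVAL_ODDS = {"engine": 0.2, "cockpit": 0.4, "fuselage": 0.9, "wings": 0.9}

rng = random.Random(42)  # fixed seed so the run is repeatable
all_hits = Counter()
survivor_hits = Counter()

for _ in range(10_000):
    # Every section is equally likely to be hit.
    section = rng.choice(list(SURVIVAL_ODDS))
    all_hits[section] += 1
    # Only planes that make it home can be inspected afterwards.
    if rng.random() < SURVIVAL_ODDS[section]:
        survivor_hits[section] += 1


def shares(counter):
    """Fraction of recorded hits that fall in each section."""
    total = sum(counter.values())
    return {section: counter[section] / total for section in SURVIVAL_ODDS}


all_shares = shares(all_hits)
survivor_shares = shares(survivor_hits)

for section in SURVIVAL_ODDS:
    print(f"{section:9s} all planes: {all_shares[section]:.2f}   "
          f"returning planes: {survivor_shares[section]:.2f}")
```

Engine hits are as common as any other hit across all planes, yet they are rare among the planes that return, which is exactly the data an inspector on the ground would see.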

What it is

How to avoid it

Sampling bias

Sampling Bias, via Geckoboard

What it is

How to avoid it

Cobra effect

The story goes something like this: back in colonial India, the top Brit in charge decided there were too many cobras around Delhi. To reduce the population, they put in place a cash reward, or bounty, for anyone who brought in a dead cobra. The intention was clear. Legend has it that people did bring in dead cobras reliably, but only because some enterprising souls had started breeding cobras for the very purpose of collecting the bounty.

What it is

How to avoid it

Gerrymandering / MAUP

Different ways of apportioning electoral districts lead to different election results.
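A toy sketch of the caption above: the same 15 precincts, each won outright by party A or party B, are grouped into three districts in two different ways. The precinct layout, the two plans and the function `seats` are all hypothetical.

```python
from collections import Counter

# 15 precincts: indices 0-5 lean to party A (40%), indices 6-14 to party B (60%).
precincts = ["A"] * 6 + ["B"] * 9


def seats(precincts, plan):
    """Count districts won by each party; a district goes to the party
    holding most of its precincts (districts here have odd size, so no ties)."""
    winners = Counter()
    for district in plan:
        tally = Counter(precincts[i] for i in district)
        winners[tally.most_common(1)[0][0]] += 1
    return dict(winners)


# Plan 1 concentrates B's precincts so that A narrowly wins two districts.
plan_1 = [[0, 1, 2, 6, 7], [3, 4, 5, 8, 9], [10, 11, 12, 13, 14]]

# Plan 2 spreads A's precincts thinly so that B wins every district.
plan_2 = [[0, 1, 6, 7, 8], [2, 3, 9, 10, 11], [4, 5, 12, 13, 14]]

print("Plan 1 seats:", seats(precincts, plan_1))
print("Plan 2 seats:", seats(precincts, plan_2))
```

The underlying votes never change; only the boundaries do. Under the first plan the minority party takes two of three seats, and under the second it takes none.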

What it is

How to avoid it

False causality

False Causality, via Geckoboard

What it is

How to avoid it

Danger of summary metrics

Four different datasets look identical when examined using simple summary statistics, but vary considerably when graphed.
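The caption reads like a description of Anscombe's quartet, so this sketch assumes that is the figure meant and uses the quartet's published values. It computes the mean of x, the mean of y and the Pearson correlation for each dataset; `pearson` is a hand-written helper.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Anscombe's quartet (assumed to be the four datasets in the figure).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Summary statistics per dataset: (mean of x, mean of y, correlation).
summary = {
    name: (mean(xs), round(mean(ys), 2), round(pearson(xs, ys), 2))
    for name, (xs, ys) in quartet.items()
}

for name, (mean_x, mean_y, r) in summary.items():
    print(f"Dataset {name:3s} mean x = {mean_x}, mean y = {mean_y}, r = {r}")
```

Each dataset reports the same x mean, y mean and correlation to two decimal places, so the differences only show up once the points are plotted.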

What it is

How to avoid it

Other data fallacies
