Naked Statistics: Stripping the Dread from the Data
Cover
Title Page
Dedication
Introduction: Why I hated calculus but love statistics
1 What’s the Point?
2 Descriptive Statistics: Who was the best baseball player of all time?
Appendix to Chapter 2
3 Deceptive Description: “He’s got a great personality!” and other true but grossly misleading statements
4 Correlation: How does Netflix know what movies I like?
Appendix to Chapter 4
5 Basic Probability: Don’t buy the extended warranty on your \$99 printer
5½ The Monty Hall Problem
6 Problems with Probability: How overconfident math geeks nearly destroyed the global financial system
7 The Importance of Data: “Garbage in, garbage out”
8 The Central Limit Theorem: The LeBron James of statistics
9 Inference: Why my statistics professor thought I might have cheated
Appendix to Chapter 9
10 Polling: How we know that 64 percent of Americans support the death penalty (with a sampling error ± 3 percent)
Appendix to Chapter 10
11 Regression Analysis: The miracle elixir
Appendix to Chapter 11
12 Common Regression Mistakes: The mandatory warning label
13 Program Evaluation: Will going to Harvard change your life?
Conclusion: Five questions that statistics can help answer
Appendix: Statistical software
Notes
Acknowledgments
Index
Also by Charles Wheelan

##### Document Text Contents
Page 2

naked statistics
Stripping the Dread from the Data

CHARLES WHEELAN

Page 136

malaria drug).

In a courtroom, the threshold for rejecting the presumption of innocence is the
qualitative assessment that the defendant is “guilty beyond a reasonable doubt.”
The judge or jury is left to define what exactly that means. Statistics harnesses
the same basic idea, but “guilty beyond a reasonable doubt” is defined
quantitatively instead: if the null hypothesis is true,
how likely is it that we would observe this pattern of data by chance? To use a
familiar example, medical researchers might ask, If this experimental drug has
no effect on heart disease (our null hypothesis), how likely is it that 91 out of
100 patients getting the drug would show improvement compared with only 49
out of 100 patients getting a placebo? If the data suggest that the null hypothesis
is extremely unlikely—as in this medical example—then we must reject it and
accept the alternative hypothesis (that the drug is effective in treating heart
disease).
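
To make the drug example concrete, here is a minimal sketch of one way to put a number on "how likely." It assumes a pooled two-proportion z-test with a normal approximation, which is a technique chosen for illustration and not one the text itself names. The function name `two_proportion_p_value` is likewise hypothetical.

```python
import math

def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """One-sided p-value for the null hypothesis that both groups
    share the same underlying improvement rate (pooled z-test)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under the null hypothesis, pool both groups to estimate the common rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Upper-tail probability of a standard normal at z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Figures from the text: 91 of 100 on the drug improve, 49 of 100 on placebo.
z, p = two_proportion_p_value(91, 100, 49, 100)
print(f"z = {z:.2f}, one-sided p-value = {p:.2e}")
```

Under these assumptions the gap between the two groups is more than six standard errors wide, so the p-value is vanishingly small and the null hypothesis of "no effect" is rejected.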

In that vein, let us revisit the Atlanta standardized-test cheating scandal alluded to
at several points in the book. The Atlanta test score results were first flagged
because of a high number of “wrong-to-right” erasures. Obviously students
taking standardized exams erase answers all the time. And some groups of
students may be particularly lucky in their changes, without any cheating
necessarily being involved. For that reason, the null hypothesis is that the
standardized test scores for any particular school district are legitimate and that
any irregular patterns of erasures are merely a product of chance. We certainly
do not want to be punishing students or administrators because an unusually high
proportion of students happened to make sensible changes to their answer sheets
in the final minutes of an important state exam.

But “unusually high” does not begin to describe what was happening in
Atlanta. Some classrooms had answer sheets on which the number of wrong-to-right
erasures was twenty to fifty standard deviations above the state norm. (To
put this in perspective, remember that most observations in a distribution
typically fall within two standard deviations of the mean.) So how likely was it
that Atlanta students happened to erase massive numbers of wrong answers and
replace them with correct answers just as a matter of chance? The official who
analyzed the data described the probability of the Atlanta pattern occurring
without cheating as roughly equal to the chance of having 70,000 people show
up for a football game at the Georgia Dome who all happen to be over seven feet
tall.2 Could it happen? Yes. Is it likely? Not so much.
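
For a rough sense of scale, the sketch below computes how much of a distribution lies within two standard deviations and how little lies far beyond. It assumes a normal distribution, which is a simplification introduced here (erasure counts need not be normally distributed), and the helper name `upper_tail` is hypothetical.

```python
import math

def upper_tail(k):
    """P(Z > k) for a standard normal variable Z (normality is an assumption)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

# Share of observations within two standard deviations of the mean.
within_two = 1 - 2 * upper_tail(2)
print(f"Share within 2 SDs of the mean: {within_two:.4f}")

# Chance of landing more than k standard deviations above the mean.
for k in (2, 5, 10, 20):
    print(f"P(more than {k} SDs above the mean) = {upper_tail(k):.3e}")
```

Under the normal assumption, roughly 95 percent of observations fall within two standard deviations, while the chance of a result twenty standard deviations above the mean is so small that it is effectively zero for any practical purpose.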

Georgia officials still could not convict anybody of wrongdoing, just as my
professor could not (and should not) have had me thrown out of school because
my final exam grade in statistics was out of sync with my midterm grade.
Atlanta officials could not prove that cheating was going on. They could,
however, reject the null hypothesis that the results were legitimate. And they
could do so with a “high degree of confidence,” meaning that the observed
pattern was nearly impossible among normal test takers. They therefore
explicitly accepted the alternative hypothesis, which is that something fishy was
going on. (I suspect they used more official-sounding language.) Subsequent
investigation did in fact uncover the “smoking erasers.” There were reports of
teachers allowing low-scoring children to
copy from high-scoring children, and even pointing to answers while standing
over students’ desks. The most egregious cheating involved a group of teachers
who held a weekend pizza party during which they went through exam sheets
and changed students’ answers.

In the Atlanta example, we could reject the null hypothesis of “no cheating”
because the pattern of test results was so wildly improbable in the absence of
foul play. But how implausible does the null hypothesis have to be before we can
reject it and invite some alternative explanation?

One of the most common thresholds that researchers use for rejecting a null
hypothesis is 5 percent, which is often written in decimal form: .05. This
probability is known as a significance level, and it represents the upper bound
for the likelihood of observing some pattern of data if the null hypothesis were
true. Stick with me for a moment, because it’s not really that complicated.

Let’s think about a significance level of .05. We can reject a null hypothesis at
the .05 level if there is less than a 5 percent chance of getting an outcome at least
as extreme as what we’ve observed if the null hypothesis were true. A simple
example can make this much clearer. I hate to do this to you, but assume once
again that you’ve been put on missing-bus duty (in part because of your valiant
efforts in the last chapter). Only now you are working full-time for the
researchers at the Changing Lives study, and they have given you some excellent
data to help inform your work. Each bus operated by the organizers of the study
has roughly 60 passengers, so we can treat the passengers on any bus as a
random sample drawn from the entire Changing Lives population. You are
awakened early one morning by the news that a bus in the Boston area has been
hijacked by a pro-obesity terrorist group.* Your job is to drop from a helicopter
onto the roof of the moving bus, sneak inside through the emergency exit, and
then stealthily determine whether the passengers are Changing Lives
participants, solely on the basis of their weights. (Seriously, this is no more
implausible than most action-adventure plots, and it’s a lot more educational.)
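
The decision rule at the .05 level can be sketched in a few lines. This is an illustration under stated assumptions, not the study's actual procedure: the population mean and standard deviation below are hypothetical placeholders (this passage gives no figures), the two sample means are invented, and the test is a simple two-sided z-test on the mean weight of 60 passengers.

```python
import math

def mean_test(sample_mean, n, pop_mean, pop_sd, alpha=0.05):
    """Two-sided z-test of the null hypothesis that a sample of size n
    was drawn from a population with the given mean and standard deviation."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a result at least this extreme if the null is true.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value, p_value < alpha

# HYPOTHETICAL population figures, used only for illustration.
POP_MEAN = 162.0  # pounds (assumed)
POP_SD = 36.0     # pounds (assumed)

# Two invented bus samples of 60 passengers each.
for observed_mean in (165.0, 175.0):
    z, p_value, reject = mean_test(observed_mean, 60, POP_MEAN, POP_SD)
    verdict = "reject" if reject else "fail to reject"
    print(f"mean = {observed_mean}: z = {z:.2f}, p = {p_value:.4f} -> {verdict} the null")
```

With these made-up numbers, a bus averaging 165 pounds is entirely consistent with the null hypothesis, while a bus averaging 175 pounds falls below the 5 percent threshold and leads us to reject it.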

As the helicopter takes off from the commando base, you are given a machine

Page 272

Printed in the United States of America

First Edition

For information about permission to reproduce selections from this book, write to Permissions, W. W.
Norton & Company, Inc., 500 Fifth Avenue, New York, NY 10110

For information about special discounts for bulk purchases, call 800-233-4830

Manufacturing by Courier Westford
Production manager: Anna Oler

ISBN 978-0-393-07195-5 (hardcover)
eISBN 978-0-393-08982-0

W. W. Norton & Company, Inc.
500 Fifth Avenue, New York, N.Y. 10110

www.wwnorton.com

W. W. Norton & Company Ltd.
Castle House, 75/76 Wells Street, London W1T 3QT

Page 273

Also by Charles Wheelan
10½ Things No Commencement Speaker Has Ever Said
Naked Economics: Undressing the Dismal Science