Statistics – Understanding vs. Deceiving

Someone posted a Facebook link to an illustrated post from Ben Orlin’s blog, Math with Bad Drawings, entitled “Why Not to Trust Statistics”. If you click the link, you will see that the post is a series of stick-figure drawings quoting statistics, each paired with a mathematical representation of those statistics, showing how deceiving statistics can be if you don’t have all the data yourself. The point: if you are given statistical comparisons without the data, the words you hear or the graphs and pictures you see may be incredibly deceiving and inaccurate.

Here is one illustration from Orlin’s post to clarify this (see the rest of his post for the remaining ones):



The warning here: don’t trust statistics without understanding where the data comes from and what the data actually represents. I don’t think this is generally done; just listen to the news or read the papers and magazines. Heck, look at our politicians and the ‘false’ or deceptive statistics (among other things) constantly quoted or shown in their speeches. Many statistics, whether verbal or visual, come with none of the background data or context, so imagine how much deceit, intentional and unintentional, is occurring. (It is always fun to check out the ‘fact checks’ of political speeches and see the spin put on many statistics.)

A major problem here is that many people do not understand the statistics themselves: What’s a mean? What’s a range? What’s a mode? What’s a median? Lack of understanding and lack of data, combined with a deliberate verbal or visual spin on the statistics, lead to confusion, misrepresentation, bad decisions, believing something to be true when it’s not, and so much more. It’s scary. And only with education can this “lack of understanding”, or maybe it’s better to say “willingness to believe what we see and hear”, be combated.

Ben Orlin’s illustrations made me think about how we teach statistics: usually with a group of data points or just a list of numbers with little context, for which students then calculate the statistical measures and graph the results. But do we spend enough time comparing these different measures (I am just thinking about measures of central tendency here) or really working with outliers and how they impact the measures (see the example above for the ‘mean’ salary)? Do we give these numbers enough context that the meaning of the measures truly makes sense? Do we provide real data and real opportunities to look at visual and verbal representations of statistics and make sense of them, in order to help students make informed decisions? In my personal experience, no, though with new standards such as the Common Core, I think this might be changing, as there is definitely more emphasis on statistics, real-world context, and interpreting and making sense of data. That’s encouraging.

Thinking about students and teaching, here are some visuals (made with the Casio Prizm) using the data from Ben Orlin’s example: 8 salaries from a ‘company’, 7 at $30,000 and 1 (the CEO, of course!) at $430,000. These can really get the conversation going: What is a measure of central tendency? How can the same data yield different numbers or be perceived differently? How do outliers impact data and data reporting? How do visuals distort or reveal?

Here are the 1-variable statistics. As you can see, the mean (the figure used in the illustration above to give the “average” company salary of $80,000) is a distorted statistic, because the CEO’s salary (the max value) is so much larger than all the other salaries. A better measure would have been the median or mode, both $30,000, as those are more representative of this data set, where all but one person makes that salary.
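For anyone without a graphing calculator handy, the same 1-variable statistics can be sketched in a few lines of Python. This is only an illustration using the standard `statistics` module; the variable names (`salaries`, `summary`, `mean_without_ceo`) are my own and not part of Orlin’s post or the calculator output.

```python
from statistics import mean, median, mode

# Data from the example: 7 employees at $30,000 and 1 CEO at $430,000.
salaries = [30_000] * 7 + [430_000]

summary = {
    "mean": mean(salaries),
    "median": median(salaries),
    "mode": mode(salaries),
    "min": min(salaries),
    "max": max(salaries),
}
for name, value in summary.items():
    print(f"{name:>6}: ${value:,.0f}")

# Dropping the single outlier shows how much it pulls the mean.
mean_without_ceo = mean(salaries[:-1])
print(f"mean without the CEO: ${mean_without_ceo:,.0f}")
```

The mean works out to (7 × 30,000 + 430,000) / 8 = 80,000, while the median and mode both stay at 30,000, which is the comparison worth putting in front of students.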







Visually, if we look at two different versions of a box plot, one treating $430,000 as an outlier and the other treating it as part of the data, we get some funky-looking box plots (which would be a conversation all to itself: where’s the box? where are the whiskers?). But in outlier mode, you can see that the red outlier is significantly different from all the other data (which is all the same).

[Screenshots: two box plots of the salary data, one with the outlier shown separately and one with it included]
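The missing box has a simple arithmetic explanation, sketched below in Python. Two assumptions are mine: quartiles are taken as the medians of the lower and upper halves of the sorted data, and outliers are flagged with the common 1.5 × IQR rule. The calculator’s exact method may differ, and `five_number_summary` is a helper written just for this illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of each half."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]          # values below the middle
    upper_half = data[(n + 1) // 2 :]    # values above the middle
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]

salaries = [30_000] * 7 + [430_000]
low, q1, med, q3, high = five_number_summary(salaries)
iqr = q3 - q1

# Assumed convention: anything more than 1.5 * IQR beyond the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]
inliers = [s for s in salaries if lower_fence <= s <= upper_fence]

print(f"Q1 = {q1:,.0f}, median = {med:,.0f}, Q3 = {q3:,.0f}, IQR = {iqr:,.0f}")
print(f"whiskers with the outlier included: {low:,} to {high:,}")
print(f"whiskers with the outlier excluded: {min(inliers):,} to {max(inliers):,}")
print(f"outliers: {outliers}")
```

Because Q1, the median, and Q3 are all $30,000, the interquartile range is zero and the box collapses to a single line. With the outlier included, the only visible feature is one long whisker out to $430,000; with it excluded, everything sits on one point and the CEO’s salary stands alone.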

If we look at the data with a pie chart, bar graph, or histogram, we also see visually how extreme the $430,000 data point is. All of this leads to the realization that an average is sometimes NOT the best statistical measure once you know what the data is and how it is spread.

[Screenshots: pie chart, bar graph, and histogram of the salary data]
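The frequency picture behind those charts can also be roughed out as text. This is a minimal Python sketch using `collections.Counter`; the text-bar layout is my own stand-in for a bar graph, not a reproduction of the calculator screens.

```python
from collections import Counter

salaries = [30_000] * 7 + [430_000]
counts = Counter(salaries)  # how many employees earn each salary

for salary, count in sorted(counts.items()):
    share = count / len(salaries)
    # One '#' per employee at that salary.
    print(f"${salary:>7,} | {'#' * count} {count} of {len(salaries)} ({share:.1%})")
```

Seven of the eight bars’ worth of people sit at $30,000, and not a single employee actually earns the $80,000 “average”.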

Statistics is important and prevalent in so many areas of our society, so let’s make sure we are helping students not only know how to find these statistical measures but, more importantly, question what they see and hear. They need to make sense of the data and understand the potential discrepancies, distortions, and misuse of statistics, so that they make informed decisions based on real data and are not swayed by a pretty picture or a scary number that is meant to deceive.