Wednesday, April 29, 2015

Hide and Seek

I want to circle back to an article I wrote a few years ago about my favourite data visualization.

Hierarchical Cluster Analysis by Alex J. Bowers, from http://www.pareonline.net/pdf/v15n7.pdf
It shows all of the grades earned by students during their K-12 journeys in two school districts. I love this chart because it finds a way to show all of the data in a dense but succinct format.

In The Visual Display of Quantitative Information, Edward Tufte states, "Above all else show the data." Tufte applied the quote to a different aspect of visualizing data, but whenever I look at the chart above, it rises to the surface of my thinking. Showing the data is no small task, and as educators, we spend a lot of time and energy not doing it. We summarize the data into neat little one-letter grades or single-number test scores. As teachers, we might see a set of scores, but we are usually the only ones who do, and we typically view them as numbers, not visual displays. Things hide in numbers and number sets.

But a recent paper published by the Public Library of Science (PLoS) makes the case that things can hide in simple visuals, too.

CC-BY Weissgerber, Milic, Winham, Garovic

The authors of the article Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm assert that the ever-popular bar chart is a summary, and therefore "full data may suggest different conclusions from the summary statistics."  (It reminds me of Anscombe's quartet.) We often claim that pie charts are used to hide data. Et tu, bar charts?
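Anscombe's quartet makes the point concrete. Below is a minimal Python sketch, assuming the standard published values of the quartet (typed in here, not taken from the PLoS article). The helper name `summarize` is hypothetical. It computes the kind of summary a bar chart typically shows. All four data sets report the same means and variances to one decimal place, yet plotting the raw points reveals four very different pictures: a noisy line, a curve, a line with one outlier, and a vertical cluster with a single influential point.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Data sets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

QUARTET = {
    "I":   (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Hypothetical helper: the summary statistics a bar chart would reduce
    each data set to (means and sample variances, rounded to one decimal)."""
    return {
        "mean_x": round(mean(xs), 1),
        "mean_y": round(mean(ys), 1),
        "var_x": round(variance(xs), 1),
        "var_y": round(variance(ys), 1),
    }


if __name__ == "__main__":
    # Every data set prints the same summary; only a plot of the
    # individual points shows how different they really are.
    for name, (xs, ys) in QUARTET.items():
        print(name, summarize(xs, ys))
```

The summaries match; the data do not. That gap is exactly what the PLoS authors worry a bar chart can hide.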

I won't claim that the scatterplots and bump charts in the article are ground-breaking, but this paragraph in particular caught my interest (emphasis mine):

The infrequent use of univariate scatterplots, boxplots, and histograms is a missed opportunity. The ability to independently evaluate the work of other scientists is a pillar of the scientific method. These figures facilitate this process by immediately conveying key information needed to understand the authors’ statistical analyses and interpretation of the data. This promotes critical thinking and discussion, enhances the readers’ understanding of the data, and makes the reader an active partner in the scientific process. In contrast, bar and line graphs are “visual tables” that transform the reader from an active participant into a passive consumer of statistical information. Without the opportunity for independent appraisal, the reader must rely on the authors’ statistical analyses and interpretation of the data.

As educators, we might not view our work as a scientific process, but we must still engage with our data. I feel pulled between the notion above, that we may be oversimplifying our data presentations, and some of the research about how audiences like their data presented, which typically favors the most familiar chart types. This is not the Great Divide, mind you. We can bring the two together through some education in data literacy.

Or perhaps we underestimate our audience. I've introduced cluster maps, bump charts, and box-and-whisker diagrams to various groups this year. The first two required very little explanation. Box-and-whiskers did require a bit more orientation, but I never felt that the group using them struggled with the interpretation. I do think the concept of engagement between the visualization and the reader, as posed by the article, is important. It's a different way to view interaction, a key piece of a good-quality visual. A visual doesn't need to be physically interactive (people don't have to be able to click, sort, or filter every chart), but it should at least prompt some thinking about what is presented.

After reading the PLoS article, I'm more convinced than ever that we need to think about when and why we share all the data. Bar and line charts may well be the fast-food version of data viz, but we can begin to add to our visual diet by finding ways to show all of the ingredients.

Bonus Round
If you view the article on PLoS, you will have access to two Excel workbooks to help you make the charts presented in the article.

I'll share some of my own attempts to "show the data" in coming posts. Visit bump charts and cluster charts to learn more.