Monday, November 9, 2015

The Agony and the Ecstasy

Am I the only one who agonizes over the best way to represent a data set? Is there a 12-step program for those of us who are occasionally paralyzed by all of the visualization options? If this sounds familiar to you, read on for my latest struggle in bringing a data story to life.

Several weeks ago, I was asked by my superintendent for information on the achievement gap. For those of you who might not know this particular piece of eduspeak, it refers to the difference in achievement levels between populations of students. For example, white students often perform better on standardized tests than black students. This difference is referred to as the achievement gap.

This request should have been a piece of cake. I have the data (scores and student demographics). I'd done a similar project last year. But I felt like the bar charts I'd used were lacking. They show differences in performance plainly enough, and yet it's difficult to capture information for various populations in a single, easy-to-read report.

I started as I usually do, hand drawing some possible layouts and then building a few models using the data.

These examples all show longitudinal information for males and females at a particular grade level and in a certain subject area. The particulars are not too important here. What I discovered in doing these, even before cleaning up the charts, is that none of them were satisfactory. They all showed the data accurately, but none of them captured "the gap" in a way that caused any sort of interest or reaction.

Back to the drawing board.

I realized that I needed Excel to show me the percentages on one axis, like a number line, so the space...the gap...became visible. Here is what I ended up with:

This is a bit out of context, so let me tell you a bit more about what you're looking at. This chart only shows 2015 data for one grade level. The horizontal line is the number'ish line: 0 - 100%. The vertical line shows the overall percentage of students who met the standard on a particular assessment. The placement of populations (shown by orange or blue triangles) shows how each group relates to that overall level of achievement, as well as the gap between the populations. I do include the actual n sizes and percentages in a table below the charts.

I am not going to show you the data tables because of FERPA: some of the subgroups shown above have fewer than 10 students. I need to stay within the bounds of federal privacy laws in this public space, but just know that the tables exist to provide detail to data users in our district.

I'm really happy with this layout, however. It gives, at a glance, an easily readable view of the achievement gap at a grade level. When looking at these over several grades, patterns begin to emerge. This is especially important for those groups where the n size is very small for a grade level. For example, having only one black student in a grade might not tell you much if they didn't meet the standard, but when you see that our small handful of black students at every grade level all fall well below their peers, it's alarming. It's also easy to cover up either the orange or the blue markers and get a quick picture of who is or is not successful.

While I still have the longitudinal view to consider, it's simple enough to build similar charts for a few years of data and then align them to provide a similar glance at trends.

I apologized to my superintendent for my tardiness in delivering the product, but I think the agonizing has given way to some ecstasy over seeing things in a way that's clear and best represents the question to be answered.

I don't know that anyone, other than those of us struggling to represent data, understands why it takes so much time to build a report. Others don't see how many different charts we modeled...all of the colors we tried...or the variety of label placements (and label content) we viewed. They don't hear the conversations we have with people around the office to learn more about what is or isn't working for them in our draft visuals or how they want to interact with and use the information presented. But for those of you who are knee-deep in this process, I'm cheering you on from here.

Bonus Round
Like these charts? They're just scatter plots in Excel, with the vertical axis removed. Easy-peasy to make if you're on the hunt for something similar using your own data.
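
Not an Excel person? Here is a rough sketch of the same idea in Python with matplotlib. To be clear, this is not how I built my charts (those are Excel scatter plots). It's one possible way to get a similar look: every point shares the same y value, the vertical axis is hidden, and a vertical line marks the overall percentage. The function names (gap_points, plot_gap), the group labels, and the numbers in the usage note are all made up for illustration, and which groups go in orange versus blue is up to you.

```python
def gap_points(groups, y=0.0):
    """Turn {label: percent} into (x, y) pairs that all sit on one horizontal line."""
    return [(pct, y) for pct in groups.values()]


def plot_gap(series_a, series_b, overall, path="gap.png"):
    """Draw a number-line style scatter plot and save it to `path`."""
    # Imported here so gap_points() can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 1.8))
    for groups, color in ((series_a, "tab:orange"), (series_b, "tab:blue")):
        if not groups:
            continue
        xs, ys = zip(*gap_points(groups))
        ax.scatter(xs, ys, marker="^", s=120, color=color)
        for label, pct in groups.items():
            # Put each label just above its triangle.
            ax.annotate(label, (pct, 0), xytext=(0, 12),
                        textcoords="offset points", ha="center")

    ax.axvline(overall, color="gray")   # overall percent meeting standard
    ax.set_xlim(0, 100)                 # the 0 - 100% number line
    ax.set_ylim(-1, 1)
    ax.yaxis.set_visible(False)         # "vertical axis removed"
    for side in ("left", "right", "top"):
        ax.spines[side].set_visible(False)
    ax.set_xlabel("Percent meeting standard")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return path
```

With placeholder numbers, a call might look like plot_gap({"Group A": 40, "Group B": 65}, {"Group C": 55, "Group D": 72}, overall=60). Swap in your own percentages and labels, and remember the same small-n privacy rules apply to anything you publish.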