Tuesday, December 29, 2015

Show the Data: Distributions

Last spring, we looked at a couple of ways to show the data: Cluster diagrams and bump(s) charts. The idea here is that when we summarize data and represent it in bar or line charts, we miss nuance. Instead, when possible, we should look for ways to show all of the data.

I had this in mind when I recently tried to tell a story about student performance in my school district. The district does a pretty good job with about 80% of its students, and because that's far better than the state average, no one asks the hard questions about the remaining 20%. Shouldn't they learn to read and graduate from high school, too? I certainly think so...but I often run into roadblocks when I try to raise this conversation. Maybe I need a different visual to use as persuasion.

I remembered some information for another post about recent research in using distributions to replace bar charts and thought I might give it a try.

Here is Exhibit One:

This is typically what we provide to schools (and the public) regarding student performance. This chart represents one grade level at one of our elementary schools. Levels 1 and 2 (L1, L2) represent the percent of students who did not meet the standards ("pass"), while Levels 3 and 4 show the percent of students who met (L3) or exceeded (L4) the standards for English Language Arts. Generally speaking, this doesn't look so bad. Lots of kiddos did well enough, and L1 is the smallest category. Yea, Team!

But who are these kids in each category? Do we have the same distribution in performance if we look at student demographics? Let's find out.

Here is Exhibit Two:

A little orientation to this beast. We still have levels of performance (1, 2, 3, 4) on the x-axis, but the y-axis shows the actual scale scores. In other words, for this grade level and subject, Level One is actually made up of scores in the range of 2114 - 2366, Level Two ranges between 2367 and 2431, Level Three is represented by 2432 - 2489, and Level Four includes performance between 2490 and 2623 (source). Every child's score (n = 69) is represented by a circle on the chart.
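For anyone who wants to sort scale scores into levels outside of a spreadsheet, here is a minimal Python sketch of the ranges described above. It assumes Level Three begins at 2432 so that the ranges do not overlap, and the name `performance_level` is made up for illustration; it is not part of any official scoring tool.

```python
from bisect import bisect_right

# Lowest scale score for Levels 2, 3, and 4, taken from the ranges in this post.
# Assumption: Level 3 starts at 2432 so that it does not overlap Level 2.
LEVEL_FLOORS = [2367, 2432, 2490]
MIN_SCORE, MAX_SCORE = 2114, 2623


def performance_level(scale_score):
    """Return the performance level (1 to 4) for one scale score."""
    if not MIN_SCORE <= scale_score <= MAX_SCORE:
        raise ValueError(f"score {scale_score} is outside {MIN_SCORE}-{MAX_SCORE}")
    # Count how many level floors the score has reached, then add 1
    # because a score below every floor is still Level 1.
    return bisect_right(LEVEL_FLOORS, scale_score) + 1
```

A score of 2366 lands in Level 1 and a score of 2367 lands in Level 2, which matches the boundaries listed above.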

It might be interesting in itself to just look at the distributions. But I've added some information based on student ethnicity. The grey circles that you see represent students who are white (n = 54). The pink circles represent students of color (n = 15). Overall, only one-third of scores from students of color are in Levels 3 or 4, while about two-thirds of scores from white students are in those levels. And, one-third of all students of color are in the lowest category (Level One).

If you're wondering why I am not representing different groups (American Indian, Asian, Black, Hispanic, Pacific Islander, Two or More Races, White) with different colors...well, I can tell you that I wrestled with that decision quite a bit. Our district has very small numbers of students of different races. For example, for the school and grade represented above, there are no black students. There is one American Indian student shown on the chart (I can't tell you where, due to FERPA restrictions). This student as an individual is important and worthy of all of our best efforts. But when a student is represented by a single score, conversations become problematic because there is no way to compare that score with others in the same group. Disaggregation of the data at the grade and school levels does not prompt the sorts of inquiry that it should because "it's just one score." Trust me---I've heard that refrain quite a bit. But when I combine the "just one score" with those of other non-white students in a building, there's a bigger argument to be made. Your mileage may vary, based on the populations you are working with. All that being said, I am very open to feedback on this. What are some other options I should consider that will balance tiny n-size against the overall story to be told? Stacked bars, perhaps?

I realize that the two charts I've shown in the post represent different things. One is just the overall percentage by category...while the other is distribution by category. So, one isn't necessarily a replacement for the other. Even if I altered things a bit by showing numbers of students in the first one, it would result in the same chart. But I think there is some real power in looking at the second chart---even if it were not color-coded---and understanding that every child is there. It's not a blob of summary performance...and goes beyond a simple count of who is in each box.

So, here's looking at you, kid. (Especially if you aren't white.)

Bonus Round
The distribution of performance chart shown above was built in Excel (of course). It is a basic scatter plot chart, with specific scores selected and colored either grey or pink. If you visit the research site I mentioned earlier (Beyond Bar and Line Charts), they have some workbooks you can download and easily modify.
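If Excel isn't your thing, the same idea can be roughed out in Python. The sketch below only prepares the points for a scatter plot: one circle per student, placed over its performance level and colored by group. The scores are made up for illustration (they are not district data), the cut scores assume Level Three starts at 2432, and the small horizontal spread is my assumption for keeping circles from stacking on top of each other. The name `scatter_points` is hypothetical.

```python
import random
from bisect import bisect_right

# Assumed non-overlapping level floors for Levels 2, 3, and 4.
LEVEL_FLOORS = [2367, 2432, 2490]
# Same color scheme as the chart described in the post.
COLORS = {"white": "grey", "student of color": "pink"}


def scatter_points(students, spread=0.3, seed=0):
    """Turn (score, group) pairs into (x, y, color) points for a scatter plot."""
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    points = []
    for score, group in students:
        level = bisect_right(LEVEL_FLOORS, score) + 1
        # Nudge each circle left or right of its level so circles do not overlap.
        x = level + rng.uniform(-spread, spread)
        points.append((x, score, COLORS[group]))
    return points


# Made-up scores for illustration only.
example_students = [
    (2300, "student of color"),
    (2400, "white"),
    (2450, "white"),
    (2500, "student of color"),
]
points = scatter_points(example_students)
```

Each tuple in `points` holds an x position, the scale score for the y-axis, and a color, which is all a scatter plot tool needs to draw one circle per child.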

Saturday, December 12, 2015

WERA 2015: Data Viz Workshop

I've done several presentations over the years about data visualization within public education. I've talked about graphic representations as a form of feedback, types of tools, guidelines for improving communication using visuals, and more. All have been brief 60 - 75 minute affairs with some very simple sorts of activities and conversations along the way.

This week, however, I had an opportunity to guide my first workshop. I had three and a half hours available to support educators in really digging into telling the best stories we can with our data. Having this sort of space and time available enabled me to think about the content differently. I've posted links to previous presentation materials here, and this post is no exception. For those of you who might be interested in scoping out the slides, materials, or links, head on over to ye olde dataviz wiki to take a look through the stash.

What I wanted most for the audience this time through was an opportunity for self-reflection and metacognition. Educators have relentless jobs. There is often no chance to think about what did or didn't work with a group of students today because they will be here again in the morning...and the tyranny of the urgent is to plan for that. I felt like it was important for our afternoon to be a time where people could become more aware of their own design process---no matter how simple or sophisticated it might be---and, more importantly, be inspired. I tried to bring in many different examples of current projects from a variety of fields. The best way to get out of your bar or line chart rut is to see where others have made departures.

I don't consider myself an expert in any of this. I do consider myself curious about it. I do see a significant need in our field to elevate our visual communications, as well as prepare our students to do the same. I want to continue these conversations and add what I can. I will refine my workshop materials and perhaps have another opportunity to engage in this work at another time. I enjoyed it and hope that it's the start of something bigger and better for our field.