## Tuesday, December 29, 2015

### Show the Data: Distributions

Last spring, we looked at a couple of ways to show the data: Cluster diagrams and bump(s) charts. The idea here is that when we summarize data and represent it in bar or line charts, we miss nuance. Instead, when possible, we should look for ways to show all of the data.

I had this in mind when I recently tried to tell a story about student performance in my school district. The district does a pretty good job with about 80% of its students, and because that's far better than the state average, no one asks the hard questions about the remaining 20%. Shouldn't they learn to read and graduate from high school, too? I certainly think so...but I often run into roadblocks when I try and raise this conversation. Maybe I need a different visual to use as persuasion.

I remembered some information for another post about recent research in using distributions to replace bar charts and thought I might give it a try.

Here is Exhibit One:

This is typically what we provide to schools (and the public) regarding student performance. This chart represents one grade level at one of our elementary schools. Levels 1 and 2 (L1, L2) represent the percent of students who did not meet the standards ("pass"), while Levels 3 and 4 show the percent of stuents who met (L3) or exceeded (L4) the standards for English Language Arts. Generally speaking, this doesn't look so bad. Lots of kiddos did well enough, and L1 is the smallest category. Yea, Team!

But who are these kids in each category? Do we have the same distribution in performance if we look at student demographics? Let's find out.

Here is Exhibit Two:

A little orientation to this beast. We still have levels of performance (1, 2, 3, 4) on the x-axis, but the y-axis shows the actual scale scores. In other words, for this grade level and subject, Level One is actually made up of scores in the range of 2114 - 2366, Level Two ranges between 2367 and 2431, Level Three is represented by 2431 - 2489, and Level Four includes performance between 2490 and 2623 (source). Every child's score (n = 69) is represented by a circle on the chart.

It might be interesting in itself to just look at the distributions. But I've added some information based on student ethnicity. The grey circles that you see represent students who are white (n = 54). The pink circles represent students of color (n = 15). Overall, only one-third of scores from students of color are in Levels 3 or 4, while about two-thirds of the white student performance are in those levels. And, one-third of all students of color are in the lowest category (Level One).

If you're wondering about why I am not representing different groups (American Indian, Asian, Black, Hispanic, Pacific Islander, Two or More Races, White) with different colors...well, I can tell you that I wrestled with that decision quite a bit. Our district has very small numbers of students of different races. For example, for the school and grade represented above, there are no black students. There is one American Indian student shown on the chart (I can't tell you where, due to FERPA restrictions). This student as an individual is important and worthy of all of our best efforts. When represented by a score, conversations become problematic because there is no way to compare it with others in the same group.  Disaggregation of the data at the grade and school levels does not cause the sorts of inquiry that it should because "it's just one score." Trust me---I've heard that refrain quite a bit. But when I add the "just one score" with others in a building who represent non-white students, there's a bigger argument to be made. Your mileage may vary, based on the populations you are working with. All that being said, I am very open to feedback on this. What are some other options I should consider that will balance tiny n-size against the overall story to be told? Stacked bars, perhaps?

I realize that that two charts I've shown in the post represent different things. One is just the overall percentage by category...while the other is distribution by category. So, one isn't necessarily a replacement for the other. Even if I altered things a bit by showing numbers of students in the first one, it would result in the same chart. But I think there is some real power in looking at the second chart---even if it was not coded---and understanding that every child is there. It's not a blob of summary performance...and goes beyond a simple count of who is in each box.

So, here's looking at you, kid. (Especially if you aren't white.)

Bonus Round
The distribution of performance chart shown above was built in Excel (of course). It is a basic scatter plot chart, with specific scores selected and colored either grey or pink. If you visit the research site I mentioned earlier (Beyond Bar and Line Charts), they have some workbooks you can download and easily modify.