Thursday, May 7, 2015

Show the Data: Cluster Charts

In the last post, we explored the idea of adding bump(s) charts to our rotation of how we communicate our data. It's one way to show all of the data in a particular set. Another one I've been using quite a bit is a cluster chart. Full disclosure here, these are my own take on displaying data---a bastardized heat map, and certainly not based on heavy-duty math like real hierarchical cluster charts. So, really, I'm not sure what to call these...but in my current role, we're finding them to be very useful and I'm just rolling with cluster charts as my category.

This spreadsheet will eat your soul.
I get a lot of spreadsheets sent to me that look like the one on the right. I hate these with a fiery passion for a variety of reasons:
  • Too much "ink" in the data-to-ink ratio. With all of those little boxes, I don't know where to look.
  • And the colors. I feel like a circus came to town. But beyond that, the red and green are not particularly friendly to those with color vision issues...and I do work with some who are color blind. Are we really asking them to try to make decisions on student learning based on this?
  • Not to mention that all of the data is colored in. What's the point?
  • And we have both numbers and colors. I'm not saying that you can't have both...or that they don't serve different purposes...but it's distracting. I'm constantly trying to make sense of the number patterns for each color.
I also think these data aren't useful because of the way they are organized. Alphabetical order is great for gradebooks, but not so much for trying to make sense of the data. Plus, we don't have any context---what if we're missing some signal in the noise? Suppose all of our low-performing students are boys...or in a minority group?

But let's say you are interested in showing both the progress students have made over time, as well as the characteristics of the students involved. We can reorganize the data by ranking the percentages on the second assessment (this is the "cluster" part). Then we can color code some additional information, such as gender or participation in a particular program. I also change the properties of the conditional formatting so that the fill and text are the same color, making the values seem invisible. Finally, I add thick white borders around all of the cells and resize the rows and columns. Here is a small part of the final product:

These are all the students who scored in the top level of our fictional grade 5 winter math assessment. Three of them improved a little, from light blue to a darker blue...others improved from lower on the scale (orange). But when we look at gender and program, another story emerges. Most of the students in the top category are female, not on free or reduced lunch, not in special education, and do not receive additional interventions through a Title I program.

See the difference when we look at students who have scored at the bottom for both fall and winter? Our population is mostly male and nearly everyone participates in one or more federal programs.
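If you'd rather script the reordering than sort by hand, here is a minimal sketch of the "cluster" step in Python. Everything in it is made up for illustration: the student records, the field names, and the cutoffs for the four score levels are my assumptions, not the real assessment rules. It ranks students by the second (winter) score and keeps the gender and program flags attached to each row.

```python
# Hypothetical records; names, scores, and flags are invented for illustration.
students = [
    {"name": "Student 1", "fall": 62, "winter": 71, "gender": "F", "frl": False},
    {"name": "Student 2", "fall": 35, "winter": 38, "gender": "M", "frl": True},
    {"name": "Student 3", "fall": 78, "winter": 90, "gender": "F", "frl": False},
    {"name": "Student 4", "fall": 55, "winter": 58, "gender": "M", "frl": True},
]


def cluster_order(rows, key="winter"):
    """Return rows sorted from highest to lowest on the chosen assessment."""
    return sorted(rows, key=lambda r: r[key], reverse=True)


def band(score, cutoffs=(40, 60, 80)):
    """Map a percentage to a level from 1 (lowest) to 4 (highest).

    The cutoffs are assumptions; swap in whatever your assessment uses.
    """
    return 1 + sum(score >= c for c in cutoffs)


ordered = cluster_order(students)
for r in ordered:
    # One row per student: fall level, winter level, then the demographic flags.
    lunch = "FRL" if r["frl"] else "-"
    print(r["name"], band(r["fall"]), band(r["winter"]), r["gender"], lunch)
```

The levels stand in for the fill colors in the chart, so each printed row is one strip of colored cells. Sorting on a different key, such as the fall score, gives a different clustering from the same data.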

Maybe this representation doesn't necessarily hold any surprises, especially as we factor in free/reduced lunch. Children living in poverty typically do not perform as well as their peers. But one of the things I take away from this way to visualize that story is that we may need different interventions to support these students. Consider Student 39 on the right. He is receiving free or reduced lunch, special education and Title I services...and he's still ranked fourth from the bottom out of nearly 70 students. It doesn't mean that the school (and the student) aren't working as hard as they can. I do think it might mean that there are additional factors at work here that aren't (and can't be) addressed through the school. Perhaps the family is homeless or transient. Maybe the parents are going through a divorce...or the student has some medical issues. These are community-based issues and require different interventions to help close the gap for the student. I won't get up on my left coast soapbox about this right now. I'll just say that we have to work together on behalf of the whole child.

One of the pieces of information that is not represented in the visuals above is the number one item on teacher wishlists when it comes to reporting scores: progress. Sure, we have a bunch of students performing at the lowest level in the picture above, but that doesn't mean that they didn't make some growth.

This time around, I left the gender and program pieces coded the same, but I calculated the percent change between fall and winter and represented those in the leftmost column.

Look at Student 39 now. He's 12th from the top. Woo-hoo!

When we consider progress, we start to get a more equitable pattern---everyone is growing, and more often than not, it's our lowest performers who are making the biggest gains, even if they're still in the lowest part of the score breakdown.
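The progress ranking can be sketched the same way. This is only a rough example with invented scores, and it assumes "percent change" means the change relative to the fall score, (winter - fall) / fall x 100. If you prefer a simple difference in percentage points, the function is the only thing to change.

```python
def percent_change(old, new):
    """Relative change from old to new, as a percentage of old.

    Returns None when old is zero, because the change is undefined there.
    """
    if old == 0:
        return None
    return (new - old) * 100 / old


# Hypothetical (fall, winter) scores; not real student data.
scores = {
    "Student A": (20, 35),
    "Student B": (80, 84),
    "Student C": (50, 50),
}

growth = {name: percent_change(fall, winter) for name, (fall, winter) in scores.items()}

# Rank by growth, largest first, skipping anyone whose change is undefined.
ranked = sorted(
    (name for name in growth if growth[name] is not None),
    key=growth.get,
    reverse=True,
)
print(ranked)
```

Notice that Student A, who has the lowest winter score of the three, lands at the top of the list once growth is the sort key. That is the same flip we saw with Student 39.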

By clustering similarly performing students together, either by scores or by progress, we get a much more useful pattern than we do with a spreadsheet that looks like a clown exploded on it. And, more importantly, we can show the data. In a very compact space, I can display everyone's scores and whatever demographic or program information is most relevant. And, I can fit the whole grade level on a single page.

I have no doubt that as we move forward, smarter people than me (I?) will continue to find new charts that help share everything we know about a group. Summary stats and charts will never go away---and they have their own purpose to serve. But sometimes we want the full version, not the Cliff's Notes. When we do, bump and cluster charts will be there.

Bonus Round
Want to see just how challenging colored squares can be? Play this online game. The rules are easy: just click on the one square that is different.

Monday, May 4, 2015

Show the Data: Bump Charts

This is a bump(s) chart, a/k/a slope graph. If you poke around online, you'll find a variety of examples and names. Some have multiple data points for each line...and some are simpler, like the one on the right.

Typically, the lines are labeled on each end, often with the name of the data series and sometimes with the data value. I do have a version of this one with the lines labeled, but since these represent real data points, it's best to keep things anonymous for this example.

But let me give you a little context here for what I'm showing. Each line represents a teacher, and the chart as a whole shows the entire staff for a school. On the left is each teacher's percent of Ds and Fs assigned to students for the first semester of the 2013-14 school year. On the right is the value for the 2014-15 school year.

We want to use this chart to look at two things. First of all, what is the general trend within the school? In this case, most of the lines are sloping downward. This may connect to initiatives, such as changes in grading practices, tutorial options, or improvements in instruction. Whatever the story is behind this chart, it's looking positive.

Next, we want to consider the steepness of the slopes we observe. Sure, we could add a trendline, but if we're just using the chart for exploratory purposes, we can eyeball things. In this case, we might note that most of the downward-sloping lines, especially for the upper percentages on the 2014 side of the house, have had significant decreases.

Typically, when I present these charts, I include a summary of the data. For example, between the 2014 and 2015 school years, 30 teachers assigned fewer Ds and Fs to students, 7 teachers had very little change in the percentage of Ds and Fs assigned to students, and 3 teachers showed an increase in the percentage of Ds and Fs assigned to students. Because these charts are new to many of the people in my audience, this brief summary is enough to get them oriented to the chart. They can then begin to focus on the details. This might start with the slope of the lines, but then I see them begin to dig into the labels: Are some teachers in new-to-them assignments this year? Are the lines showing little change all in one content area, such as math? What might we see next year---is there a goal around our percentages?
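A summary like that is easy to tally in code. Here is a small sketch with made-up rates. The cutoff for "very little change" is my assumption (within one percentage point either way), so adjust `tolerance` to whatever makes sense for your data.

```python
from collections import Counter


def classify(before, after, tolerance=1.0):
    """Label a teacher's change in the percentage of Ds and Fs assigned.

    tolerance is a hypothetical cutoff, in percentage points, for
    what counts as "little change".
    """
    diff = after - before
    if abs(diff) <= tolerance:
        return "little change"
    return "fewer" if diff < 0 else "more"


# Hypothetical (first year %, second year %) pairs, one per teacher.
rates = [(30.0, 18.0), (12.0, 12.5), (8.0, 15.0), (22.0, 10.0)]

summary = Counter(classify(before, after) for before, after in rates)
for label in ("fewer", "little change", "more"):
    print(label, summary[label])
```

Each pair is one line on the bump chart, so the three counts are just a tally of which way the lines slope.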

And now, a musical interlude...

Let's take a look at another of these beasts. This is a different school, but in the same district.

We see a lot more increases here than at the other school. But if you look at the scale on the left-hand side, you'll see that none of the percentages here are higher than those at the other school. Generally speaking, teachers in this school assign a lower percentage of Ds and Fs vs. the other school.

The overall changes at this location aren't as dramatic, either. The slopes are more gentle.

What might account for the differences? Again, you'd have to poke further using knowledge specific to the school: Is this a more veteran staff that has more expertise or is more resistant to change? Are the increases due to an unexpected change in student population? Were the enrollment boundaries changed?

You could, with additional information, make some other comparisons between the two schools. What if you built graphs just showing one department, such as math? It would make the charts less busy and comparisons between buildings a lot easier.

The big idea with these charts, of course, is to show the data. Sure, we could just write the summary and do a simple bar chart or line chart to compare totals...but we're missing a lot of the story in doing so. When we go bumpin', we get a much richer picture of what is happening.

Next time, we'll take a look at another way to show the data using cluster charts.

Bonus Round
These charts are super-simple to make in Excel. Jon Peltier has an excellent tutorial on his web site. You can also download a template from the article I profiled in the last post.

Music credit: "Bumpin Bumpin" by Kreayshawn (c) 2011.