Sunday, August 16, 2015

Anatomy of a Design Build

Like many school districts, we have new data this fall. The state has changed to the Smarter Balanced assessments to measure student knowledge and skills against the Common Core State Standards. One of the challenges of presenting these data is the inevitable effort to compare them with scores from previous years. But they do not represent the same things. And so began the challenge to develop this report:

Subject areas are organized across the top; grade levels run down the side. The blue squares represent grades and subjects assessed in 2015. The large number in the center of each square shows the percent of students meeting the standard, with the distribution of scores shown by the column chart at the bottom of the square. Historical data are shown by the line graphs in the grey squares.

It is built in Excel (of course). On another sheet is a table listing the various schools, grade levels assessed, subjects, school years, and scores. A pivot table with this information feeds the report you see above.
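If you ever want to prototype the same idea outside of Excel, the long-table-feeds-a-pivot step is a few lines of pandas. This is just a sketch---the school, scores, and column names are invented, not my actual workbook:

```python
import pandas as pd

# Hypothetical long-format scores table, mirroring the sheet described above;
# the school, numbers, and column names are invented for illustration
scores = pd.DataFrame({
    "school":  ["Lincoln"] * 4,
    "grade":   [3, 3, 4, 4],
    "subject": ["ELA", "Math", "ELA", "Math"],
    "year":    [2015] * 4,
    "pct_meeting_standard": [54, 49, 58, 51],
})

# One row per grade, one column per subject -- the same shape as the report tiles
report = scores.pivot_table(
    index="grade",
    columns="subject",
    values="pct_meeting_standard",
    aggfunc="mean",
)
print(report)
```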

I'm mostly happy with this layout and format. I would still like to tweak some of the colours, but overall, I think I've solved most of the issues with representing the data. But it took a while to get this far.

In the beginning, there was a suggestion from one of my supervisors to offer something like what you see on the right. This is what another district in the area provides to schools and the public. I showed it to one of our principals and he said it made his eyes bleed. I agreed with that sentiment. We can do better, I said.

It's not that the information provided here is bad; it's that a PDF of a spreadsheet does not take advantage of what data visualization can offer to make meaning. It has much of the same information I developed for my version. We just used two different approaches.

Originally, I started with something similar. I pulled demographic and program data and added some sparklines and arrows to show change.

But I decided that this was a stupid idea before I got too far down the road with it. It was way too hard to scale the graphs along the same axes. This is an important thing to do to enable comparisons, but there is just too large a range to represent in a small space. Not to mention this layout is boring. Seriously. Yes, I know it's half finished, but no bells and whistles involving fonts and headers are going to make this version anything other than a snoozefest.
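You can see the scaling problem in about ten lines if you mock it up. Here's a quick matplotlib sketch (all percentages invented) of how a shared axis---the honest choice for comparisons---flattens the small subgroups into near-invisible wiggles:

```python
import matplotlib.pyplot as plt

# Invented enrollment-percentage series for three very different subgroups
series = {
    "Female":      [49, 50, 48, 49, 50],
    "Low income":  [38, 40, 43, 45, 44],
    "Special ed.": [11, 12, 12, 13, 13],
}

# sharey=True puts every sparkline on the same axis -- honest for comparisons,
# but it squeezes the small subgroups flat when the overall range is large
fig, axes = plt.subplots(1, len(series), sharey=True, figsize=(6, 1.5))
for ax, (label, values) in zip(axes, series.items()):
    ax.plot(values)
    ax.set_title(label, fontsize=8)
    ax.set_xticks([])
plt.show()
```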

So, I tried something else.

This is where I started to play with the idea of tiles. I wanted a cool color (as opposed to a warm one) and decided to look at purple as an option. This version is slightly better, but not by a lot. Looking at it, I realized something very important: Almost none of this information is useful. Does anyone, especially a school leader, really care about the year-by-year percentages of Asian students (for example)? It's not actionable data. You're not going to go out and recruit more Asian children if you notice your percentage slipping. There's no action plan a school would make to decrease the enrollment of girls. This is not to say that viewing overall change wouldn't yield some insights, but the year-by-year review isn't very helpful. So I started over. Again.

Third time's the charm, right?

I won't claim this is perfect, but we've come a long way, baby. All that demographic and program data? Now in one tidy bar chart on the left (under "% of overall enrollment"). The five-year change is shown on the far right. The stuff in the middle? All new. Kids are more than the sum of their test scores. So, I've included some additional demographic information about absences and discipline. Now we have some conversation starters.

I did the achievement data next. This is the page at the beginning of this post. I played around a bit with colors and format, but the tiles have been a constant.

Feedback has been mostly positive, but I'm still tweaking things. What would you want to see? How should we show it? Anything we should remove or represent differently?

Thursday, May 7, 2015

Show the Data: Cluster Charts

In the last post, we explored the idea of adding bump(s) charts to our rotation of how we communicate our data. It's one way to show all of the data in a particular set. Another one I've been using quite a bit is a cluster chart. Full disclosure here, these are my own take on displaying data---a bastardized heat map, and certainly not based on heavy-duty math like real hierarchical cluster charts. So, really, I'm not sure what to call these...but in my current role, we're finding them to be very useful and I'm just rolling with cluster charts as my category.

This spreadsheet will eat your soul.
I get a lot of spreadsheets sent to me that look like the one on the right. I hate these with a fiery passion for a variety of reasons:
  • Too much "ink" relative to the data---a terrible data-ink ratio. With all of those little boxes, I don't know where to look.
  • And the colors. I feel like a circus came to town. But beyond that, the red and green are not particularly friendly to those with color vision issues...and I do work with some who are color blind. Are we really asking them to try and make decisions on student learning based on this?
  • Not to mention that all of the data is colored in. What's the point?
  • And we have both numbers and colors. I'm not saying that you can't have both...or that they don't serve different purposes...but it's distracting. I'm constantly trying to make sense of the number patterns for each color.
I also think these data aren't useful because of the way they are organized. Alphabetical order is great for gradebooks, but not so much for trying to make sense of the data. Plus, we don't have any context---what if we're missing some signal in the noise? Suppose all of our low-performing students are boys...or in a minority group?

But let's say you are interested in showing both the progress students have made over time, as well as the characteristics of the students involved. We can reorganize the data by ranking the percentages on the second assessment (this is the "cluster" part). Then we can color code some additional information, such as gender or participation in a particular program. I also change the properties of the conditional formatting so that the fill and text are the same color, making the values effectively invisible. Finally, I add thick white borders around all of the cells and resize the rows and columns. Here is a small part of the final product:
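For anyone who wants to try this outside of Excel's conditional formatting, here's a rough pandas Styler sketch of the same construction---students, scores, and colors all invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Invented 1-4 performance levels and program flags for a dozen students
df = pd.DataFrame({
    "fall":   rng.integers(1, 5, 12),
    "winter": rng.integers(1, 5, 12),
    "female": rng.integers(0, 2, 12),
    "frl":    rng.integers(0, 2, 12),   # free/reduced lunch flag
}, index=[f"Student {i}" for i in range(1, 13)])

# The "cluster" step: rank everyone on the second assessment
df = df.sort_values("winter", ascending=False)

LEVEL_COLORS = {1: "#d7301f", 2: "#fdae61", 3: "#74add1", 4: "#313695"}

def tile(value):
    # Fill and text get the same color, so the number disappears into the tile
    color = LEVEL_COLORS.get(int(value), "#cccccc")
    return f"background-color: {color}; color: {color}"

def flag(value):
    color = "#555555" if value else "#eeeeee"
    return f"background-color: {color}; color: {color}"

styled = (df.style
          .applymap(tile, subset=["fall", "winter"])
          .applymap(flag, subset=["female", "frl"])
          .set_properties(**{"border": "3px solid white"}))  # thick white borders
styled.to_html("cluster_chart.html")
```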

These are all the students who scored in the top level of our fictional grade 5 winter math assessment. Three of them improved a little, from light blue to a darker blue...others improved from lower on the scale (orange). But when we look at gender and program, another story emerges. Most of the students in the top category are female, not on free or reduced lunch, not in special education, and do not receive additional interventions through a Title I program.

See the difference when we look at students who have scored at the bottom for both fall and winter? Our population is mostly male and nearly everyone participates in one or more federal programs.

Maybe this representation doesn't necessarily hold any surprises, especially as we factor in free/reduced lunch. Children living in poverty typically do not perform as well as their peers. But one of the things I take away from this way to visualize that story is that we may need different interventions to support these students. Consider Student 39 on the right. He is receiving free or reduced lunch, special education and Title I services...and he's still ranked fourth from the bottom out of nearly 70 students. It doesn't mean that the school (and the student) aren't working as hard as they can. I do think it might mean that there are additional factors at work here that aren't (and can't be) addressed through the school. Perhaps the family is homeless or transient. Maybe the parents are going through a divorce...or the student has some medical issues. These are community-based issues and require different interventions to help close the gap for the student. I won't get up on my left coast soapbox about this right now. I'll just say that we have to work together on behalf of the whole child.

One of the pieces of information that is not represented in the visuals above is the number one item on teacher wishlists when it comes to reporting scores: progress. Sure, we have a bunch of students performing at the lowest level in the picture above, but that doesn't mean that they didn't make some growth.

This time around, I left the gender and program pieces coded the same, but I calculated the percent change between fall and winter and represented those values in the leftmost column.
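In the pandas sketch above, the progress view is just one more column and a re-sort (still invented data):

```python
# Progress view: percent change from fall to winter, biggest gains first;
# continues the invented DataFrame from the sketch above
df["pct_change"] = (df["winter"] - df["fall"]) / df["fall"] * 100
df = df.sort_values("pct_change", ascending=False)
```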

Look at Student 39 now. He's 12th from the top. Woo-hoo!

When we consider progress, we start to get a more equitable pattern---everyone is growing, and more often than not, it's our lowest performers who are making the biggest gains, even if they're still in the lowest part of the score breakdown.

By clustering similarly performing students together, either by scores or by progress, we get a much more useful pattern than we do with a spreadsheet that looks like a clown exploded on it. And, more importantly, we can show the data. In a very compact space, I can display everyone's scores and whatever demographic or program information is most relevant. And, I can fit the whole grade level on a single page.

I have no doubt that as we move forward, smarter people than me (I?) will continue to find new charts that help share everything we know about a group. Summary stats and charts will never go away---and they have their own purpose to serve. But sometimes we want the full version, not the Cliff's Notes. When we do, bump and cluster charts will be there.

Bonus Round
Want to see just how challenging colored squares can be? Play this online game. The rules are easy: just click on the one square that is different.

Monday, May 4, 2015

Show the Data: Bump Charts

This is a bump(s) chart, a/k/a slope graph. If you poke around online, you'll find a variety of examples and names. Some have multiple data points for each line...and some are simpler, like the one on the right.

Typically, the lines are labeled on each end, often with the name of the data series and sometimes with the data value. I do have a version of this one with the lines labeled, but since these represent real data points, it's best to keep things anonymous for this example.

But let me give you a little context here for what I'm showing. Each line represents a teacher---the chart as a whole shows the entire staff for a school. On the left is each teacher's percent of Ds and Fs assigned to students for the first semester of the 2013-14 school year. On the right is the value for the 2014-15 school year.
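If you'd like to mock up one of these beasts outside of Excel, here's a minimal matplotlib sketch with invented percentages for 40 teachers:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Invented percent-of-Ds-and-Fs values for 40 teachers across two years,
# skewed so most teachers decline a bit from year to year
y2014 = rng.uniform(0, 40, 40)
y2015 = np.clip(y2014 - rng.uniform(-5, 15, 40), 0, None)

# One line per teacher, left end 2013-14, right end 2014-15
fig, ax = plt.subplots(figsize=(4, 6))
for left, right in zip(y2014, y2015):
    ax.plot([0, 1], [left, right], color="steelblue", alpha=0.6)

ax.set_xticks([0, 1])
ax.set_xticklabels(["2013-14", "2014-15"])
ax.set_ylabel("% of grades assigned that were Ds and Fs")
plt.show()
```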

We want to use this chart to look at two things. First of all, what is the general trend within the school? In this case, most of the lines are sloping downward. This may connect to initiatives, such as changes in grading practices, tutorial options, or improvements in instruction. Whatever the story is behind this chart, it's looking positive.

Next, we want to consider the steepness of the slopes we observe. Sure, we could add a trendline, but if we're just using the chart for exploratory purposes, we can eyeball things. In this case, we might note that most of the downward-sloping lines, especially those starting from the upper percentages on the 2014 side of the house, have had significant decreases.

Typically, when I present these charts, I include a summary of the data. For example, between the 2013-14 and 2014-15 school years, 30 teachers assigned fewer Ds and Fs to students, 7 teachers had very little change in the percentage of Ds and Fs assigned, and 3 teachers showed an increase. Because these charts are new to many of the people in my audience, this brief summary is enough to get them oriented to the chart. They can then begin to focus on the details. This might start with the slope of the lines, but then I see them begin to dig into the labels: Are some teachers in new-to-them assignments this year? Are the lines showing little change all in one content area, such as math? What might we see next year---is there a goal around our percentages?
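The summary itself is trivial to compute once you have the two columns of percentages. Continuing the invented-data sketch above (the one-point threshold for "very little change" is my arbitrary choice, not a standard):

```python
# Count the three patterns the narration describes; the 1-point
# threshold for "very little change" is an arbitrary cutoff
diff = y2015 - y2014
decreased = int((diff < -1).sum())
unchanged = int((np.abs(diff) <= 1).sum())
increased = int((diff > 1).sum())
print(f"{decreased} teachers down, {unchanged} about flat, {increased} up")
```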

And now, a musical interlude...

Let's take a look at another of these beasts. This is a different school, but in the same district.

We see a lot of increases compared to the other school, but if you look at the scale on the left-hand side, you'll see that none of the percentages climb as high as the other school's. Generally speaking, teachers in this school assign a lower percentage of Ds and Fs.

The overall changes at this location aren't as dramatic, either. The slopes are more gentle.

What might account for the differences? Again, you'd have to poke further using knowledge specific to the school: Is this a more veteran staff that has more expertise, or one that is more resistant to change? Are the increases due to an unexpected change in student population---were the enrollment boundaries changed?

You could, with additional information, make some other comparisons between the two schools. What if you built graphs just showing one department, such as math? It would make the charts less busy and comparisons between buildings a lot easier.

The big idea with these charts, of course, is to show the data. Sure, we could just write the summary and do a simple bar chart or line chart to compare totals...but we're missing a lot of the story in doing so. When we go bumpin', we get a much richer picture of what is happening.

Next time, we'll take a look at another way to show the data using cluster charts.

Bonus Round
These charts are super-simple to make in Excel. Jon Peltier has an excellent tutorial on his web site. You can also download a template from the article I profiled in the last post.

Music credit: "Bumpin Bumpin" by Kreayshawn (c) 2011.

Wednesday, April 29, 2015

Hide and Seek

I want to circle back to an article I wrote a few years ago about my favourite data visualization.

Hierarchical Cluster Analysis by Alex J. Bowers
It shows all of the grades earned by students during their K-12 journeys in two school districts. I love this chart because it finds a way to show all of the data in a dense, but succinct, format.

In The Visual Display of Quantitative Information, Edward Tufte states, "Above all else, show the data." While the quote was applied to a different concept for visualizing data, when I look at the chart above, it rises to the surface of my thinking. Showing the data is no small task, and as educators, we spend a lot of time and energy not doing that. We summarize the data into neat little one-letter grades or one-number test scores. As teachers, we might see a set of scores...but we are the only ones to do so, and we typically view them as numbers, not visual displays. Things hide in numbers and number sets.

But a recent paper shared in the Public Library of Science (PLoS) makes the case that things can be hidden in simple visuals, too.

CC-BY Weissgerber, Milic, Winham, Garovic

The authors of the article Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm assert that the ever-popular bar chart is a summary, and therefore "full data may suggest different conclusions from the summary statistics." (It reminds me of Anscombe's quartet.) We often claim that pie charts are used to hide data. Et tu, bar charts?

I won't claim that the scatterplots and bump charts in the article are ground-breaking, but this paragraph in particular caught my interest (emphasis mine):

The infrequent use of univariate scatterplots, boxplots, and histograms is a missed opportunity. The ability to independently evaluate the work of other scientists is a pillar of the scientific method. These figures facilitate this process by immediately conveying key information needed to understand the authors’ statistical analyses and interpretation of the data. This promotes critical thinking and discussion, enhances the readers’ understanding of the data, and makes the reader an active partner in the scientific process. In contrast, bar and line graphs are “visual tables” that transform the reader from an active participant into a passive consumer of statistical information. Without the opportunity for independent appraisal, the reader must rely on the authors’ statistical analyses and interpretation of the data.
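The authors' point is easy to demonstrate with a mockup. In this matplotlib sketch (groups and numbers invented), two bars of means look nearly identical while the underlying distributions tell completely different stories:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)

# Two invented groups with similar means but very different distributions:
# A is tightly clustered, B is split into two clumps
group_a = rng.normal(50, 3, 30)
group_b = np.concatenate([rng.normal(35, 4, 15), rng.normal(65, 4, 15)])

fig, (bars, dots) = plt.subplots(1, 2, sharey=True, figsize=(7, 3))

# Left: the "visual table" -- two nearly identical bars of means
bars.bar(["A", "B"], [group_a.mean(), group_b.mean()], color="lightgray")
bars.set_title("Summary only")

# Right: every point shown, with a little horizontal jitter
for x, values in enumerate([group_a, group_b]):
    dots.scatter(x + rng.uniform(-0.1, 0.1, len(values)), values, alpha=0.6)
dots.set_xticks([0, 1])
dots.set_xticklabels(["A", "B"])
dots.set_title("All the data")
plt.show()
```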

As educators, we might not view our work as a scientific process, but we must engage with our data. I feel pulled between the notion above that we may be oversimplifying our data presentations and some of the research about how an audience likes their data presented---typically, the charts that are most familiar. This is not the Great Divide, mind you. We can bring these two things together with some education in the area of data literacy.

Or perhaps we underestimate our audience. I've introduced cluster maps, bump charts, and box-and-whisker diagrams to various groups this year. The first two required very little explanation. Box-and-whiskers did require a bit more orientation, but I never felt like the group using them struggled with the interpretation. I do think the concept of engagement between the visualization and the reader, as posed by the article, is important. It's a different way to view interaction---a key piece of a good-quality visual. It's not that the visual needs to be physically interactive...people don't have to be able to click, sort, or filter every chart---but we need to at least provoke some thinking about what is presented.

After reading the PLoS article, I'm more convinced than ever that we need to consider when and why we share all the data. Bar and line charts may well be the fast-food version of data viz, but we can begin to add to our visual diet by finding ways to show all of the ingredients.

Bonus Round
If you view the article on PLoS, you will have access to two Excel workbooks to help you make the charts presented in the article.

I'll share some of my own attempts to "show the data" in coming posts. Visit bump charts and cluster charts to learn more.

Sunday, April 19, 2015

Looking at Disproportionality

The concept of disproportionality underlies much of the reform movement for education in the United States. Sometimes referred to as the achievement gap or opportunity gap, the basic idea is that outcomes for all students are not equitable. While much of the conversation focuses on race, disproportionality applies to any subgroup: gender, special education, English language learners, free/reduced lunch, and so on. Over the past several years, much of the conversation about disproportionality has focused on student achievement---and test scores in particular. But just as there is more we need to look at in addition to race, there is more to children than representing them as test scores. We can look at disproportionality as it relates to sports or after school activities, student discipline, attendance, and other factors.

If you want to examine disproportionality within your system, there are a few pieces of data that you will need to know. In the example below, I'm going to use gender (male, female) as subgroups. (Note: I realize this is a very heteronormative view of gender. Our data systems need to catch up with our increasing understanding as a society about gender identification; however, right now, most school data systems are set up to only capture the binary male/female, so I'm going to use it as an example.)

First, you need to know the enrollment numbers and percentages for each subgroup. In other words, how many possible students are there who could participate in an athletic program, be subject to suspension/expulsion consequences, fail Algebra, or yes, pass the state test? Many schools report gender as close to a 50/50 split, as one might expect, but variations do exist. Don't assume that you're starting off with equal pools of participants.

Second, you need to know the participation numbers and percentages for each subgroup. Just because everyone is eligible to pass the state test doesn't mean that they do. So, how many males/females met the standards? How many in each subgroup were suspended? Enrolled in Physics or Calculus? Turned out for basketball?

In this example, we have a school with 250 males (47.6% of enrollment) and 275 females (52.4%), with 50 from each group enrolled in Algebra. Now we need to calculate the disproportionality.

To determine the number of males required to achieve proportionality, multiply the number of females participating by the ratio of male to female enrollment: 50 × (250 / 275) ≈ 45.5 males. Swapping the ratio gives the other side: 50 × (275 / 250) = 55 females needed for proportionality.

Next, we take these two and compare them with the number of students in each subgroup that are participating. For males this would be 50 - 45.5 = 4.5; for females 50 - 55 = -5. That -5? It means that we need five more females enrolled in Algebra to achieve proportionality.
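If you'd rather not chase the arithmetic through a worksheet, the whole pairwise calculation fits in a small function. Here's a Python sketch (the function name and rounding are mine):

```python
def proportionality_gap(enrolled_a, enrolled_b, participating_a, participating_b):
    """How many more (positive) or fewer (negative) group-A participants
    there are than proportionality would require, holding group B fixed."""
    needed_a = participating_b * enrolled_a / enrolled_b
    return participating_a - needed_a

# The Algebra example: 250 males, 275 females, 50 of each enrolled
print(round(proportionality_gap(250, 275, 50, 50), 1))   #  4.5 (males over)
print(round(proportionality_gap(275, 250, 50, 50), 1))   # -5.0 (females under)
```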

While it may not be entirely realistic to achieve perfect proportionality within a system for all programs, subgroups, and outcomes, it is still important to review these data to reflect on areas where institutionalized racism or policies may be contributing to disproportionality. Another factor to consider is the size of the subgroups that you are reviewing. For example, if you only have two or three American Indian students in a grade level, it's unreasonable to expect that they are represented in everything---but you should look to be sure that they are represented somewhere among school offerings. In that case, it may be more helpful to use longitudinal data to get an idea for participation.

Here's an Excel workbook that allows you to easily compare gender equity in sports programs. I built it a couple of years ago for a program that needed it, based on an idea from Debra Dalgleish. See her site for even more ideas on data entry forms...and feel free to modify mine to suit your needs.

Monday, April 13, 2015

Session Recap: Data Displays as Powerful Feedback

I had the pleasure of presenting at the ASCD annual conference last month. Each year, I stretch myself a little further in making connections between ideas, as well as between technology and content.

My session description: Developing visual literacy is a key skill for student success with Common Core State Standards. Students also need clear feedback about their progress. Using data displays, such as charts and graphs, we can integrate these goals and increase student achievement. In this interactive session, you will learn strategies that increase visual literacy and foster communication. You will also learn to effectively use data collected in classrooms as feedback with students. Both digital and analog tools for organizing and integrating data into lessons will be provided.

Session descriptions are written at least 10 months before the actual presentation happens. This extended timeline can explain why many sessions are not as promised, which is very frustrating for attendees. You pick something from the catalog that looks like it is the most perfect thing ever, only to show up and discover that the presenter has something different in mind. I try to stick as closely as possible to my submitted description, but I admit that I end up taking a little birdwalk here and there. It's hard not to---you learn so much in between submitting a proposal and actually presenting it. For me, a lot of that growth has come from changing jobs this year and getting a much better on-the-ground view of the lack of visual literacy among students and teachers.

Here is the logic model that framed my presentation:

I started the presentation with a brief look at visual communication in general---pictures have been used far longer than text. Then, we talked about how graphics used as feedback have a larger impact on student achievement than nearly any other type of feedback (e.g., marking answers right/wrong). All of this was to build a case for becoming visually literate.

I won't bore you with all the details. I fused together some previous presentation materials and pulled a lot of pieces of this blog in as examples. But if you want a look at things, I have it all stashed on the same wiki as my other resources.

I was slated for 8 a.m. on the last day of the conference---not quite the worst possible time slot, but just about. So, I had a small, but awesome crowd. Lots of comments afterward made me feel good, from one gentleman who said it was the best hour of the entire conference (and asked if there would be a Part II) to another with a very heady offer I'm kicking around.

Proposals for next year are due in a month, so I am already kicking around things to share. I think that I will put in something about using questions to focus data use...and something similar to this year on visual literacy skills. We have to expand our conversation about visual literacy. We work so hard in schools to be literate in other ways. We practice rules for grammar, punctuation, and different forms of writing...all with the goal of improving communication. But for the most part, the visuals developed are junk. And that needs to change.

In the meantime, there is SO much I want to learn. I would love to try and go to the Eyeo Festival next year. Or somehow wrangle an invitation to the Tapestry Conference. I'm feeling a need to get beyond the borders of education and exchange ideas and resources. I continue to do lots of reading and thinking (no matter how quiet I am here) and am always pondering what to learn next. Isn't that what we want for our schools, too?

Sunday, March 22, 2015

ASCD 2015: Data Tools

This is the third year that I've been on the hunt for high-quality data tools in the Exhibit Hall at the ASCD annual conference. The first year (2013) was downright depressing. Last year was better---I ran across a couple of promising tools, although neither is represented at the conference this year. Here are the trends I'm seeing this spring.

Data Capture
This is a brand new theme this year. I saw three different tools yesterday that are meant to support teachers in recording student conversations or other "in the moment" data points and then associating those with a gradebook or spreadsheet. I am intrigued by these, though I think their benefit may be somewhat limited right now: teachers would need to be outfitted with tablet devices and know how to seamlessly integrate those with their classroom work. I suspect that more and more teachers fit that description each year, but my school district is not quite there yet. One thing that I really like about these tools, however, is that they put the power of assessment back into the hands of teachers. In an age where large-scale district and state assessments carry the weight and propel the discussions, these tools give teachers another way to show student learning. Yes, these demonstrations were always there, but now there are supports for teachers to share the very important daily learning with others. The best tool of the bunch? Just open Sesame.

Item Development
With the advent of new online assessments in many states, such as Smarter Balanced and PARCC, a new crop of tools has emerged that lets classroom teachers build items and assessments with many of the features of their large-scale brethren. SchoolCity was the first one I saw last year, but there are a couple of new players showing their wares at this year's conference. The best one of these is Edulastic. Most of these tools promise integration with your gradebook or data warehouse. I think we have to be cautious, however: just because you can make all sorts of new-fangled items for kids to answer doesn't mean you should. If your goal is just to have kids practice responding to particular sorts of items (e.g., drag-and-drop) for "The Test," then I hope you'll think a bit harder before purchasing this kind of software. We also need to support teachers with the basic assessment literacy required to write good items to measure student learning. We haven't done that much with paper/pencil tests---and online forms mean we have even more background knowledge to build.

Design Is Better
Most of the tools I saw yesterday show some thoughtful design. As a whole, they're far better than they were two years ago, but there is still a long way to go. I didn't find a single vendor who uses a data designer for their displays---they all depend on developers to code whatever charts and visualizations they have. Some claim that their charts are "designed by teachers." This is also a bad answer. Teachers and other educators should definitely have input---they are the end users for the tool---but they are not data designers. Listen to the stories classroom experts need to have told, then create the interface to communicate them in a powerful way.

One vendor has the biggest, ugliest, exploded 3D pie chart on their screen. I asked them why they had chosen it. The rep wasn't sure. I probed further: Why is it in 3D when you only have two dimensions of data? His reply: Because it looks cool. No, it doesn't.

We have to demand better from our vendors.

If you're in Houston in the next day or so, swing by the conference and check out all the new data tools for the classroom. Or, if you've already wandered through the vendor area, feel free to leave your new favourite option in the comments.