Monday, November 9, 2015

The Agony and the Ecstasy

Am I the only one who agonizes over the best way to represent a data set? Is there a 12-step program for those of us who are occasionally paralyzed by all of the visualization options? If this sounds familiar to you, read on for my latest struggle in bringing a data story to life.

Several weeks ago, I was asked by my superintendent for information on the achievement gap. For those of you who might not know this particular piece of eduspeak, it refers to the difference in achievement levels between populations of students. For example, white students often perform better on standardized tests than black students. This difference is referred to as the achievement gap.

This request should have been a piece of cake. I have the data (scores and student demographics). I'd done a similar project last year. But I felt like the bar charts I'd used were lacking. They show differences in performance plainly enough, and yet it's difficult to capture information for various populations in a single, easy-to-read report.

I started as I usually do, hand drawing some possible layouts and then building a few models using the data.

These examples all show longitudinal information for males and females at a particular grade level and in a certain subject area. The particulars are not too important here. What I discovered in doing these, even before cleaning up the charts, is that none of them were satisfactory. They all showed the data accurately, but none of them captured "the gap" in a way that caused any sort of interest or reaction.

Back to the drawing board.

I realized that I needed Excel to show me the percentages on one axis, like a number line, so the space...the gap...became visible. Here is what I ended up with:

This is a bit out of context, so let me tell you a bit more about what you're looking at. This chart only shows 2015 data for one grade level. The horizontal line is the number'ish line: 0 - 100%. The vertical line shows the overall percentage of students who met the standard on a particular assessment. The placement of populations (shown by orange or blue triangles) shows how each group relates to that overall level of achievement, and it also shows the gap between the populations. I do include the actual n sizes and percentages in a table below the charts.

Here's a broader view:

I am not going to show you the data tables, due to FERPA issues---some of the subgroups shown above have fewer than 10 students. I need to stay within the bounds of federal privacy laws in this public space, but just know that they exist to provide detail to data users in our district.

I'm really happy with this layout, however. It gives, at a glance, an easily readable view of the achievement gap at a grade level. When looking at these over several grades, patterns begin to emerge. This is especially important for those groups where the n size is very small for a grade level. For example, having only one black student in a grade might not tell you much if they didn't meet the standard, but when you see that our small handful of black students at every grade level all fall well below their peers, it's alarming. It's also easy to cover up either the orange or the blue markers and get a quick picture of who is or is not successful.

While I still have the longitudinal view to consider, it's simple enough to build similar charts for a few years of data and then align them to provide a similar glance at trends.

I apologized to my superintendent for my tardiness in delivering the product, but I think the agonizing has given way to some ecstasy over seeing things in a way that's clear and best represents the question to be answered.

I don't know that anyone, other than those of us struggling to represent data, understands why it takes so much time to build a report. Others don't see how many different charts we modeled...all of the colors we tried...or the variety of label placements (and label content) we viewed. They don't hear the conversations we have with people around the office to learn more about what is or isn't working for them in our draft visuals or how they want to interact with and use the information presented. But for those of you who are knee-deep in this process, I'm cheering you on from here.

Bonus Round
Like these charts? They're just scatter plots in Excel, with the vertical axis removed. Easy-peasy to make if you're on the hunt for something similar using your own data.

Sunday, August 16, 2015

Anatomy of a Design Build

Like many school districts, we have new data this fall. The state has changed to Smarter Balanced assessments to measure student knowledge and skills with Common Core State Standards. One of the challenges associated with presenting these data is the inevitable effort to compare them with scores from previous years. But they do not represent the same things. And so began the challenge to develop this report:

Subject areas are organized across the top. Grade levels are on the horizontal. The blue squares represent grades and subjects assessed in 2015. The large numbers in the center show the percent of students meeting the standard, with the distribution of scores shown by the column charts at the bottom of the square. Historical data are shown by the line graphs in the grey squares.

It is built in Excel (of course). On another sheet is a table listing the various schools, grade levels assessed, subjects, school years, and scores. A pivot table with this information feeds the report you see above.

I'm mostly happy with this layout and format. I would still like to tweak some of the colours, but overall, I think I've solved most of the issues with representing the data. But it took a while to get this far.

In the beginning, there was a suggestion from one of my supervisors to offer something like what you see on the right. This is what another district in the area provides to schools and the public. I showed it to one of our principals and he said it made his eyes bleed. I agreed with that sentiment. We can do better, I said.

It's not that the information provided here is bad; it's that a PDF of a spreadsheet does not take advantage of what data visualization can offer to make meaning. It has much of the same information I developed for my version. We just used two different approaches.

Originally, I started with something similar. I pulled demographic and program data and added some sparklines and arrows to show change.

But I decided that this was a stupid idea before I got too far down the road with it. It was way too hard to scale the graphs along the same axes. This is an important thing to do to enable comparisons. But there is just too large a range to represent in a small space. Not to mention this layout is boring. Seriously. Yes, I know it's half finished, but no amount of bells and whistles involving fonts and headers is going to make this version anything other than a snoozefest.

So, I tried something else.

This is where I started to play with the idea of tiles. I wanted a cool color (as opposed to a warm one) and decided to look at purple as an option. This version is slightly better, but not by a lot. Looking at it, I realized something very important: Almost none of this information is useful. Does anyone, especially a school leader, really care about the year-by-year percentages of Asian students (for example)? It's not actionable data. You're not going to go out and recruit more Asian children if you notice your percentage slipping. There's no action plan a school would make to decrease the enrollment of girls. This is not to say that viewing overall change wouldn't yield some insights, but the year-by-year review isn't very helpful. So I started over. Again.

Third time's the charm, right?

I won't claim this is perfect, but we've come a long way, baby. All that demographic and program data? Now in one tidy bar chart on the left (under "% of overall enrollment"). The five-year change is shown on the far right. The stuff in the middle? All new. Kids are more than the sum of their test scores. So, I've included some additional demographic information about absences and discipline. Now we have some conversation starters.

I did the achievement data next. This is the page at the beginning of this post. I played around a bit with colors and format, but the tiles have been a constant.

Feedback has been mostly positive, but I'm still tweaking things. What would you want to see? How should we show it? Anything we should remove or represent differently?

Thursday, May 7, 2015

Show the Data: Cluster Charts

In the last post, we explored the idea of adding bump(s) charts to our rotation of how we communicate our data. It's one way to show all of the data in a particular set. Another one I've been using quite a bit is a cluster chart. Full disclosure here, these are my own take on displaying data---a bastardized heat map, and certainly not based on heavy-duty math like real hierarchical cluster charts. So, really, I'm not sure what to call these...but in my current role, we're finding them to be very useful and I'm just rolling with cluster charts as my category.

This spreadsheet will eat your soul.
I get a lot of spreadsheets sent to me that look like the one on the right. I hate these with a fiery passion for a variety of reasons:
  • Too much "ink" in the data-to-ink ratio. With all of those little boxes, I don't know where to look.
  • And the colors. I feel like a circus came to town. But beyond that, the red and green are not particularly friendly to those with color vision issues...and I do work with some who are color blind. Are we really asking them to try and make decisions on student learning based on this?
  • Not to mention that all of the data is colored in. What's the point?
  • And we have both numbers and colors. I'm not saying that you can't have both...or that they don't serve different purposes...but it's distracting. I'm constantly trying to make sense of the number patterns for each color.
I also think these data aren't useful because of the way they are organized. Alphabetical order is great for gradebooks, but not so much for trying to make sense of the data. Plus, we don't have any context---what if we're missing some signal in the noise? Suppose all of our low-performing students are boys...or in a minority group?

But let's say you are interested in showing both the progress students have made over time, as well as the characteristics of the students involved. We can reorganize the data by ranking the percentages on the second assessment (this is the "cluster" part). Then we can color code some additional information, such as gender or participation in a particular program. I also change the properties of the conditional formatting so that the fill and text are the same color, making the values seem invisible. Finally, I add thick white borders around all of the cells and resize the rows and columns. Here is a small part of the final product:

These are all the students who scored in the top level of our fictional grade 5 winter math assessment. Three of them improved a little, from light blue to a darker blue...others improved from lower on the scale (orange). But when we look at gender and program, another story emerges. Most of the students in the top category are female, not on free or reduced lunch, not in special education, and do not receive additional interventions through a Title I program.

See the difference when we look at students who have scored at the bottom for both fall and winter? Our population is mostly male and nearly everyone participates in one or more federal programs.
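For anyone who wants to try the reordering step outside of Excel, here is a minimal sketch in Python. The student records, field names, and scores are all invented for illustration; the only idea taken from the approach above is sorting by the second assessment instead of alphabetically, with the demographic and program flags carried along beside the scores.

```python
# Hypothetical records; names, scores, and flags are invented for illustration.
students = [
    {"name": "Student A", "fall": 45, "winter": 52, "gender": "M", "title_i": True},
    {"name": "Student B", "fall": 80, "winter": 91, "gender": "F", "title_i": False},
    {"name": "Student C", "fall": 66, "winter": 70, "gender": "F", "title_i": False},
    {"name": "Student D", "fall": 30, "winter": 38, "gender": "M", "title_i": True},
]

# The "cluster" step: rank by the second assessment, highest first,
# instead of leaving the roster in alphabetical order.
clustered = sorted(students, key=lambda s: s["winter"], reverse=True)

# Print one row per student so the flags sit next to the scores.
for s in clustered:
    program = "Title I" if s["title_i"] else "-"
    print(f'{s["name"]:<10} {s["fall"]:>3} {s["winter"]:>3}  {s["gender"]}  {program}')
```

In a spreadsheet the same result comes from a descending sort on the winter column; the coloring and borders are still a formatting job.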

Maybe this representation doesn't necessarily hold any surprises, especially as we factor in free/reduced lunch. Children living in poverty typically do not perform as well as their peers. But one of the things I take away from this way to visualize that story is that we may need different interventions to support these students. Consider Student 39 on the right. He is receiving free or reduced lunch, special education and Title I services...and he's still ranked fourth from the bottom out of nearly 70 students. It doesn't mean that the school (and the student) aren't working as hard as they can. I do think it might mean that there are additional factors at work here that aren't (and can't be) addressed through the school. Perhaps the family is homeless or transient. Maybe the parents are going through a divorce...or the student has some medical issues. These are community-based issues and require different interventions to help close the gap for the student. I won't get up on my left coast soapbox about this right now. I'll just say that we have to work together on behalf of the whole child.

One of the pieces of information that is not represented in the visuals above is the number one item on teacher wishlists when it comes to reporting scores: progress. Sure, we have a bunch of students performing at the lowest level in the picture above, but that doesn't mean that they didn't make some growth.

This time around, I left the gender and program pieces coded the same, but I calculated the percent change between fall and spring and represented those in the leftmost column.

Look at Student 39 now. He's 12th from the top. Woo-hoo!

When we consider progress, we start to get a more equitable pattern---everyone is growing, and more often than not, it's our lowest performers who are making the biggest gains, even if they're still in the lowest part of the score breakdown.
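The progress calculation is simple enough to sketch in a few lines of Python. The scores below are made up, and I'm assuming "percent change" means the change relative to the earlier score; the point is only to show how the ranking can flip when you sort by growth instead of by score.

```python
def percent_change(earlier, later):
    """Change from the earlier score to the later one, as a percent of the earlier score."""
    return (later - earlier) / earlier * 100


# Hypothetical (earlier, later) scores, invented for illustration.
scores = {
    "Student A": (20, 30),   # lowest performer
    "Student B": (80, 84),   # highest performer
    "Student C": (50, 55),
}

# Rank by the later score, highest first.
by_score = sorted(scores, key=lambda name: scores[name][1], reverse=True)

# Rank by progress, biggest gain first.
by_progress = sorted(scores, key=lambda name: percent_change(*scores[name]), reverse=True)

print("By score:   ", by_score)
print("By progress:", by_progress)
```

With these invented numbers, the student at the bottom of the score ranking lands at the top of the progress ranking.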

By clustering similarly performing students together, either by scores or by progress, we get a much more useful pattern than we do with a spreadsheet that looks like a clown exploded on it. And, more importantly, we can show the data. In a very compact space, I can display everyone's scores and whatever demographic or program information is most relevant. And, I can fit the whole grade level on a single page.

I have no doubt that as we move forward, smarter people than me (I?) will continue to find new charts that help share everything we know about a group. Summary stats and charts will never go away---and they have their own purpose to serve. But sometimes we want the full version, not the Cliff's Notes. When we do, bump and cluster charts will be there.

Bonus Round
Want to see just how challenging colored squares can be? Play this online game. The rules are easy: just click on the one square that is different.

Monday, May 4, 2015

Show the Data: Bump Charts

This is a bump(s) chart, a/k/a slope graph. If you poke around online, you'll find a variety of examples and names. Some have multiple data points for each line...and some are simpler, like the one on the right.

Typically, the lines are labeled on each end, often with the name of the data series and sometimes with the data value. I do have a version of this one with the lines labeled, but since these represent real data points, it's best to keep things anonymous for this example.

But let me give you a little context here for what I'm showing. Each line represents a teacher---the entire chart shows the entire staff for a school. On the left is each teacher's percent of Ds and Fs assigned to students for the first semester of the 2013 - 14 school year. On the right is the value for the 2014 - 15 school year.

We want to use this chart to look at two things. First of all, what is the general trend within the school? In this case, most of the lines are sloping downward. This may connect to initiatives, such as changes in grading practices, tutorial options, or improvements in instruction. Whatever the story is behind this chart, it's looking positive.

Next, we want to consider the steepness of the slopes we observe. Sure, we could add a trendline, but if you're just using the chart for exploratory purposes, we can eyeball things. In this case, we might note that most of the downward sloping lines, especially for the upper percentages on the 2014 side of the house, have had significant decreases.

Typically, when I present these charts, I include a summary of the data. For example, between the 2014 and 2015 school years, 30 teachers assigned fewer Ds and Fs to students, 7 teachers had very little change in the percentage of Ds and Fs assigned to students, and 3 teachers showed an increase in the percentage of Ds and Fs assigned to students. Because these charts are new to many of the people in my audience, this brief summary is enough to get them oriented to the chart. They can then begin to focus on the details. This might start with the slope of the lines, but then I see them begin to dig into the labels: Are some teachers in new-to-them assignments this year? Are the lines showing little change all in one content area, such as math? What might we see next year---is there a goal around our percentages?
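If your data live somewhere other than a chart, the summary counts can be tallied with a short script. This is only a sketch: the percentages are invented, and the one-percentage-point cutoff for "very little change" is my assumption here, not a rule. Pick whatever tolerance makes sense for your school.

```python
from collections import Counter


def classify(before, after, tolerance=1.0):
    """Label a teacher's change in percent of Ds and Fs between two years.

    `tolerance` (in percentage points) is an assumed cutoff for "little change".
    """
    change = after - before
    if change < -tolerance:
        return "fewer"
    if change > tolerance:
        return "more"
    return "little change"


# Hypothetical (before, after) percentages of Ds and Fs, one pair per teacher.
teachers = [(30.0, 18.0), (22.0, 21.5), (12.0, 19.0), (40.0, 25.0), (15.0, 15.5)]

summary = Counter(classify(before, after) for before, after in teachers)

for label in ("fewer", "little change", "more"):
    print(f"{label}: {summary[label]} teachers")
```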

And now, a musical interlude...

Let's take a look at another of these beasts. This is a different school, but in the same district.

We see a lot of increases compared to the other school; but, if you look at the scale on the left-hand side, you'll see that none have a higher percentage than the other school. Generally speaking, teachers in this school assign a lower percentage of Ds and Fs vs. the other school.

The overall changes at this location aren't as dramatic, either. The slopes are more gentle.

What might account for the differences? Again, you'd have to poke further using knowledge specific to the school: Is this a more veteran staff that has more expertise or is more resistant to change? Are the increases due to an unexpected change in student population---were the enrollment boundaries changed?

You could, with additional information, make some other comparisons between the two schools. What if you built graphs just showing one department, such as math? It would make the charts less busy and comparisons between buildings a lot easier.

The big idea with these charts, of course, is to show the data. Sure, we could just write the summary and do a simple bar chart or line chart to compare totals...but we're missing a lot of the story in doing so. When we go bumpin', we get a much richer picture of what is happening.

Next time, we'll take a look at another way to show the data using cluster charts.

Bonus Round
These charts are super-simple to make in Excel. Jon Peltier has an excellent tutorial on his web site. You can also download a template from the article I profiled in the last post.

Music credit: "Bumpin Bumpin" by Kreayshawn (c) 2011.

Wednesday, April 29, 2015

Hide and Seek

I want to circle back to an article I wrote a few years ago about my favourite data visualization.

Hierarchical Cluster Analysis by Alex J. Bowers from
It shows all of the grades earned by students during their K - 12 journeys in two school districts. I love this chart because it finds a way to show all of the data in a dense, but succinct, format.

In The Visual Display of Quantitative Information, Edward Tufte states, "Above all else, show the data." While the quote was applied to a different concept for visualizing data, when I look at the chart above, the quote rises to the surface of my thinking. Showing the data is no small task, and as educators, we spend a lot of time and energy not doing that. We summarize the data into neat little one-letter grades or one-number test scores. As teachers, we might see a set of scores...but we are the only ones to do so and we typically view them as numbers, not visual displays. Things hide in numbers and number sets.

But a recent paper shared in the Public Library of Science (PLoS) makes the case that things can be hidden in simple visuals, too.

CC-BY Weissgerber, Milic, Winham, Garovic

The authors of the article Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm assert that the ever-popular bar chart is a summary, and therefore "full data may suggest different conclusions from the summary statistics." (It reminds me of Anscombe's quartet.) We often claim that pie charts are used to hide data. Et tu, bar charts?

I won't claim that the scatterplots and bump charts in the article are ground-breaking, but this paragraph in particular caught my interest (emphasis mine):

The infrequent use of univariate scatterplots, boxplots, and histograms is a missed opportunity. The ability to independently evaluate the work of other scientists is a pillar of the scientific method. These figures facilitate this process by immediately conveying key information needed to understand the authors’ statistical analyses and interpretation of the data. This promotes critical thinking and discussion, enhances the readers’ understanding of the data, and makes the reader an active partner in the scientific process. In contrast, bar and line graphs are “visual tables” that transform the reader from an active participant into a passive consumer of statistical information. Without the opportunity for independent appraisal, the reader must rely on the authors’ statistical analyses and interpretation of the data.

As educators, we might not view our work as a scientific process, but we must engage with our data. I feel pulled between the notion above that we may be oversimplifying our data presentations and some of the research about how an audience likes their data presented---which is typically charts that are the most familiar. This is not the Great Divide, mind you. We can bring these two things together with some education in the area of data literacy.

Or perhaps we underestimate our audience. I've introduced cluster maps, bump charts, and box-and-whisker diagrams to various groups this year. The first two required very little explanation. Box-and-whiskers did require a bit more orientation, but I never felt like the group using them struggled with the interpretation. I do think that the concept of engagement between the visualization and the reader, as posed by the article, is important. It's a different way to view interaction---a key piece of a good quality visual. It's not that the visual need be physically interactive...people don't have to be able to click, sort, or filter every chart---but we need to at least cause some thinking about what is presented.

After reading the PLoS article, I'm more convinced than ever that we need to think about when and why we share all the data. Bar and line charts may well be the fast food version of data viz, but we can begin to add to our visual diet by finding ways to show all of the ingredients.

Bonus Round
If you view the article on PLoS, you will have access to two Excel workbooks to help you make the charts presented in the article.

I'll share some of my own attempts to "show the data" in coming posts. Visit bump charts and cluster charts to learn more.

Sunday, April 19, 2015

Looking at Disproportionality

The concept of disproportionality underlies much of the reform movement for education in the United States. Sometimes referred to as the achievement gap or opportunity gap, the basic idea is that outcomes for all students are not equitable. While much of the conversation focuses on race, disproportionality applies to any subgroup: gender, special education, English language learners, free/reduced lunch, and so on. Over the past several years, much of the conversation about disproportionality has focused on student achievement---and test scores in particular. But just as there is more we need to look at in addition to race, there is more to children than representing them as test scores. We can look at disproportionality as it relates to sports or after school activities, student discipline, attendance, and other factors.

If you want to examine disproportionality within your system, there are a few pieces of data that you will need to know. In the example below, I'm going to use gender (male, female) as subgroups. (Note: I realize this is a very heteronormative view of gender. Our data systems need to catch up with our increasing understanding as a society about gender identification; however, right now, most school data systems are set up to only capture the binary male/female, so I'm going to use it as an example.)

First, you need to know the enrollment numbers and percentages for each subgroup. In other words, how many possible students are there who could participate in an athletic program, be subject to suspension/expulsion consequences, fail Algebra, or yes, pass the state test? Many schools report gender as close to 50/50 percent, as one might expect, but variations do exist. Don't assume that you're starting off with equal pools of participants.

Secondly, you need to know the participation numbers and percentages for each subgroup. Just because everyone is eligible to pass the state test doesn't mean that they do. So, how many males/females met the standards? How many in each subgroup were suspended? Enrolled in Physics or Calculus? Turned out for basketball?

In this example, we have a school with 250 males and 275 females, with 50 from each group enrolled in Algebra. Now we need to calculate the disproportionality.

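To determine the number of males required to achieve proportionality for the total population, we use the first equation: n males for proportionality = (n females participating * % male enrollment) / % female enrollment. With unrounded enrollment shares of 250/525 (about 48%) and 275/525 (about 52%), that is (50 * 250/525) / (275/525), for a result of about 45.5 males. The second equation swaps the two groups, (50 * 275/525) / (250/525), and gives us 55 females needed for proportionality.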

Next, we take these two and compare them with the number of students in each subgroup that are participating. For males this would be 50 - 45.5 = 4.5; for females 50 - 55 = -5. That -5? It means that we need five more females enrolled in Algebra to achieve proportionality.
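The arithmetic in this example can be checked with a few lines of Python. This sketch assumes the proportionality target for one group is the other group's participation scaled by the ratio of the two enrollments, which reproduces the 45.5 and 55 figures; the function and variable names are mine, not part of any standard method.

```python
def needed_for_proportionality(other_participants, own_enrollment, other_enrollment):
    """Participants a group would need so that participation mirrors enrollment,
    holding the other group's participation fixed (an assumed formulation)."""
    return other_participants * own_enrollment / other_enrollment


# Numbers from the example: school enrollment and Algebra enrollment by gender.
enrolled = {"male": 250, "female": 275}
in_algebra = {"male": 50, "female": 50}

males_needed = needed_for_proportionality(
    in_algebra["female"], enrolled["male"], enrolled["female"]
)
females_needed = needed_for_proportionality(
    in_algebra["male"], enrolled["female"], enrolled["male"]
)

# Positive gap: more participants than proportionality calls for.
# Negative gap: that many more participants are needed.
male_gap = in_algebra["male"] - males_needed
female_gap = in_algebra["female"] - females_needed

print(f"Males needed:   {males_needed:.2f}  (gap {male_gap:+.2f})")
print(f"Females needed: {females_needed:.2f}  (gap {female_gap:+.2f})")
```

Swap in any two subgroups and any program; the same two inputs (enrollment and participation) are all you need.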

While it may not be entirely realistic to achieve perfect proportionality within a system for all programs, subgroups, and outcomes, it is still important to review these data to reflect on areas where institutionalized racism or policies may be contributing to disproportionality. Another factor to consider is the size of the subgroups that you are reviewing. For example, if you only have two or three American Indian students in a grade level, it's unreasonable to expect that they are represented in everything---but you should look to be sure that they are represented somewhere among school offerings. In that case, it may be more helpful to use longitudinal data to get an idea for participation.

Here's an Excel workbook that allows you to easily compare gender equity in sports programs. I built it a couple of years ago for a program that needed it, based off an idea of Debra Dalgleish. See her site for even more ideas on data entry forms...and feel free to modify mine to suit your needs.

Monday, April 13, 2015

Session Recap: Data Displays as Powerful Feedback

I had the pleasure of presenting at the ASCD annual conference last month. Each year, I stretch myself a little further in making connections between ideas, as well as between technology and content.

My session description: Developing visual literacy is a key skill for student success with Common Core State Standards. Students also need clear feedback about their progress. Using data displays, such as charts and graphs, we can integrate these goals and increase student achievement. In this interactive session, you will learn strategies that increase visual literacy and foster communication. You will also learn to effectively use data collected in classrooms as feedback with students. Both digital and analog tools for organizing and integrating data into lessons will be provided.

Session descriptions are written at least 10 months before the actual presentation happens. This extended timeline can explain why many sessions are not as promised, which is very frustrating for attendees. You pick something from the catalog that looks like it is the most perfect thing ever, only to show up and discover that the presenter has something different in mind. I try to stick as closely as possible to my submitted description, but I admit that I end up taking a little birdwalk here and there. It's hard not to---you learn so much in between submitting a proposal and actually presenting it. For me, a lot of that growth in learning has occurred due to changing jobs this year and getting a much better on-the-ground view of the lack of visual literacy among students and teachers.

My logic model that framed my presentation was

I started the presentation with a brief look at visual communication in general---pictures have been used far longer than text. Then, we talked about how graphics used as feedback have a larger impact on student achievement than nearly any other type of feedback (e.g., marking answers right/wrong). All of this was to build a case for becoming visually literate.

I won't bore you with all the details. I fused together some previous presentation materials and pulled a lot of pieces of this blog in as examples. But if you want a look at things, I have it all stashed on the same wiki as my other resources.

I was slated for 8 a.m. on the last day of the conference---not quite the worst possible time slot, but just about. So, I had a small, but awesome crowd. Lots of comments afterward made me feel good, from one gentleman who said it was the best hour of the entire conference (and asked if there would be a Part II) to another with a very heady offer I'm kicking around.

Proposals for next year are due in a month, so I am already kicking around things to share. I think that I will put in something about using questions to focus data use...and something similar to this year on visual literacy skills. We have to expand our conversation about visual literacy. We work so hard in schools to be literate in other ways. We practice rules for grammar, punctuation, and different forms of writing...all with the goal of improving communication. But for the most part, the visuals developed are junk. And that needs to change.

In the meantime, there is SO much I want to learn. I would love to try and go to the Eyeo Festival next year. Or somehow wrangle an invitation to the Tapestry Conference. I'm feeling a need to get beyond the borders of education and exchange ideas and resources. I continue to do lots of reading and thinking (no matter how quiet I am here) and am always pondering what to learn next. Isn't that what we want for our schools, too?