Friday, February 12, 2016

Living Large with Small Multiples

In the poem Song of Myself, Walt Whitman writes "Do I contradict myself? Very well then, I contradict myself. I am large, I contain multitudes." And while some might not consider Excel as sexy as Whitman's verse, the two have some things in common.

Small multiples use a series of similarly scaled charts. The purpose is to allow for easy comparison across time or groups. When you use these charts, you are looking at the forest and not necessarily the trees. You don't want to focus on details as much as you search for larger patterns to investigate.

I built the example below this week. It's a series of scatter plots. Each tiny blue dot is a student, and their positions on the charts represent the point where their percent attendance and scale score on the state assessment intersect. An orange line shows the linear regression for the data set in the chart. The line tells us a couple of things. It provides a quick visual on the range, as well as the basic trend.

What kinds of things do we notice? Maybe it's how students who score in Level 1, regardless of grade level, don't have much of a discernible pattern. Level 3 students tend to clump---their rates of attendance and scores are very similar to one another. Maybe we have a conversation about those areas where the line slopes downward. How do we explain a trend where the more you come to school, the worse you do on the assessment? Or maybe even the overall picture isn't what we might predict. Even those trend lines that have an upward slope aren't very steep. Wouldn't we think that better attendance leads to better scores? And maybe we need to talk about what's happening when kids get to sixth grade and attendance starts to drop off sharply.

Because you likely can't read the itty-bitty labels, I will confess that I have broken a cardinal rule when building this: the y-axes are scaled identically within each grade level, but not among all the grade levels. Percent attendance is plotted along the x-axis and is the same for all of the charts. But the range for scores changes. The higher the grade level, the higher the possible score. I've tried to mitigate this by keeping the y-axis range for each grade at about 400 points. If I had squeezed the entire score range for all grade levels into each chart, the data would have been so compressed that it was difficult to make sense of anything.
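For readers working outside of Excel, the same layout can be sketched in Python with matplotlib. This is a minimal sketch only: the grade labels, score windows, and randomly generated points are hypothetical stand-ins for the real attendance and assessment data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one (attendance, score) cloud per grade,
# each grade with its own ~400-point score window
grades = {3: (2100, 2500), 4: (2200, 2600), 5: (2300, 2700), 6: (2350, 2750)}

fig, axes = plt.subplots(1, len(grades), figsize=(12, 3), sharex=True)
for ax, (grade, (lo, hi)) in zip(axes, grades.items()):
    attendance = rng.uniform(60, 100, 150)   # percent attendance (x)
    scores = rng.uniform(lo, hi, 150)        # scale scores (y)
    ax.scatter(attendance, scores, s=5, color="steelblue", alpha=0.5)

    # Linear regression trend line, like the orange line in the charts
    slope, intercept = np.polyfit(attendance, scores, 1)
    xs = np.array([60.0, 100.0])
    ax.plot(xs, slope * xs + intercept, color="darkorange")

    ax.set_xlim(60, 100)                     # identical x for every chart
    ax.set_ylim(lo, hi)                      # ~400-point y window per grade
    ax.set_title(f"Grade {grade}", fontsize=9)

fig.savefig("small_multiples.png", dpi=150)
```

Because the subplots share one x-axis and each keeps a comparable y-axis span, the eye can sweep across the row and compare trend lines, which is the whole point of small multiples.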

There are thousands of students represented on this single graphic. While focusing on an individual is critical to the daily work of the classroom, we're not doing so when we use small multiples. This time, it's about the herd. I've enjoyed looking at this because I see something different every time.

What will school principals see when I show this to them? I'm not sure. I'll have to provide a little support in learning to read it, but I think it is easy enough. The chart will be part of a larger conversation around student performance, one piece of a puzzle to which principals will apply their own context.

Are you using small multiples in your work? How have they been useful?

Bonus Round
To build this, I organized the necessary data and then used a pivot table and slicers to pull attendance and scale scores by grade and score level. Dynamic ranges were used for the charts, allowing for expansion/contraction of the number of data points.
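For anyone replicating this step outside of Excel, a rough pandas equivalent of the pivot-and-slice step might look like the sketch below. The column names and values are hypothetical, not the district's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the student-level source table
students = pd.DataFrame({
    "grade":      [3, 3, 3, 4, 4, 4],
    "level":      [1, 3, 3, 2, 3, 4],
    "attendance": [82.5, 95.0, 93.0, 74.0, 91.0, 98.5],
    "score":      [2150, 2420, 2435, 2310, 2460, 2550],
})

# Student counts by grade and performance level -- the same slicing the
# pivot table and slicers perform in the workbook
counts = students.pivot_table(index="grade", columns="level",
                              values="score", aggfunc="count", fill_value=0)

# Each (grade, level) group feeds one small chart; the group grows or
# shrinks with the data, like the dynamic ranges in Excel
groups = {key: g[["attendance", "score"]]
          for key, g in students.groupby(["grade", "level"])}
```

Each entry in `groups` plays the role of one dynamic range: add or remove students from the source table, and the corresponding chart's data expands or contracts automatically.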

Each chart was pasted into PowerPoint. This allowed me to size and position all of the charts and labels, as well as easily share the document.

Sunday, January 24, 2016

Ethical Communications Using Data

Much of the data we collect as educators is subject to various federal, state, or local regulations about who can see the data and for what purposes. These ethical considerations most often apply to data points that connect with an individual or very small groups. Once aggregated, we tend to slide into our own version of ethics. We make decisions about how the data are presented and annotated. We choose the stories, the focus points, and even the audiences we share with. What are the questions we should be asking ourselves as we make these choices?

A recent forum on Responsible Data Use generated some categories and avenues of inquiry around this topic. I've read through the summary several times now, and with each glance through the list, I find new things that I'd like to discuss. Here are a few that catch my eye:

Communicating uncertainty
  • How do we communicate uncertainty in data?
  • In metadata?
  • How do we represent gaps in the data?
  • What if our knowledge of the uncertainty in the data is anecdotal?
  • How can visuals show “no answer”?
  • How can data visualization promote ambiguity?
  • How do we improve everyone’s data visualization literacy, as creators and as viewers?
  • How do we educate people about the data they create?
  • Which people most need data literacy?
  • Can we provide interactive tools that let viewers adjust data visualizations in real time as a means of improving literacy?
  • How can we support grassroots groups to create better data visualization?
  • Is there a need for basic design principles and data viz 101 resources?
  • How do we navigate a fear of numbers?
BAD data viz
  • Is meaningless data visualization worth anything?
  • What about when people make decisions based on bad data viz?
  • If raw data is unrepresentative, will visualizations on it be bad?
  • We should collect examples of unethical data visualization.
  • How do we involve the audience?
  • Who is the audience, and why?
  • How do we create community ownership of a data viz?
  • How do we allow a data viz to speak to multiple disparate audiences?

Some of these questions are easier to answer than others---we can think of a few ways to represent a lack of data. Others, like those in the "BAD data viz" group, are not so simple, but would be fun to kick around and see where we get. What would be your priorities in your workplace?

The summary with all of the categories and questions also has links to a variety of resources and notes connected with the forum. They are well worth exploring, if you have a few moments.

Sunday, January 17, 2016

New Frontiers

I have probably shared this before, but this tweet from Hilary Mason is something I think a lot about:

Within my own work, I continue to raise conversations about equity. How will students and teachers be empowered by the data we collect and review? How do we provide opportunity for marginalized voices to be heard? How do we elevate their needs and concerns?

I am starting to collect resources around equity in data visualization. Here are a few that you might be interested in, too:

In other news, I have been working on a little behind the scenes cleanup on this site. I've refreshed the information on the various pages, as well as the front page side bar. I also recently purged my Delicious site, deleting dead or irrelevant links and reorganizing the remaining ones. If you're searching for Excel info, research, tools, or anything else related to data viz (including equity), please head over to that site. There, you can sort links by a variety of tags.

I'm heading to the Tapestry Conference and Eyeo Festival this year. I am feeling the need to stretch and grow beyond the field of education. I do my best to connect with others in a digital space to learn about how they are using data, but I would like to do some in-person networking, as well. Will any other readers be at either of these events? 

Tuesday, December 29, 2015

Show the Data: Distributions

Last spring, we looked at a couple of ways to show the data: Cluster diagrams and bump(s) charts. The idea here is that when we summarize data and represent it in bar or line charts, we miss nuance. Instead, when possible, we should look for ways to show all of the data.

I had this in mind when I recently tried to tell a story about student performance in my school district. The district does a pretty good job with about 80% of its students, and because that's far better than the state average, no one asks the hard questions about the remaining 20%. Shouldn't they learn to read and graduate from high school, too? I certainly think so...but I often run into roadblocks when I try to raise this conversation. Maybe I need a different visual to use as persuasion.

I remembered some information from another post about recent research in using distributions to replace bar charts and thought I might give it a try.

Here is Exhibit One:

This is typically what we provide to schools (and the public) regarding student performance. This chart represents one grade level at one of our elementary schools. Levels 1 and 2 (L1, L2) represent the percent of students who did not meet the standards ("pass"), while Levels 3 and 4 show the percent of students who met (L3) or exceeded (L4) the standards for English Language Arts. Generally speaking, this doesn't look so bad. Lots of kiddos did well enough, and L1 is the smallest category. Yea, Team!

But who are these kids in each category? Do we have the same distribution in performance if we look at student demographics? Let's find out.

Here is Exhibit Two:

A little orientation to this beast. We still have levels of performance (1, 2, 3, 4) on the x-axis, but the y-axis shows the actual scale scores. In other words, for this grade level and subject, Level One is actually made up of scores in the range of 2114 - 2366, Level Two ranges between 2367 and 2431, Level Three is represented by 2432 - 2489, and Level Four includes performance between 2490 and 2623 (source). Every child's score (n = 69) is represented by a circle on the chart.

It might be interesting in itself to just look at the distributions. But I've added some information based on student ethnicity. The grey circles that you see represent students who are white (n = 54). The pink circles represent students of color (n = 15). Overall, only one-third of scores from students of color are in Levels 3 or 4, while about two-thirds of white students' scores are in those levels. And, one-third of all students of color are in the lowest category (Level One).

If you're wondering about why I am not representing different groups (American Indian, Asian, Black, Hispanic, Pacific Islander, Two or More Races, White) with different colors...well, I can tell you that I wrestled with that decision quite a bit. Our district has very small numbers of students of different races. For example, for the school and grade represented above, there are no black students. There is one American Indian student shown on the chart (I can't tell you where, due to FERPA restrictions). This student as an individual is important and worthy of all of our best efforts. When represented by a score, conversations become problematic because there is no way to compare it with others in the same group.  Disaggregation of the data at the grade and school levels does not cause the sorts of inquiry that it should because "it's just one score." Trust me---I've heard that refrain quite a bit. But when I add the "just one score" with others in a building who represent non-white students, there's a bigger argument to be made. Your mileage may vary, based on the populations you are working with. All that being said, I am very open to feedback on this. What are some other options I should consider that will balance tiny n-size against the overall story to be told? Stacked bars, perhaps?

I realize that the two charts I've shown in this post represent different things. One is just the overall percentage by category...while the other is distribution by category. So, one isn't necessarily a replacement for the other. Even if I altered things a bit by showing numbers of students in the first one, it would result in the same chart. But I think there is some real power in looking at the second chart---even if it was not coded---and understanding that every child is there. It's not a blob of summary performance...and goes beyond a simple count of who is in each box.

So, here's looking at you, kid. (Especially if you aren't white.)

Bonus Round
The distribution of performance chart shown above was built in Excel (of course). It is a basic scatter plot chart, with specific scores selected and colored either grey or pink. If you visit the research site I mentioned earlier (Beyond Bar and Line Charts), they have some workbooks you can download and easily modify.
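Outside of Excel, the same chart can be approximated with a jittered scatter plot. In this Python sketch, the per-level counts and individual scores are randomly generated stand-ins; only the level cut ranges echo the ones described above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Score ranges for each performance level (grade/subject specific)
levels = {1: (2114, 2366), 2: (2367, 2431), 3: (2432, 2489), 4: (2490, 2623)}

fig, ax = plt.subplots(figsize=(5, 4))
for level, (lo, hi) in levels.items():
    # Hypothetical counts per level: (white students, students of color)
    for n, color in ((14, "grey"), (4, "deeppink")):
        scores = rng.uniform(lo, hi, n)
        jitter = rng.uniform(-0.2, 0.2, n)  # spread the circles horizontally
        ax.scatter(level + jitter, scores, facecolors="none", edgecolors=color)

ax.set_xticks(list(levels))
ax.set_xlabel("Performance level")
ax.set_ylabel("Scale score")
fig.savefig("distribution.png", dpi=150)
```

The horizontal jitter keeps overlapping scores visible, so every circle (every child) stays on the chart instead of disappearing into a summary bar.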

Saturday, December 12, 2015

WERA 2015: Data Viz Workshop

I've done several presentations over the years about data visualization within public education. I've talked about graphic representations as a form of feedback, types of tools, guidelines for improving communication using visuals, and more. All have been brief 60 - 75 minute affairs with some very simple sorts of activities and conversations along the way.

This week, however, I had an opportunity to guide my first workshop. I had three and a half hours available to support educators in really digging into telling the best stories we can with our data. Having this sort of space and time available enabled me to think about the content differently. I've posted links to previous presentation materials here, and this post is no exception. For those of you who might be interested in scoping out the slides, materials, or links, head on over to ye olde dataviz wiki to take a look through the stash.

What I wanted most for the audience this time through was an opportunity for self-reflection and metacognition. Educators have relentless jobs. There is often no chance to think about what did or didn't work with a group of students today because they will be here again in the morning...and the tyranny of the urgent is to plan for that. I felt like it was important for our afternoon to be a time where people could become more aware of their own design process---no matter how simple or sophisticated it might be---and, more importantly, be inspired. I tried to bring in many different examples of current projects from a variety of fields. The best way to get out of your bar or line chart rut is to see where others have made departures.

I don't consider myself an expert in any of this. I do consider myself curious about it. I do see a significant need in our field to elevate our visual communications, as well as prepare our students to do the same. I want to continue these conversations and add what I can. I will refine my workshop materials and perhaps have another opportunity to engage in this work at another time. I enjoyed it and hope that it's the start of something bigger and better for our field.

Monday, November 9, 2015

The Agony and the Ecstasy

Am I the only one who agonizes over the best way to represent a data set? Is there a 12-step program for those of us who are occasionally paralyzed by all of the visualization options? If this sounds familiar to you, read on for my latest struggle in bringing a data story to life.

Several weeks ago, I was asked by my superintendent for information on the achievement gap. For those of you who might not know this particular piece of eduspeak, it refers to the difference in achievement levels between populations of students. For example, white students often perform better on standardized tests than black students. This difference is referred to as the achievement gap.

This request should have been a piece of cake. I have the data (scores and student demographics). I'd done a similar project last year. But I felt like the bar charts I'd used were lacking. They show differences in performance plainly enough, and yet it's difficult to capture information for various populations in a single, easy to read report.

I started as I usually do, hand drawing some possible layouts and then building a few models using the data.

These examples all show longitudinal information for males and females at a particular grade level and in a certain subject area. The particulars are not too important here. What I discovered in doing these, even before cleaning up the charts, is that none of them were satisfactory. They all showed the data accurately, but none of them captured "the gap" in a way that caused any sort of interest or reaction.

Back to the drawing board.

I realized that I needed Excel to show me the percentages on one axis, like a number line, so the space...the gap...became visible. Here is what I ended up with:

This is a bit out of context, so let me tell you a bit more about what you're looking at. This chart only shows 2015 data for one grade level. The horizontal line is the number'ish line: 0 - 100%. The vertical line shows the overall percentage of students who met the standard on a particular assessment. The placement of populations (shown by orange or blue triangles) provides a general relationship to that overall level of achievement, as well as shows the gap between the populations. I do include the actual n sizes and percentages in a table below the charts.

Here's a broader view:

I am not going to show you the data tables, due to FERPA issues---some of the subgroups shown above have fewer than 10 students. I need to stay within the bounds of federal privacy laws in this public space, but just know that they exist to provide detail to data users in our district.

I'm really happy with this layout, however. It gives, at a glance, an easily readable view of the achievement gap at a grade level. When looking at these over several grades, patterns begin to emerge. This is especially important for those groups where the n size is very small for a grade level. For example, having only one black student in a grade might not tell you much if they didn't meet the standard, but when you see that our small handful of black students at every grade level all fall well below their peers, it's alarming. It's also easy to cover up either the orange or the blue markers and get a quick picture of who is or is not successful.

While I still have the longitudinal view to consider, it's simple enough to build similar charts for a few years of data and then align them to provide a similar glance at trends.

I apologized to my superintendent for my tardiness in delivering the product, but I think the agonizing has given way to some ecstasy over seeing things in a way that's clear and best represents the question to be answered.

I don't know that anyone, other than those of us struggling to represent data, understands why it takes so much time to build a report. Others don't see how many different charts we modeled...all of the colors we tried...or the variety of label placements (and label content) we viewed. They don't hear the conversations we have with people around the office to learn more about what is or isn't working for them in our draft visuals or how they want to interact with and use the information presented. But for those of you who are knee-deep in this process, I'm cheering you on from here.

Bonus Round
Like these charts? They're just scatter plots in Excel, with the vertical axis removed. Easy-peasy to make if you're on the hunt for something similar using your own data.
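If you'd rather build it in code, here is a minimal Python/matplotlib sketch of the same idea. The subgroup labels and percentages are made up for illustration; the trick is the same: plot the markers on a single 0 - 100% line and hide the vertical axis.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical percent-meeting-standard pairs for two populations
gaps = {"Female / Male": (62, 54), "Low income / Not": (41, 58)}
overall = 55  # hypothetical overall percent meeting standard

fig, axes = plt.subplots(len(gaps), 1, figsize=(6, 2), sharex=True)
for ax, (label, (pct_a, pct_b)) in zip(axes, gaps.items()):
    ax.axhline(0, color="lightgrey")                   # the 0-100% number line
    ax.axvline(overall, color="black", linewidth=0.8)  # overall percentage
    ax.scatter([pct_a], [0], marker="^", color="darkorange", zorder=3)
    ax.scatter([pct_b], [0], marker="^", color="steelblue", zorder=3)
    ax.set_xlim(0, 100)
    ax.set_yticks([])                                  # remove the vertical axis
    for side in ("left", "right", "top"):
        ax.spines[side].set_visible(False)
    ax.set_ylabel(label, rotation=0, ha="right", va="center", fontsize=8)

fig.savefig("gap_chart.png", dpi=150)
```

Stacking one stripped-down axis per subgroup makes the horizontal distance between the two triangles---the gap itself---the only thing left to see.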

Sunday, August 16, 2015

Anatomy of a Design Build

Like many school districts, we have new data this fall. The state has changed to Smarter Balanced assessments to measure student knowledge and skills with Common Core State Standards. One of the challenges associated with presenting these data is the inevitable effort to compare them with scores from previous years. But they do not represent the same things. And so began the challenge to develop this report:

Subject areas are organized across the top. Grade levels are on the horizontal. The blue squares represent grades and subjects assessed in 2015. The large numbers in the center show the percent of students meeting the standard, with the distribution of scores shown by the column charts at the bottom of the square. Historical data are shown by the line graphs in the grey squares.

It is built in Excel (of course). On another sheet is a table listing the various schools, grade levels assessed, subjects, school years, and scores. A pivot table with this information feeds the report you see above.
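A rough pandas sketch of that feed is below; the column names, subjects, and percentages are hypothetical stand-ins for the workbook's source table.

```python
import pandas as pd

# Hypothetical stand-in for the sheet behind the report
results = pd.DataFrame({
    "year":    [2013, 2014, 2015] * 2,
    "grade":   [3] * 6,
    "subject": ["ELA"] * 3 + ["Math"] * 3,
    "pct_met": [61, 63, 58, 55, 57, 52],
})

# Current-year percent meeting standard: the big number in each blue tile
current = (results[results["year"] == 2015]
           .set_index(["subject", "grade"])["pct_met"])

# Year-over-year trend per subject/grade: the lines in the grey squares
history = results.pivot_table(index="year", columns=["subject", "grade"],
                              values="pct_met")
```

One long, tidy table of school/grade/subject/year rows can drive every tile at once, which is what makes the pivot-table approach easy to refresh when next year's scores arrive.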

I'm mostly happy with this layout and format. I would still like to tweak some of the colors, but overall, I think I've solved most of the issues with representing the data. But it took a while to get this far.

In the beginning, there was a suggestion from one of my supervisors to offer something like what you see on the right. This is what another district in the area provides to schools and the public. I showed it to one of our principals and he said it made his eyes bleed. I agreed with that sentiment. We can do better, I said.

It's not that the information provided here is bad, it's that a pdf of a spreadsheet does not take advantage of what data visualization can offer to make meaning. It has much of the same information I developed for my version. We just used two different approaches.

Originally, I started with something similar. I pulled demographic and program data and added some sparklines and arrows to show change.

But I decided that this was a stupid idea before I got too far down the road with it. It was way too hard to scale the graphs along the same axes. This is an important thing to do to enable comparisons. But there is just too large a range to represent in a small space. Not to mention this layout is boring. Seriously. Yes, I know it's half finished, but no bells and whistles involving fonts and headers are going to make this version anything other than a snoozefest.

So, I tried something else.

This is where I started to play with the idea of tiles. I wanted a cool color (as opposed to a warm one) and decided to look at purple as an option. This version is slightly better, but not by a lot. Looking at it, I realized something very important: Almost none of this information is useful. Does anyone, especially a school leader, really care about the year-by-year percentages of Asian students (for example)? It's not actionable data. You're not going to go out and recruit more Asian children if you notice your percentage slipping. There's no action plan a school would make to decrease the enrollment of girls. This is not to say that viewing overall change wouldn't yield some insights, but the year-by-year review isn't very helpful. So I started over. Again.

Third time's the charm, right?

I won't claim this is perfect, but we've come a long way, baby. All that demographic and program data? Now in one tidy bar chart on the left (under "% of overall enrollment"). The five-year change is shown on the far right. The stuff in the middle? All new. Kids are more than the sum of their test scores. So, I've included some additional demographic information about absences and discipline. Now we have some conversation starters.

I did the achievement data next. This is the page at the beginning of this post. I played around a bit with colors and format, but the tiles have been a constant.

Feedback has been mostly positive, but I'm still tweaking things. What would you want to see? How should we show it? Anything we should remove or represent differently?