Tuesday, November 27, 2012

Come on Over

Did you know there's a Facebook page for this blog? It has an RSS feed, as well as miscellaneous news from the data viz world. Here are a few items recently posted:
 Hope you'll join the conversation on Facebook!

Sunday, November 25, 2012

Ah! Something New Has Been Added

During my hiatus over the summer, a few readers were working hard with Excel and were kind enough to send along questions and comments. So, as I catch up on hosting duties here, let's pull out an item from the mailbag and see if we can share some solutions.

Mark2906 asked, "If I need to add more assignments to a particular standard (say 11 tests instead of 7 or 8), how do I alter the formulae for calculating predicted scores?"

from here; Do you know who this is?
Excellent question. No one wants to be stuck manually updating their spreadsheet every time an assignment is added. It's nearly 2013, dammit, and if we can't have flying cars, we can at least have our spreadsheets make an effort to help out around the place.

The good news is that Excel will tag along for most of the adjustments you make to your spreadsheets. Need to copy, move, and paste some data? It will update the formulas automagically.

But that's not quite what Mark has asked us. In many K - 12 classrooms, you don't know exactly how many assignments/pieces of evidence you will be collecting for a particular standard or unit of study. So, here is the simplest way to address this.

Leave Room to Groove
There's no rule that all of the scores in your gradebook have to butt up against one another. When you create a new section for scores, just label more columns than you think you might need. For example:

We're Off on the Road to Excel...
Two sections for standards are shown, each with space for ten assignments. So far, we've recorded five scores for the first standard, and three scores for the second.

To determine the average for Jerry Colonna on the first standard, we can use the following formula: =SUM(C4:L4)/COUNT(C4:L4). We can then just copy the formula down for the other students. Excel will not treat the empty spaces as zeros. Should you need more columns, just insert them after the last assignment and the formula for the average will keep up with you.

If blank columns are occupying too much real estate, you can always hide them.

Other Ideas to Consider
There are such animals as Dynamic Named Ranges, which expand to fit the data supplied. You can read more about them here or see them in action here. Although they can be very handy, I think they're overkill for this particular problem.

I'm sure there are other solutions, including VBA options. Please share your additional ideas in the comments.

Wednesday, November 21, 2012

Building a Better Heat Map

In the last post, I shared my version of a heat map created in Excel. Now it's time to get into the details of the build.

After you've organized your data in Excel, start a new sheet for the heat map. When I did a "back of the napkin" sketch for this visualization, I knew I would need 37 columns: one for the name of the district, and 36 others to fit three years of data points for four areas (dance, music, theatre, visual) for three grade bands. I labeled my rows and columns.

Next, I created a formula to fill in the data points. Because the values for each cell were a proportion of the grade band, they were all a variation of this one for cell C5, the first one for the District A: =2010!C2/SUM(2010!C2:F2), where 2010! represents the sheet with the data for the year 2010, C2 the cell with the drama number (if any) for elementary, and C2:F2 representing the four cells of elementary data. That's it. Use the fill tool for the rest of the cells.

If you're wondering why I have each year on a separate sheet---there's no good reason, other than it was handed to me that way. I could have easily pasted all the data into one spreadsheet. Do as the Excel spirit moves you.

Let's talk about color for a moment, before we dive into conditional formatting. If you need a two-value system (e.g. to represent gains/losses), remember to stay away from green/red. It's not friendly for those with colour vision issues. Orange and blue is better. For a single value system, blue works well. Why on earth did I pick purple, then? Aesthetic reasons. Blue worked fine, but seemed a bit boring. The purple just seems to suit this data set. If you need help with colour scales, you might start with the Color Brewer.

Also, why do I have ten categories? It seemed like a natural fit for a scale based on percentages---I stuck with the base 10 scheme. My full data set for this piece has over 9000 data points and I wanted to see some nuance. But I could just have easily picked quartiles or another factor. Which one should you choose? Ah, that's part of the art of data science. You really need to play with your visualization a bit to find out what cut values work best.

Okay, let's dive in to the conditional formatting. You're going to need 11 rules. Why 11? Because we also want to create one for any 0 or blank value.

For each rule, tell Excel to do with a range of values---your instructions should include the color for that part of the scale as both the font and fill. This way, you will "hide" the number in the box. Go ahead and put a bunch of 9s at the end of the upper limit. If you have a big data set (and who doesn't like 'em big?), you're going to want to account for all the little variations within it.

Your last formatting job has to do with the borders. In the menu for borders, make each cell have a thick white outline. This will make your coloured boxes look separated. Go into the View tab and uncheck the box next to "gridlines."

The invisible border dialogue
Okay, now you need to change the width of the columns and rows to create squares. Just do the old select and slide until everything looks square.

Finally, use the shape tools to add some very thin grey lines between each chunk of the data set. Yes, you're actually going to draw in those borders.

I made mine a little longer between the grade levels than between the different arts areas.

Guess what? You're done.

One of the things I like about this project is that it's simple, but rich and interesting. The more time you spend, the more you notice the little nuances...things you might never see from just the numbers alone. I think this project also illustrates how much of design is not about the number-crunching. There's just one formula here. The rest is making Excel turn that data into something visual. And yes, that takes some time and finesse (it is Excel, after all...not a tool that was developed for data design), but you can do it.

So get out there and make something that grabs people's attention because it's pretty. And when you've done that, come back and share it here.

Saturday, November 17, 2012

A New Spin on Heat Maps in Excel

There are lots of heat maps out there. We saw many different versions during the election cycle---everything from campaign financing to election results. These visualizations use a scale of color values to communicate a range of data. Some vary the intensity of a single color value to represent a variable (e.g., the greater the population in an area, the darker the color). Others use two values to present a comparison (e.g., increases and decreases to population over a given time period).

Excel does not have a heat map option in its charts; however, you can use conditional formatting and some other design options to create a really fine looking visualization.

One of the graphics included with Stephen Few's workshop presentation was this one:

Market Segment from Stephen Few's Now You See It

I was really taken with it. It's a simple layout, but packs a punch in terms of showing the data. It was the perfect fit for a data set I was wrangling. Take a look at my version (there is an interactive version at the end of this post):

What you're looking at is some data about the different areas of the arts (dance, music, theatre, and visual arts) for three grade bands (elementary, middle, and high school) over three years (2010, 2011, 2012) for districts in Washington state. Please note that although this data is public, it is only available to those who ask. Therefore, I have made some modifications to what I am sharing here: no districts are identified by name and I am only including ~50 districts.

Never underestimate the power of pencil and paper when planning a visualization. Using the version from the workshop as a model, I sketched the layout for this, as well as how to fill the cells. I decided not to use number of students as a factor, as that would skew things based on the size of the district. Instead, I wanted to look at each arts area as a portion of overall enrollment in the arts. In other words, what percent of elementary students in a district had access to instruction in dance vs. music vs. theatre vs. visual arts?

There is a single color scale. The greater (darker) the intensity, the greater the percentage for that area of a grade band. Blank spaces represent a lack of data. This might be due to no instruction offered in that area, but it could also mean that we just don't have any data from the district. Don't read too much into those.

Even though I used a single color (purple) for this visualization, it is still simple to look at gains and losses over time. For example, for District A, we can see that at the middle school, music instruction has decreased over the last three years...and the percentage of students involved with the visual arts has increased.

Similarly, I can see some big trends when viewing the entire data set at once.

Notice how popular Music is at elementary? There are a lot of dark squares in that second column. While not as strong, there is a definite preference at the high school level for visual arts, based on what I see in the last column.

The benefit of a heat map like this one is that it allows you to visualize a lot of data at once (there are over 1500 data points in the image above, including blank spaces) while not losing any of the points.

In the next post, I'll get into the nitty gritty about how I built this. For something that looks rather elegant (if I do say so myself), there is only one formula involved, with some conditional formatting and shapes applied. In the meantime, if you want to play with the spreadsheet a bit, use the Excel Web App below. Sorry about the horizontal scrolling required, but you can also click the bottom right-hand corner to view the sheet full screen.

Thursday, November 15, 2012

Dashboard Design for Education

I've posted several versions of reporting tools over the young life of this blog...but I haven't shared any dashboards. I've been holding onto my ideas for other purposes, but that doesn't mean I haven't spent time perusing what others are doing.

I have yet to find a commercial product that I would recommend. Every implementation of Cognos I've seen so far has been awful enough to make me beg for eyebleach.

3D Stacked Cylinders. Yes, really.

I spoke with an Edmin rep for awhile this summer. Very enthusiastic about their product, but admitted that they'd never consulted with any designers about the interface. It shows. Engrade suffers from the same issues. Here in Washington state, something called Homeroom has rolled out to most districts. When I talked to these reps nearly a year ago, they bragged about how their design was based on what teachers wanted. In fact, all these companies say that---and that's great. Knowing what data your audience needs to see is critical. But most educators are not designers and the incredibly poor output for all of these companies reflects this. I am disheartened by all the rich meaning that is hidden or lost because none of these companies can be bothered to consult with someone about line, color, and other basics.

But what happens when we build a design and don't have audience input? You get a very pretty dashboard...but is it useful?

Stephen Few recently hosted his annual dashboard design competition. (For a full discussion of the entries and selection process, visit his blog.) And this year, the challenge was to develop a student performance dashboard. Here is the winning design, by Jason Lockwood:

from http://www.perceptualedge.com/blog/?p=1374

It's very pretty---and very Few'ish: the colors, the style, the fonts, and so on. Once you know what Stephen likes, all of the dashboards look the same. (I'm not sure that's a good thing, but we'll save that discussion for a different post.) The second place design, by Shamik Sharma, is below:

from http://www.perceptualedge.com/blog/?p=1374

I think the challenge with educational data is simply that it's hard to "snapshot." We need to see every student---not just the Top 5. And there is a lot of stuff to consider---not all of it fits neatly into little quantitative variables. As I mentioned to Mr. Few at the workshop I attended, schools aren't making widgets: we're about people.

Those who entered this contest used a dataset that was provided to them, so I can't blame them for the volume of information contained here. But as I look at these as a teacher, I see information that isn't necessary for a dashboard (e.g., the standardized assessment results) and a lot of very poor grading practices represented (e.g., averages and letter grades). The designs are completely disconnected from the real-world audience. It is a limited audience, at that. These dashboards might be adapted for use in a secondary (or higher ed) core subject classroom, but not for elementary (multiple subjects) or performance-based classes (like PE). In short, the designs have a beautiful form and almost no function.

Can't we all just get along? How do we get those with expertise in the classroom connected with those who have expertise in data design? What would you like to see included with a dashboard?

Monday, November 12, 2012

The Exhausted, The Design-minded, The Few

Last month, I had the privilege of spending three days at a workshop by Stephen Few. I had a five-hour commute each day to the workshop, which made for a very long week, but I'm grateful I made the effort.

It was quite a contrast with the Tufte experience. Instead of a room of 600, I found a room of 60. This led to a lot more interaction between participants and with the presenter, something I appreciated. One of the biggest differences between Tufte and Few is their purpose in talking about design. Tufte is much more academic and esoteric---design as more form than function, in some ways. But Few, like most of us, recognizes that while it's all well and good to talk about what design should be, at some point, we have to get practical about things. This is his purpose in writing and presenting.

Another contrast between the two gurus is how they view the audience. Tufte's reflects his Ivory Tower existence, with a basic premise that you present the data and the audience decides how they want to evaluate and use it. Interactivity is great, I think, but most of the people I work with are not data literate. And judging from the conversation I had with several others at the Few workshop, their co-workers and audience are no better with data. Few's view of the interaction between designer and audience is that the designer should listen to what the audience needs/wants---and give that to them in a way selected by the designer that is the best format for the data. I find this to be a much more practical approach.

Each morning started with a "pop quiz"---a series of questions about the upcoming ideas for the day. I thought this was a great way to get things going, not just from a management perspective (latecomers didn't interrupt the presentation), but as a teaching strategy. I wish he'd set aside time at the end of the day for everyone to review their answers. Not doing so misses a fabulous opportunity for people to reflect on what they'd learned throughout the day.

Most of the day was a traditional lecture format, with a one-time small group assignment included. I would have liked him to break up the lecture with a few pauses for people to have some partner talk about ideas. There was a lot of material to process along the way. I have no beef that all of the examples were business focused---the workshops were targeted for that walk of life, and I am but a humble educator. But not everyone works with profits and losses. The principles of good data design may apply across the board, but not everyone's roles are the same. If we really want to make changes in the way we use data to communicate, there needs to be opportunity to think about how to apply new learning.

Lessons Learned 
He didn't share anything that you couldn't learn from reading his books, which was my only real disappointment. Not to say that his ideas about data design aren't worth a second (or third) tour through the material, but I believe that presenters should use "live" opportunities to extend their thinking. If we're there, we've probably read the material---help us take the next step.

All that being said, this was a worthwhile opportunity to learn---my quibbles are more about the way things were said than what was said. I would recommend the workshop to anyone interested in the basics of data design. Here are a few of my takeaways:
  • I liked that nearly all of his examples were built in Excel. As he pointed out, Excel is not a design tool; however, it is a data tool that nearly everyone has...and if you can make something look good with Excel, then you have no reason not to make your data shine elsewhere.
  • For Few, what makes a dashboard unique (vs. other reporting tools), is it's purpose: monitoring. It should allow the audience to scan the big picture, zoom in on specifics, and link to supporting details. He really pushes the idea that there should be no scrolling---the dashboard should fit a single screen---but in a time where you can't guarantee which device will be used to view the data, I don't know that you can make no-scroll a "must." The best you can do is limit it by being thoughtful about what you present.
  • Details matter. I think this is the one piece that is most misunderstood by most people. The colors you choose, the lines you use, the way your organize content should be just as agonized over as the data quality going in and the questions you draw out. Typically, this piece is tossed aside. If it wasn't, there would be so many crappy visualizations out there.
I'll share a bit more in the next two posts. Come back for a look at Few's most recent design contest (with a classroom focus) and my take on one of the visuals presented.

Bonus Round
Hadn't seen this clip from The Onion before...but it was the very first thing shared at the workshop. Thought you might enjoy it, too.