Sunday, March 22, 2015

ASCD 2015: Data Tools

This is the third year that I've been on the hunt for high-quality data tools in the Exhibit Hall at the ASCD annual conference. The first year (2013) was downright depressing. Last year was better---I ran across a couple of promising tools, although neither are represented at the conference this year. Here are the trends I'm seeing this spring.

Data Capture
This is a brand new theme this year. I saw three different tools yesterday that are meant to support teachers in recording student conversations or other "in the moment" data points and then associate those with a gradebook or spreadsheet. I am intrigued by these. I think their benefit may be somewhat limited right now. Teachers would need to be outfitted with tablet devices and know how to seamlessly integrate those with their classroom work. I suspect that more and more teachers fit that description each year, but my school district is not quite there yet. One thing that I really like about these tools, however, is that they put the power of assessment back into the hands of teachers. In an age where we large-scale district and state assessments carry the weight and propel the discussions, these tools give teachers another way to show student learning. Yes, these demonstrations were always there, but now there are supports for teachers to share the very important daily learning with others. The best tool of the bunch? Just open Sesame.

Item Development
With the advent of new online assessments in many states, such as Smarter Balanced and PARCC, there is a new emergence of tools that allow classroom teachers to build items and assessments that have many of the features of their large-scale brethren. SchoolCity was the first one I saw last year, but there are a couple of new players showing their wares at this year's conference. The best one of these is Edulastic. Most of these tools promise integration with your gradebook or data warehouse. I think we have to be cautious, however, that just because you can make all sorts of new-fangled items for kids to answer doesn't mean you should. If your goal is just to have kids practice responding to particular sorts of items (e.g., drag-and-drop) for "The Test," then I hope you'll think a bit harder before purchasing this kind of software. We also need to support teachers with the basic assessment literacy required to write good items to measure student learning. We haven't done that much with paper/pencil tests---and online forms mean we have even more background knowledge to build.

Design Is Better
Most of the tools I saw yesterday show some thoughtful design. As a whole, they're far better than they were two years ago, but there is still a long way to go. I didn't find a single vendor who uses a data designer for their displays---they all depend on developers to code whatever charts and visualizations they have. Some claim that their charts are "designed by teachers." This is also a bad answer. Teachers and other educators should definitely have input---they are the end users for the tool, but they are not data designers. Listen to the stories classroom experts need to have told, then create the interface to communicate in a powerful way.

One vendor has the biggest, ugliest, exploded 3D pie chart on their screen. I asked them why they had chosen it. The rep wasn't sure. I probed further: Why is it in 3D when you only have two dimensions of data? His reply: Because it looks cool. No, doesn't.

We have to demand better from our vendors.

If you're in Houston in the next day or so, swing by the conference and check out all the new data tools for the classroom. Or, if you've already wandered through the vendor area, feel free to leave your new favourite option in the comments.

Saturday, February 28, 2015

Neither Smart nor Balanced

I'm on a crusade against bad data viz in education. For an industry so consumed with being "data-driven," we don't display it very respectfully. Vendors out there don't help. Instead of hiring data designers to develop visualizations that support educators in getting the most information from their data, we instead get things like this:


Can we talk about this chart from Smarter Balanced?

Here's a brief list of things that make this chart difficult to use:
  • The title is centered. In the western world, we start reading at the top left. When you change that for a chart (or slide or document) then the audience has to use time and cognitive processing to figure out how things are organized...leaving less "head space" to interpret the information.
  • The y-axis label needs to be turned 90 degrees so we can read it easily.
  • There is way too much ink here in relationship to the data. We don't need tickmarks on the outside of the chart. The "%" signs in the stacked bars are redundant---we've already said they're percentages in the title. Axis titles don't need to be bold. We also don't need the y-axis scaled by 10.
  • The colors are awful. Not only do we have the ubiquitous red-yellow-green, which is not very accessible for all vision types, the shades selected here are particularly gaudy. Are we clowns? Do we amuse you?
  • Finally, humans do best at judging length and position (out of all the pre-attentive attributes). This chart takes advantage of neither.

So, let's fix those things.

 Okay, we're getting there...but we still have a problem. It's very difficult to make any comparisons among the groups represented. This is an issue common to stacked bars---in either vertical or horizontal forms.

What if we line up the bars at the division between not meeting and meeting standard?

When we "float" the bars on the chart, we not only have all the information we need, we can more easily dig a little deeper for comparisons between grade levels. (If you like this sort of chart, there's a tutorial on Ann K. Emery's blog.) I could probably improve this one a bit by playing with the x-axis, this one is still a vast improvement over the original...if I do say so myself.

Here is a before and after comparison---same data, different presentations:

So, Smarter Balanced, what do you say? Can you start making an effort for us and get some meaningful data visualizations to go with the new system?

You can also download my workbook for this chart.

Friday, March 28, 2014

When Good Data Go Bad

Not that long ago, we talked a bit about data quality---attributes of your data that describe its "truth." A recent post over at Education by the Numbers caught my eye, because it speaks to the possible affects of poor data quality. The post pulls out the following quote (emphasis added by me):

The federal report found that barely half of Georgia’s high schools offered geometry; just 66 percent offered Algebra I.

Those data are just plain wrong, said Matt Cardoza, a spokesman for the Department of Education. The state requires Algebra I, geometry and Algebra II for graduation, so all high schools have to offer the content — but they typically integrate the material into courses titled Math 1, 2, 3 and 4, Cardoza said. He surmised that some districts checked “no” on the survey because their course titles didn’t match the federal labels, even if the content did.

“It’s the name issue,” Cardoza said. “I think schools just didn’t know what to say.”

Ah, data quality has reared its ugly head...and now everyone is freaking out about the perceived inequity of math offerings.

So, here's the deal. Suppose you're a working at a high school. You offer an algebra class...but you might not call it Algebra I. You might call it just plain Algebra. Or Algebraic Thinking. Heck, you might even have Advanced Algebra or Honors Algebra or 9th Grade Algebra. At the school level, this distinction doesn't really matter. The school has a master schedule, assigns highly qualified teachers to whatever sections it has that they identify as math. When a new student shows up and needs a math credit, everything in the student information system enables the placement.

But that isn't the end of the story. There's another layer of data that few---maybe just the registrar or district data manager---will ever see. There's a whole taxonomy of course codes determined by the National Center for Education Statistics. These course codes are collected by the state and are part of the district student information system. But because the district doesn't use them for anything---remember, they have their own labels---not many pay attention to what fills those fields.

Here is one example (click to embiggen):

These are the math classes for Bellevue High School for the 2012 - 13 school year. (data source here). Columns 4 - 6 include state course labels---the invisible ones---and 7 and 8 are designated for the district. So, let's dig into the last row ("Mathematics-Other") and see what the district is lumping in there.

Notice that in the second column from the right---District Course Title---we have things like Alg I Seminar, Gmtry Seminar, G-Alg 1 Seminar. We can't see the syllabi for these classes, but it's likely that algebra and geometry concepts are being taught. Kids are getting math credits and are being scheduled into math classes, but a data pull at a state or federal level will never see these.

It gets worse. Start digging through "miscellaneous" categories, and you start to see things like this:

The state course code on the left says English Language and Literature-Other...and the district has assigned biology, chemistry, physics, nanotechnology, and more to this category. Even assuming these are courses for English language learners, special education students, or other population, it's still science content---it doesn't belong in English. At this level, data quality is a real mess.

But what to do? After all, it doesn't make a difference to the district. They have their own codes and credit systems. It does make a difference to anyone outside of that system. It's public data. Anyone can use it for any reason---from bureaucrats trying to make decisions about allocations to think-tanks sounding the alarm about equity.

All of this mumbo jumbo doesn't mean that schools with larger minority populations aren't being underserved. Considering the other ways we shortchange these students, it doesn't seem unlikely that access to a rigorous curriculum would be on the list. I suspect, however, that noise in the data quality is hiding the true signal.

I tell teachers all the time that paying attention to data quality is the simplest way to have a direct effect on policy. You might not think that attendance you took in first period matters...but it does. As it rolls up, districts will make decisions about how they make resources...states will consider policy. How many absences before a student should be considered "at-risk"? What strategies work best to improve attendance rates? What should be the legal consequences for students or parents when kids don't attend school? All the little pieces of data matter. If you want better policy, we need better data quality.

Monday, March 24, 2014

Lookin' for Data Love

I recently attended (and presented) at the 69th annual ASCD conference. If you're unfamiliar with the acronym, ASCD is one of the largest and oldest professional organizations for educators. The letters used to stand for the Association for Supervision and Curriculum Development, but that definition was scuttled a few years ago when the organization outgrew its original boundaries. The acronym was kept for branding purposes.

You might remember that I went hunting last year for data tools in the exhibit hall. I did so again, along with attending a couple of presentations on how data is being used in schools. So, here's the wrap-up.

I looked at three different tools in the exhibit hall. None were quite to the level of a student information system, but all integrated assessment and performance data. Two are not worthy of further discussion (one, in fact, admitted that they do no testing/accommodations for accessibility).

A third, Schoolzilla, didn't totally blow me away; however, they are using Tableau to build their reports. The reports follow the Shneiderman mantra of "Overview first...then zoom and filter...details on demand." To be fair, I don't expect any product to knock my socks off in an exhibit hall setting and where I spend <5 minutes at a booth. However, Schoolzilla may be worth a more in-depth look, if your district is on the hunt for that sort of thing.

I also spent a chunk of time at another venue chatting with a rep from SchoolCity. Their product is recently undergoing a complete redesign, but I got a behind the scenes peek at things. I suspect that this product may well be worth a second look in the coming months.

Again, my socks remained firmly on my legs. Okay, so they were imaginary socks---the conference was in Los Angeles and it was too warm to wear such things---but let's just go with the metaphor here. The presentations I attended were focused on sharing how a particular school or program was using data. The common thread among all these was that no one starts with a question---and I find this worrisome.

Tweet by Science_Goddess: Data use that doesn't start with a question worries me.

While it's good practice for student assessment to guide the next steps in teacher instruction, it is impossible to use every single piece of data derived in the classroom. We have to focus---we need to be picky about where we dig. I know it isn't as simple as that. The hardest part of any analysis is that very first step: Asking a good question---the one most worth asking.

Switching it up for final session for today. "Grab a shovel: Data digs to drive student achievement." (Don't shovel and drive, kids.)

I was pleased to hear one presenter talk about how too many teachers see the purpose of data as sorting and selecting. I became worried, however, when she mentioned how "all the data is spewed across the wall in the War Room." My colleague often says that we have to get beyond admiring the problem. Data can be used in strategic ways, to be sure, but that means respecting what we collect and being thoughtful about how we move forward with it. There is something troublesome, for me, in any terminology that involves spewing and war.

As for me, my presentation went well enough---I even ran short, although no one complained about getting away early. :)  The next morning, this happened (well, after the earthquake):

Tweet by @science_goddess: Just got asked "Aren't you the data woman?" Why, yes...yes, I am."

This data woman hopes to see you at the 70th annual ASCD conference in Houston next year!

Friday, March 14, 2014

Data Sharing for Good

On Sunday, I'm presenting a session at the ASCD annual conference on using data in the classroom. Along the way, we'll take a tour of chart selection for a data set, some best practices for data viz, and tools for moving beyond "admiring the data." Materials for the session, along with other sundry data viz resources are here.

I've tried to pull from a variety of sources...a few favourite quotes sprinkled in, like this one:

We'll play a little game with the When Excel Gives You Lemons data. You can play, too, by clicking here to choose a bachelor from the three below. (Turn your speakers up for the full effect.)

We'll explore a few different kinds of stories...and the problem with pie charts.

A detour to Sesame Street will help us think about pre-attentive attributes.

So, if you're in Los Angeles this weekend, stop by and learn with us. If you have ideas and resources to share, leave them in the comments.

Monday, January 13, 2014

KO Your Data with DQ

The decisions you make using data are only as good as the data themselves. When you look at a score in your gradebook, review a report from the state about test scores, or access a national database with discipline data, you're depending on quality data.

There are lots of things that go into building data quality (DQ). Some aspects are more technical---does the student's name use alphabetic characters (no numbers or symbols)? Others are more utilitarian---is the student's name in correct column? Is it spelled correctly? Is "John Smith" the same as "Johnathan Smith" or "Johnny Smith" or "J. Smith"? And some attributes are more about what the data represents---is the score for John Smith really his (and not some other student's)? Is the score in the box from the most recent math assessment (and not the reading assessment)?

Data quality isn't just one person's responsibility. It's everyone's job, whether you're collecting, entering, or using data. It's also a far bigger issue than I can cover in one post. If you're looking for additional resources, check out the Data Quality Campaign. I also advise you to develop your own bullshit detector, because a lot of what going into using data sets is questioning whether or not they make sense. For example, if one school is reporting 60 suspensions a year...and another school in the same district that is the same size is reporting only might want to ask a few questions.

For today, however, let's talk about a few things you can do in Excel to help with the technical components of data quality.

Excel has several options for cell formatting, including all sorts of customizations you can design. Sometimes, it even tries to be helpful by deciding what format you meant. Perhaps you, too, have been the victim of adding data like 9 - 12 only to have Excel automagically change it to 12-Sep. And when you change the format to specify Text, it further mangles your original entry to 41894 so you have to start all over again. Things like this show you how easy it is for data quality errors to creep into your workbook.

Other times, you may inherit or download a workbook where data quality has not been a priority. At this point, looking for data quality issues becomes a lot like data cleaning.

Now is the time to whip out a few of those functions that you always wondered why they were there: ISTEXT, ISNUMBER, ISNONTEXT, ISLOGICAL, ISBLANK. I know, these seem like "duh" formulas---why do you need a function to tell you whether or not there is text in the cell? Because like our sample above and our list of options, looks can be deceiving.

Here are a couple of examples of how they work. For a full list of IS functions, definitions, and examples, visit this page. Using a formula like ISTEXT or ISNUMBER returns a value of TRUE (Yes, the information in the cell is text/number.) or FALSE (No, the data in the cell isn't text/number.) They are written like this (evaluating whether "Excel Rocks" is text):

And look like this after you hit Enter:

Here's how things look for a FALSE statement using ISNUMBER:

And after:

But let's face it, you're not likely to create an an entirely new table of true/false values and then identify things from there. It's more likely that you would combine these with conditional formatting to highlight the types of data you have.

If you have a data table like this one (data source here):

You can combine these simple statements with some conditional formatting to get a quick visual for your data and easily spot problems.

I've applied two rules to the section of the table with the data. The first identifies numbers and the second text. I chose some fill for the cells, but you could just as easily pick something else.

Remember, Excel can be conflicted about things. For example, the cells with the dashes are formatted as numbers, but Excel is still reading them as text. This is why it's always important to do a quick check for data quality issues either before you submit or before you use a data set.

Depending upon your role, you have some decisions to make. We have cells in our data table that are blue---meaning that they contain text. If these were data I was going to upload to a database or otherwise share, then I need to address these DQ issues. If I'm the database administrator, I might wonder why these cells aren't calculated like the others and need to investigate some code. If these are data that I am going to use for making decisions about where to direct resources for next year, then I'm going to need to find additional information.

Keep in mind that Excel functions a bit like a spell checker in these situations. It can tell you if the right sort of data is in the right spot---but it can't tell you if the names and numbers are the correct ones for those cells. You'll need additional validations for that...or a really good eye and careful work.

Data quality starts with you---from how you enter attendance, label your gradebook, or even type a student's name. But you can use Excel to help you along the way. Building a culture in your school that includes DQ at all stages can lead to better decisions and outcomes.

Tuesday, January 7, 2014

What's the Story, Morning Glory?

The tagline for this blog is Use data to tell your story. In fact, that is also the title of a few presentations I've been doing in 2013 (and am scheduled for in 2014). Whether you use Excel, R stats, Google Apps, or other tools, finding the story in a data set is a complex set of circumstances. Sometimes, you feel like you are trying to divine meaning from a set of numeric entrails. It's often difficult to know where to begin...or, when you're finally at the end, to feel 100% confident that you have honored the best story.

It's important that you be the storyteller. Yes, I know, many of you work in districts where some sort of software will spit out reports for you. Maybe it's your gradebook or benchmark testing. Those reports can be very convenient, but I would like to remind you that they are a developer's idea of what you need---they may not provide the level of insight (or quality of visualization) that supports your work. I know they save time, but with so much riding on the decisions made from these, shouldn't they be all that they can be? Don't settle for this junk. Own your story.

Exhibit OMG: Typical DIBELS Report

But where to start? Whether it's the scores from your gradebook, the annual data dump from state testing, monthly fiscal updates from the business office, or a download from a benchmark testing site, we're all faced with the same question: Now what?

Here's the deal. Working with data is messy. I'm not going to give you any hard and fast rules here ("When you have x data, always do y!"), because frankly, they don't exist. However, we can look at some workflow ideas that will build your capacity to interact with your data. I promise that you won't have to reinvent the wheel everytime. If you know you're going to get DIBELS data every two weeks...and report card scores every six...take the time to figure out your strategies once and then apply them consistently.

So, let's start with the same data set as the previous two posts. It has both categorical (ethnicity; staff and student) and longitudinal (2004 - 2013) data. You may work with a lot of data that has similar features. For example, a gradebook has both categories and time-bound information. So does a fiscal spreadsheet a school business manager might use. You can download the workbook for this post here.

Clean the Data
This idea needs a separate post, but for now, let's just acknowledge that you often get stuck with "dirty data" that you will have to spank into shape. Are the numbers formatted as numbers? Do you have missing data points...and is that okay? Is it organized and labeled, ready for use in charts? What about data quality---are the data valid and accurate? (This last idea is also another post unto itself, which we will save for later.)

Explore the Data
When you have a new-to-you data set, and it's all tidied up and ready for church, make a few different charts. Remember, the reason we make charts and graphs, instead of only using tables, is because we are reducing the cognitive demand while increasing the amount of understanding. By this, I mean that it would be very difficult, at best, to keep all the numbers in the table in your head while simultaneously interpreting the results. A picture really is worth 1000 numbers. 

Which ones, you ask? Do you start with a line chart? A column/bar chart? A scatter plot? You may have to try more than one. I always recommend the Chart Chooser as a starting point.

It's not the only tool out there to help you think through what you might want to look at. You might also like the Visualization Options over at Many Eyes. Or the Classification of Chart Types over at the Excel Charts blog. If you really want a deep dive, bigger ideas than just the charts themselves, read Resonate by Nancy Duarte (it's free!). Find something that helps you think through what you want to show (e.g., part-to-whole, relationships...).

Get in there and make a couple of pictures. At this stage, it's okay if they're ugly---they won't be your final products. You just need to see what story to pull out.

We did this in previous posts. We looked at a column/bar chart using these data and a line chart. They're totes ugly. However, they do show us a couple of things. First of all, we can see that only one population of students has had consistent growth over time (Hispanic), another has had a significant decrease (White), and the rest have remained about the same. For staff, there is not as big of a story, but it echoes what we see with the student data. The number of Hispanic teachers is increasing (3.5 to 4.9%), the number of White staff is decreasing (90.4 to 87.6%), while the rest remain steady.

So, now we have a better idea. We need something that highlights the two changes (or at least one of them for further discussion).

Tell the Tale, Nightingale
A lot of people think making a basic chart is the end. Even if you see what you need to see, be sure to clean them up. Beyond that, extend your thinking about the best way to show the data.

What about a "win-loss" chart for these data?

This is just a basic column/bar chart in Excel, except I've had it plot the overall change, instead of year-by-year. There may be times when we care about the data for in-between years---when we're trying to spot patterns in the fluctuations among groups. But perhaps it's better to just cut to the chase using a chart like this one. The drawback to a chart like this is that it doesn't give you a perspective on the proportions of each population as a part of the whole. Sure, white kids aren't as numerous as before, but what we can't see is that they're still close to 60% of the total.

Or, what do we notice if we plot the data as small multiples?

This type of chart is a combo of several charts, allowing you to make comparisons among groups. (Jon Schwabish has a great tutorial for creating small multiples using Excel.) The big thing here is to keep the axes among all the charts the same and to line up your charts so it is easy to compare across groups. I could have also done this version as a column/bar chart or even separated students from staff. These charts could even be reduced further to sparkline form, and we'd still get the idea. I always find small multiples to be a very busy way to present data, but they do allow you to spot patterns---and, especially, common patterns---much more easily. For example, in every case except White, there is a greater percentage of students than staff for a particular ethnicity.

Which one is the "right" one? No hard and fast answers here. Like any story you tell, you need to consider your audience and purpose. Sometimes, the audience is just you, the teacher, trying to decide where you need to go with your instruction tomorrow. Other times, you are trying to build a case for a school board to allot money for a capital project. Or influence policy. But the students we serve deserve the best stories we can share. Take the time to develop the best one you can.

Bonus Round
Did you know about HelpMeViz? It's a place where you can both give and receive support for data visualization. Go have a look, offer your ideas, or seek feedback on a project of your own.

You might also enjoy accidental aRt, a tumblr devoted to visualizations that turned out a little more interesting than anticipated.