Friday, March 28, 2014

When Good Data Go Bad

Not that long ago, we talked a bit about data quality---attributes of your data that describe its "truth." A recent post over at Education by the Numbers caught my eye, because it speaks to the possible effects of poor data quality. The post pulls out the following quote (emphasis added by me):

The federal report found that barely half of Georgia’s high schools offered geometry; just 66 percent offered Algebra I.

Those data are just plain wrong, said Matt Cardoza, a spokesman for the Department of Education. The state requires Algebra I, geometry and Algebra II for graduation, so all high schools have to offer the content — but they typically integrate the material into courses titled Math 1, 2, 3 and 4, Cardoza said. He surmised that some districts checked “no” on the survey because their course titles didn’t match the federal labels, even if the content did.

“It’s the name issue,” Cardoza said. “I think schools just didn’t know what to say.”

Ah, data quality has reared its ugly head...and now everyone is freaking out about the perceived inequity of math offerings.

So, here's the deal. Suppose you're working at a high school. You offer an algebra class...but you might not call it Algebra I. You might call it just plain Algebra. Or Algebraic Thinking. Heck, you might even have Advanced Algebra or Honors Algebra or 9th Grade Algebra. At the school level, this distinction doesn't really matter. The school has a master schedule and assigns highly qualified teachers to whatever sections it identifies as math. When a new student shows up and needs a math credit, everything in the student information system enables the placement.

But that isn't the end of the story. There's another layer of data that few---maybe just the registrar or district data manager---will ever see: a whole taxonomy of course codes determined by the National Center for Education Statistics. These course codes are collected by the state and are part of the district student information system. But because the district doesn't use them for anything---remember, it has its own labels---few people pay attention to what fills those fields.
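To make those two layers concrete, here's a toy sketch of a course record that carries both labels. All field names and values are invented for illustration---this isn't an actual student information system schema:

```python
# Toy course record: the school schedules and awards credit using its own
# label, while the state/NCES course code rides along unused.
# Field names and values are invented for illustration.
course = {
    "district_title": "Honors Algebra",        # what the master schedule shows
    "district_subject": "Math",                # drives credits and placement
    "state_course_code": "Mathematics-Other",  # never checked locally, so
                                               # errors persist for years
}

# Local operations only ever touch the local fields...
def counts_for_math_credit(record):
    """Placement and credit logic keys off the district's own subject label."""
    return record["district_subject"] == "Math"

print(counts_for_math_credit(course))  # True

# ...while a state or federal data pull sees only the state field, which
# here reports the class as "other" math rather than algebra.
print(course["state_course_code"])
```

Because nothing local ever reads `state_course_code`, a wrong value can sit there indefinitely without anyone noticing---until someone upstream aggregates it.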

Here is one example (click to embiggen):


These are the math classes for Bellevue High School for the 2012-13 school year (data source here). Columns 4-6 include the state course labels---the invisible ones---and columns 7 and 8 are designated for the district. So, let's dig into the last row ("Mathematics-Other") and see what the district is lumping in there.


Notice that in the second column from the right---District Course Title---we have things like Alg I Seminar, Gmtry Seminar, G-Alg 1 Seminar. We can't see the syllabi for these classes, but it's likely that algebra and geometry concepts are being taught. Kids are getting math credits and are being scheduled into math classes, but a data pull at a state or federal level will never see these.

It gets worse. Start digging through "miscellaneous" categories, and you start to see things like this:


The state course code on the left says English Language and Literature-Other...and the district has assigned biology, chemistry, physics, nanotechnology, and more to this category. Even assuming these are courses for English language learners, special education students, or other populations, it's still science content---it doesn't belong in English. At this level, data quality is a real mess.
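A crude audit can surface this kind of mismatch. Here's a hypothetical sketch---the course titles and keyword lists are invented, not a real SCED crosswalk---that guesses a subject from keywords in each title and flags rows where the district title and state category disagree:

```python
# Hypothetical audit: flag district course titles whose state course
# category doesn't look like the same subject.
# All titles and keyword lists below are invented for illustration.

courses = [
    # (state course title, district course title)
    ("Geometry", "Gmtry Seminar"),
    ("Mathematics-Other", "Alg I Seminar"),
    ("English Language and Literature-Other", "Biology Lab"),
]

SUBJECT_KEYWORDS = {
    "math": ["alg", "gmtry", "geometry", "math", "calc"],
    "english": ["eng", "lit", "read", "writ"],
    "science": ["bio", "chem", "phys", "nano", "sci"],
}

def guess_subject(title):
    """Guess a subject from keywords in a course title (crude heuristic)."""
    t = title.lower()
    for subject, keys in SUBJECT_KEYWORDS.items():
        if any(k in t for k in keys):
            return subject
    return "unknown"

for state_title, district_title in courses:
    if guess_subject(state_title) != guess_subject(district_title):
        print(f"MISMATCH: {district_title!r} filed under {state_title!r}")
# -> MISMATCH: 'Biology Lab' filed under 'English Language and Literature-Other'
```

A real audit would work from the actual course catalog and the official code definitions, but even a heuristic this blunt would have caught biology sitting under English.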

But what to do? After all, it doesn't make a difference to the district. They have their own codes and credit systems. It does make a difference, however, to anyone outside of that system. It's public data. Anyone can use it for any reason---from bureaucrats trying to make decisions about allocations to think tanks sounding the alarm about equity.

All of this mumbo jumbo doesn't mean that schools with larger minority populations aren't being underserved. Considering the other ways we shortchange these students, it wouldn't be surprising if access to a rigorous curriculum were on the list. I suspect, however, that noise from poor data quality is hiding the true signal.

I tell teachers all the time that paying attention to data quality is the simplest way to have a direct effect on policy. You might not think that the attendance you took in first period matters...but it does. As it rolls up, districts will make decisions about how they allocate resources...states will consider policy. How many absences before a student should be considered "at-risk"? What strategies work best to improve attendance rates? What should be the legal consequences for students or parents when kids don't attend school? All the little pieces of data matter. If we want better policy, we need better data quality.

Monday, March 24, 2014

Lookin' for Data Love

I recently attended (and presented) at the 69th annual ASCD conference. If you're unfamiliar with the acronym, ASCD is one of the largest and oldest professional organizations for educators. The letters used to stand for the Association for Supervision and Curriculum Development, but that definition was scuttled a few years ago when the organization outgrew its original boundaries. The acronym was kept for branding purposes.

You might remember that I went hunting last year for data tools in the exhibit hall. I did so again, along with attending a couple of presentations on how data is being used in schools. So, here's the wrap-up.

Vendors
I looked at three different tools in the exhibit hall. None were quite to the level of a student information system, but all integrated assessment and performance data. Two are not worthy of further discussion (one, in fact, admitted that they do no testing/accommodations for accessibility).

A third, Schoolzilla, didn't totally blow me away; however, they are using Tableau to build their reports. The reports follow the Shneiderman mantra of "Overview first...then zoom and filter...details on demand." To be fair, I don't expect any product to knock my socks off in an exhibit hall setting where I spend less than five minutes at a booth. However, Schoolzilla may be worth a more in-depth look if your district is on the hunt for that sort of thing.

I also spent a chunk of time at another venue chatting with a rep from SchoolCity. Their product is currently undergoing a complete redesign, but I got a behind-the-scenes peek at things. I suspect that this product may well be worth a second look in the coming months.

Presentations
Again, my socks remained firmly on my legs. Okay, so they were imaginary socks---the conference was in Los Angeles and it was too warm to wear such things---but let's just go with the metaphor here. The presentations I attended were focused on sharing how a particular school or program was using data. The common thread among all these was that no one starts with a question---and I find this worrisome.

Tweet by Science_Goddess: Data use that doesn't start with a question worries me.
https://twitter.com/science_goddess/status/444960747974455296


While it's good practice for student assessment to guide the next steps in instruction, it is impossible to use every single piece of data generated in the classroom. We have to focus---we need to be picky about where we dig. I know it isn't as simple as that. The hardest part of any analysis is the very first step: asking a good question---the one most worth asking.

Switching it up for final session for today. "Grab a shovel: Data digs to drive student achievement." (Don't shovel and drive, kids.)
https://twitter.com/science_goddess/status/444955206690684928

I was pleased to hear one presenter talk about how too many teachers see the purpose of data as sorting and selecting. I became worried, however, when she mentioned how "all the data is spewed across the wall in the War Room." My colleague often says that we have to get beyond admiring the problem. Data can be used in strategic ways, to be sure, but that means respecting what we collect and being thoughtful about how we move forward with it. There is something troublesome, for me, in any terminology that involves spewing and war.

As for me, my presentation went well enough---I even ran short, although no one complained about getting away early. :)  The next morning, this happened (well, after the earthquake):

Tweet by @science_goddess: Just got asked "Aren't you the data woman?" Why, yes...yes, I am.
https://twitter.com/science_goddess/status/445559513274273792


This data woman hopes to see you at the 70th annual ASCD conference in Houston next year!

Friday, March 14, 2014

Data Sharing for Good

On Sunday, I'm presenting a session at the ASCD annual conference on using data in the classroom. Along the way, we'll take a tour of chart selection for a data set, some best practices for data viz, and tools for moving beyond "admiring the data." Materials for the session, along with other sundry data viz resources, are here.


I've tried to pull from a variety of sources...a few favorite quotes sprinkled in, like this one:

We'll play a little game with the When Excel Gives You Lemons data. You can play, too, by clicking here to choose a bachelor from the three below. (Turn your speakers up for the full effect.)

We'll explore a few different kinds of stories...and the problem with pie charts.


A detour to Sesame Street will help us think about pre-attentive attributes.





So, if you're in Los Angeles this weekend, stop by and learn with us. If you have ideas and resources to share, leave them in the comments.