Excel for Educators: visualization


Saturday, March 28, 2020

The Next Wave

In February and early March, I had the opportunity to teach my first college-level course. My audience was student teachers and the topic was effective data use.

I was both excited and a little terrified. It was a great opportunity to share some learning on an important topic that doesn't often get much attention as teachers prepare to join the workforce. But, it was designated as a 10-hour elective, so I made myself think of it as an extended workshop. A few years ago, I did a Data Academy in my district that consisted of six 90-minute sessions...so it didn't feel like too big of a stretch to do five 2-hour sessions. Since the Data Academy, I have also developed and presented a lot of other content at various conferences and in other spaces. The twist on all of this was that I needed to target an entirely new audience.

Syllabus
I used a framework from this article to organize the course: Mandinach, E. B., & Gummer, E. S. (2016). What does it mean for teachers to be data literate: Laying out the skills, knowledge, and dispositions. Teaching and Teacher Education, 60, 366-376. During the first session, I had the students identify their level of familiarity with each of the five major areas of the framework, along with the items under each that they'd most like to learn. This provided me with some direction for the rest of the sessions.

You can access my course materials here: https://github.com/tlricherson/TESC_MIT_Data.

How it all went down
I didn't do much in the way of assignments. I asked them to read two research articles and we used two examples from Observe, Collect, Draw to provide them with some practice collecting and representing data. I really just wanted them to immerse themselves in the discussions during our time together. It was, after all, an elective...and they were plenty stressed out with other coursework. I wanted them to have a positive experience with data.

We also did some activities modified from the Data Therapy project. These included the paper spreadsheet, find a story, and build a sculpture. I had been wanting to try some of these for a while and hadn't had a group for this. My favourite was the sculpture activity. I've had the coloured blocks and tiles for a couple of years and it was great to provide them, along with some data, and have small groups of students physically model a story they thought was important to tell from the data. I didn't allow them to write or annotate until the very end. (We did three rounds of data/revision.) I really enjoyed hearing their thoughts about that experience. Many of them noticed the same things that I have about manipulating data physically with your hands. It's such a powerful and personal experience. Magical.

Lessons learned
I think I did an okay job. Most of the time, I paced things well and brought the right resources. As always, I should have aimed for "Less me, more them." I would have liked to structure the discussions a bit better and to give them more time to practice using data when we were together. We didn't have access to a computer lab and not everyone had a laptop, so I didn't spend any time on Excel basics, which I think would have been one of the most useful things to send them out the door with. But, I did give them some rich opportunities to think with data...and that will serve them well.

I heard from a professor in the program that students talked about my class a lot in their other classes...that they were showing her lots of great pictures of their work...and that a few had said it was the highlight of the winter quarter for them. The students themselves gave me a wonderful thank you card, a $50 gift card, and a little notebook to use for capturing data. Very sweet of them!

I don't know if I'll get the opportunity to teach the class again. This particular format was brand new for the university and they will need to evaluate its success before making decisions about next year. I am so grateful for the opportunity...and also that I was able to finish the class just moments before the transition to all classes being online. I greatly enjoyed being with these almost-beginning teachers. The next wave coming to our classrooms is going to be amazing.

I was taking a break from presenting this year. And then I got asked to share some of my data stories at a conference session in December...and asked to develop this university course...and a couple of other things, too. I said "Yes" to all of them, even if they were daunting. I am glad that, as the world becomes more closed for the next year or so, I had the opportunity to reach out in those ways. I have to believe that there will be chances to do so again. I hope all of you are finding ways to share and connect and celebrate learning from wherever you are holding in place.

Friday, April 20, 2018

Seven: Care to Comment?

What do we say about our students? Do our values align with the words we use? Do they reflect what parents think is important about what happens in the classroom?

In this data story, we take a closer look at 3,694 comments written about 2,862 K - 5 students on their winter report cards.

The Data
Similar to last time, there wasn't anything especially fancy in terms of getting the data. We have reports in our student information system that will gather the information and spit it out into a spreadsheet. After that, I added student demographics and program information from another student file using trusty old INDEX/MATCH.

The big challenge was getting the data clean...or, at least, cleanish. You see, I didn't want student names. Why not? In part because I wanted to make some of the data available to others. This means I needed to strip out identifying information from the text. Also, the names interfered with some of the frequency counting and comparisons I wanted to make. (Aside: Do you realize how many kids go by the name "Maddie"? I didn't.)

I did a first pass using the SUBSTITUTE function in Excel. I had Excel replace any occurrences of a student's first name with "" to blank it out. However, this only worked when a teacher used the actual name of the student. Many kids go by nicknames, shortened versions of their names, first and middle names, etc. I'm sure there must be better ways than looking through things row by row, but that's what I ended up doing.
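A scripted version of this first pass might look something like the sketch below. It is written in Python rather than Excel, and it is only an illustration of the idea, not the process described above. The function name `scrub_names` and the list of names passed to it are hypothetical; the assumption is that someone has already gathered the nicknames and short forms for each student. Unlike a plain SUBSTITUTE, it matches whole words only and ignores case, so blanking out "Max" leaves "maximum" alone.

```python
import re

def scrub_names(comment, known_names):
    """Blank out every whole-word occurrence of the given names (case-insensitive)."""
    # Longest names first so a full name is tried before its shorter forms.
    names = sorted((n for n in known_names if n), key=len, reverse=True)
    if not names:
        return comment
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    scrubbed = re.sub(pattern, "", comment, flags=re.IGNORECASE)
    # Collapse the double spaces left behind and tidy any space before punctuation.
    scrubbed = re.sub(r"\s{2,}", " ", scrubbed)
    scrubbed = re.sub(r"\s+([.,;:!?])", r"\1", scrubbed)
    return scrubbed.strip()

# Made-up comment and names, purely for illustration.
example = scrub_names(
    "Maddie is a joy to have in class. Madeline works at her maximum effort.",
    ["Madeline", "Maddie", "Max"],
)
print(example)
```

Possessives such as "Maddie's" would still leave a stray "'s" behind, so a pass like this reduces the row-by-row checking rather than replacing it.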

The Analysis
After the spreadsheets were all cleaned up and ready for church, I looked at some different options for doing the text analysis. I don't have any real experience with this, and while I looked at some fancy options like Overview, KH Coder, and Emosaic, I just didn't have the time to devote to digging into them right now. Instead, I used the WordCounter and SameDiff options over at DataBasic.io.

The WordCounter provided the basis for the word cloud you see in the picture at the top of this page. I used SameDiff to compare lists of comments for male and female students, for example.

There are also comparisons for students who receive special services (vs. those who don't), students eligible for free/reduced lunch (vs. those who aren't), and students of colour (vs. white).

I also used a couple of pivot tables in Excel to summarize and sort through the data—for example, the total number of comments per grade level or per student population.
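The counting and comparing in this section can be roughly approximated in a few lines of Python. This is a sketch of the general idea only, not a description of how WordCounter or SameDiff work internally. The sample comments, the short stop-word list, and the function names are all invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "is", "in", "the", "to", "with", "her", "his"}

def word_counts(comments):
    """Count how often each non-stop word appears across a list of comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

def only_in_first(group_a, group_b):
    """Words that appear in group_a's comments but never in group_b's."""
    return sorted(set(word_counts(group_a)) - set(word_counts(group_b)))

# Made-up comments, not real report card data.
group_a = ["Works hard and is a leader in math.", "A kind leader."]
group_b = ["Works hard and is helpful in reading.", "Kind and helpful."]

print(word_counts(group_a + group_b).most_common(3))
print(only_in_first(group_a, group_b))
print(only_in_first(group_b, group_a))
```

The first function is the raw material for a word cloud; the second gives the kind of "only found in this group" list that ends up on the outside of the board.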

The Build
Compared to the last few data stories we've built out in the hallway, this one is less complicated. There's a lot of paper and stickers, with some foam to help provide dimension to the word cloud.

I knew I wanted the background to be yellow...something bright for spring, but neutral enough that the black lettering could pop. We put the word cloud in the center of the board. It has the 50 most commonly used words. On the outside, we have the four pairs of lists with words that are only found in comments for students in a particular group. The list for our students who receive special services is particularly depressing.

But wait, there's more...

This is our first data story which uses two boards. On the second board, we have information for our students in secondary grades (6 - 12). There are two middle schools and two high schools. Teachers have a list of "canned" comments at each school that they can assign—two per class per grading period—as opposed to the freeform comments elementary teachers create. For these students, we did some simple counts of how many comments per student and then underneath those charts are lists of the most common and least common comments selected. On the right of the board, we have an area for people to leave comments for us.

This second board isn't as sexy as the one for elementary, but I'm still excited that we have represented something for every school and every K - 12 student (even if they received no comments).

Lessons Learned
This is one project where I would have loved to have rejected the null hypothesis: the idea that there isn't any difference between student groups. But even with this very basic analysis, I couldn't. Even though most of the text is pretty much the same across student groups at the elementary level, the bottom line is there are some differences in how we talk about boys and girls...and for students of colour...and students from low-income backgrounds...and those who receive special services.

We may never eliminate bias, but if we don't bring it to light, we can't start to address it. It's great that our district is taking on several initiatives around inclusion and cultural competency, but these are useless if we only use them to pat ourselves on the back for starting them. If we can't change the system in meaningful ways for students, we are just as complicit as those who built the structures in the first place. This display is one way to raise some awareness of what we're up against.

To see more pictures of this project, or view frequency tables of the comments, please visit the page for this data story. As always, comments welcome!

Wednesday, August 2, 2017

Four: The Class of 2018

This time around, we are not looking at one story, but 605 of them. This untidy display shows the pathway all of our incoming seniors are taking to graduation. Will we put a nice little bow on their K - 12 career...or will we leave them hanging this year?

The display is a Sankey diagram made of nails, mason line, foam board, ribbon, paper, and an untold number of swear words. Starting on the left side of the photo you see above, there are four blocks of colour, each one representing one of the high schools in our district. Each block is labeled with the name of the high school and the number of students/strands leading off to the right. The strands then weave through three different graduation requirements. First is passing the state math test, then the state English test, and finally being on track for earning at least 22 credits by the end of this year. Students who have met a particular requirement are grouped at the top...and those who have missed one or more requirements run along the bottom of the display.

I just put the story up on Tuesday, and already I've had a lot of questions and interest in it...more than any other story (so far). It is a bit of a mess, I realize, but so is the process of learning and making one's way through high school. And if you've tried to make 600+ individual strands behave, you'd probably agree that I've managed to do a pretty good job.

Inspiration

Honestly, I hadn't planned on this particular story. I've been thinking about one related to transportation for months...but my Muse wasn't having it. And then I got pissed off. You see, our state legislature and superintendent of public instruction worked out a deal to weaken graduation requirements...starting with those pesky tests that some students struggle to pass (along with the alternatives). But kids in our district don't finish high school mainly because of credit issues. Among last year's senior class, only 3 were in danger of not graduating because they hadn't met the state standards. I can't change the law, but I can call some attention to the facts.

I pulled the data for our incoming seniors and the numbers reflected what I'd noticed last year. More kids have met the math requirement than any other. We hear all the time how students struggle in math, but these kids are all right.

I wish I understood more about how my Muse works. I seem to go weeks (or even months) at a time where I can't quite capture magic in a bottle...and then I get the right idea and feel compelled to complete it. I did most of the work for this particular story while I was on vacation in late July. As I was putting this display up, I had more than one person ask where I get my ideas. I really don't know. I have a general area or purpose I want to explore...and I tag different displays that intrigue me on Twitter or elsewhere...but when it comes to how the analog/physical stories shake out, it's all just figuring it out as I go along. After the Muse makes a deposit, that is.

The Build

My dining table is five feet long, so I used it as a template for the strands of mason line. I wound them around and then cut each end. From there, I tied two strands to 1.75" roofing nails. I chose these nails because they had a sizeable head to keep the line in place and because they were shiny. I wanted something that looked nice.

Next, I used a nail and some grid paper to punch the requisite number of holes for each school into foam board. After removing the paper, I used a hammer to place the nails in, ensuring that all the string was pulled in the same direction.

I used picture hangers on the back of each foam board that was at the top of a section and white ribbon to connect the various areas, such as the schools or the "yes/no" for meeting standards or credit requirements.

I grouped the strands for each school using the data...sorting the correct number of strands into each category. Then, I tied each group so that I could transport it easily to work.
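The sorting step amounts to counting students by school and by whether each requirement has been met. Here is a small Python sketch of that bookkeeping. It assumes a hypothetical record layout (a school name plus three yes/no flags for math, English, and credits) and uses made-up rows rather than the actual data for the class of 2018. It also assumes one reading of the display: a strand stays in the top group only while every requirement so far has been met.

```python
from collections import Counter

# Hypothetical records: (school, met_math, met_english, on_track_for_credits).
students = [
    ("School A", True, True, True),
    ("School A", True, False, True),
    ("School B", True, True, False),
    ("School B", False, True, True),
    ("School B", True, True, True),
]

def strand_groups(records):
    """Count strands per school at each stage of the display.

    A strand stays in the top ("yes") group only while the student has met
    every requirement so far; after one miss it runs along the bottom ("no").
    """
    groups = Counter()
    stages = ("math", "english", "credits")
    for school, *flags in records:
        still_on_top = True
        for stage, met in zip(stages, flags):
            still_on_top = still_on_top and met
            groups[(school, stage, "yes" if still_on_top else "no")] += 1
    return groups

for key, n in sorted(strand_groups(students).items()):
    print(key, n)
```

Each printed count is the number of strands to tie together for that school, stage, and group.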

Lessons Learned
These projects are exercises in solving one problem at a time. Some problems are related to the data. Others are engineering issues, for example, "How am I going to attach this to a bulletin board without incurring the wrath of the facilities administrator?"

As I've noted with previous stories, I have to let go of my tendency to want everything perfect. At some point, it's more important to just get something out into the world. I have a million other things I need to do (at least it feels that way), but I also have to manage this compulsion to put out this particular data tale. I am very excited about two more ideas I have in the pipeline. My original goal was to produce ten of these. I have a ways to go, but I'm learning more with each one.

I still have to build the companion web page for this display, but can work on that over the weekend. I want to share some data related to which students are not being successful and are in danger of not graduating in June. This, plus what's on the bulletin board outside my office will make for good conversation starters as we gear up for the school year ahead.

Sunday, April 2, 2017

Backwards Bar Charts

Recently, someone shared a visualization from Periscopic about the Trump Emoto-coaster. While the subject matter itself was not of particular interest to me, I did like the presentation of it.

Strap yourselves in. Your hands must be this small to ride this ride.

The line chart at the top made me think about the rises and falls within a school year. March seems like an especially cruel month, with teachers' tempers growing short. (Just ask me about how I ended up in a conversation with a five-year-old about why we need to wear pants at school.) How do attendance and discipline intertwine? And, when I looked at the horizontal bar cum sparkline plots shown above, it also made me wonder what we would see if we plotted individual classrooms over time. Maybe something like this:

Let's say there are four teachers at a particular grade level in a school. If we looked at the number of student absences and office referrals from the beginning of the year to the end of the year...what might we see?

If I was a principal, I might use something like this to either look for "hot spots" in my school that I might not know about...or monitor how well my school improvement initiatives are being implemented at the classroom level...or even to show staff for input. If I was a teacher, this might give me a general way to compare outcomes in my classroom. It might also piss me off (This just shows you that I have ALL of the bad kids!).

My challenge was how to build this. At its most basic level, this is a floating bar chart. And Ann Emery has a great tutorial for doing just that in Excel. But I didn't take that particular route this time because of how I need these charts to lay out. You see, absences for any given classroom total no more than 70 in a month...but referrals are no more than 13. Excel isn't going to let me push the edge of the chart off the lefthand side of the worksheet if I keep the x-axis the same on both sides, meaning I ended up with a ton of blank space. I suppose I could put attendance on the left and discipline on the right, but hey, what's Excel without some challenges?

So, how do you build a backwards bar chart?

Create your horizontal bar chart the usual way, then fuss a little bit with the axis settings.

Once you do this and remove the gridlines and axes themselves, you'll be able to position this bar smack-dab against the other one. You know it's worth it...you can work it. Just put that chart down, flip it, and reverse it.

Holla!

Another thing to know about this chart is the addition of the line down the middle. Since I deleted the gridlines and axes, I need some sort of visual between the bars. So, a simple line shape in grey 1.5 pt is all that was added.
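Outside of Excel, the same mirrored layout can be roughed out in plain text, which may help to picture what the two charts are doing. In this Python sketch, the left bars are right-aligned so they grow toward the center line, and each side is scaled to its own maximum rather than sharing one axis. The monthly numbers are invented, and the function name and bar width are arbitrary choices for illustration.

```python
def mirrored_bars(labels, left_values, right_values, width=20):
    """Return text rows with left bars growing leftward from a center line."""
    left_max = max(left_values) or 1    # each side gets its own scale
    right_max = max(right_values) or 1
    rows = []
    for label, left, right in zip(labels, left_values, right_values):
        left_bar = "#" * round(left / left_max * width)
        right_bar = "#" * round(right / right_max * width)
        # rjust pushes the left bar against the center line ("|").
        rows.append(f"{label:>4} {left_bar.rjust(width)}|{right_bar}")
    return rows

# Invented monthly totals for one classroom: absences (left), referrals (right).
months = ["Sep", "Oct", "Nov", "Dec"]
absences = [35, 70, 52, 14]
referrals = [2, 13, 6, 0]

for row in mirrored_bars(months, absences, referrals):
    print(row)
```

The "|" character plays the role of the grey line shape: with no axes or gridlines, it is the only thing separating the two sets of bars.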

In terms of labels, I'm going to leave them off. If you understand how one is laid out, then you can understand a whole school's worth. The numbers themselves aren't the big idea with this visual. It's the patterns and comparisons we're after. When we've identified those, we're ready to ask some deeper questions and dig into the numbers in a different way. These charts are the starting point for conversations...not the end...even if that seems a little backwards.

Tuesday, March 28, 2017

Make the connection: Student growth to teacher action

I have had the privilege of presenting at the ASCD annual conference over several years. I've been an ASCD member nearly my whole career. It's an organization that, as the rebranded conference name suggests, empowers educators in all roles to support students.

This year, I am presenting on the qualitative side of data. My session description is "If 'not everything that counts can be counted,' as Albert Einstein suggested, then how do we measure and represent student growth beyond test scores and grades? In this interactive session, you will learn strategies that capture student learning in multiple ways, as well as how to communicate feedback about the whole child using data visualization. Join the conversation about how to apply digital and analog tools to tell your students' stories and report the full spectrum of student learning."

The challenge of doing a presentation like this is that I have to submit the description more than six months before the conference. Whatever it is that I had in mind to talk about in August was long gone before I received notification that the proposal was accepted...let alone when I sat down to build the content. I am influenced, too, by all the things I have learned in the interim.

The basic story arc did finally emerge. I'll start first by talking a bit about why data visualization can be a powerful tool. This is my usual lead-in, and I think it helps to provide a few easy to grasp examples before launching into new territory. The next hook is to talk about achievement data. Now, this particular piece does not explicitly fit the session description, but my goal is to move from the larger scope of the purpose of data viz to what we typically see in education, and finally into non-traditional ways to represent education data...and perhaps even a little further than that.

I heard a presenter this morning say that "schools embrace business ideas as they are fading." In other words, what was hot in the private sector 5 or 10 years ago becomes the things that schools are talking about now. I have seen this happen a lot over the course of my career. And what worries me most now is that decisions about data privacy and access are being made now that will affect schools in ways they haven't even anticipated yet. I am not going to claim that I can change the world with my presentation and suddenly schools will make these conversations a priority...but it's a start.

My call to action for them is around being in control of creating their own narratives using data and to think about what they want to represent, not necessarily what they are told to represent. All too often, the public view of school data is just annual test scores. But children are so much more than the sum of their test scores. They deserve a more robust approach to sharing their stories (and to be involved in that process, as well).

I have an ancient (by web standards) wiki where I have placed materials for this session. Someday, I'll move everything over to GitHub...but for now, it's a reminder of the journey I've taken to this point and perhaps a place to shape the ideas ahead.

Saturday, June 11, 2016

Eyeo 2016 Recap

I attended the Eyeo Festival this week. It brings together "creative coders, data designers, artists, and attendees." I have been wanting to go for a couple of years as a way to pull myself in a different direction. It's easy to get into a rut, or at least into a routine that doesn't allow you to ponder other possibilities. This was a very different conference from others that I've attended. Here, I came away feeling creative and inspired. At others, I've walked away with learning to apply. It's not that one outcome is better than another---they each have a role.

I am looking forward to the videos from the festival being posted. In the meantime, here are some of the highlights.

Emergence
Nicky Case kicked things off. His focus was on emergence, a concept where the sum is different from the parts. I can't say that he shared anything new in terms of his ideas, but what I liked was seeing a young adult share his process of learning that there is a lot of grey area in the world. I worked with teenagers for nearly 20 years, and the black/white worldview was pretty normal. It takes time and experience to learn that there are lots of answers to any question. As a young 20-something, Case is showcasing the transition to a more experienced lens on the world.

The keynote by Paola Antonelli, which was the next evening, shared an even more advanced take on this theme with her views on quantum design: "ambiguous states, in the spaces ‘in between’—between digital and physical, high-tech and crafts, old and new, nature and artifice, developed and emerging world." Beauty is not just in the eye of the beholder, meaning is derived from the eye of the observer.

Transformation
A second theme was about the transformative nature of data. Paolo Ciuccarelli of Density Design spoke about the poetics of data visualization. He pointed to the need to design data experiences that "generate poesis within a space of wonder."

from http://www.densitydesign.org/research/brain-houses-three-interieurs-stories/

One of my favourite ideas that he shared was this concept of a panorama, like the one shown above. I love the idea of embedding the data within a larger context. I highly recommend having a look at the Raw tool for generating visualizations.

Moritz Stefaner gave a talk on his Data Cuisine project. One of the things I liked most about this project was the idea that the dimensions of food (ingredients, presentation, cooking method, etc.) can be used to represent dimensions of data. This leads to a very different sort of interactive experience.

Transformation also appeared in how artists used materials in different ways. Whether it was Anouk Wipprecht combining her love of couture and robots or Tania Candiani speaking about the intersection of combination, serendipity and translation, I was blown away by the creative thought processes that were shared.

This is not the sort of end product I get at education conferences---where sharing one's thinking is not considered good enough. At those conferences, there is an expectation of audience involvement and tangible takeaways. With Eyeo, the feeling that is created through the presentation is the goal. I can't talk about this conference in terms of what I learned, but rather, how it made me feel. This brings me to the last major theme.

Instruments of Power
There was a strong focus on equity at this conference, from the range of speakers, to topics, to the code of conduct. Part of that is an understanding of privilege as it applies to how we collect, use, and represent data.

Marek Tuszynski from the Tactical Technology Collective shared their recent exhibition: The View from the White Room. (E.g. looking out from an Apple store.) The show looked at questions such as What does it mean to live in a quantified society? and What is the value of data privacy when it becomes something you can buy? Lots of powerful things to think about from this session---I had to get out and take a walk after it. Part of the exhibit included something called Big Mama, based on the quote from a government official who justified surveillance by saying he did it because "I love you all," and on the perception that the contribution of data leads to a harmonious society. Take a deeper look at Unfit-bits, Me and My Shadow, Security in a Box, and Exposing the Invisible. It is not that these concepts are new or unknown, but it's their application within our personal and professional contexts that makes them worth revisiting.

As much as we talk about the power data visualization has to reveal, we rarely talk about how it can also be used to hide. In the best talk I saw, Josh Begley shaped conversation around what the work is that data visualization does. In one example, he talked about the geography of incarceration. As part of that, he made the comment that "most photos today are taken by machines for other machines to see." Satellites, drones, and other tools capture far more images than anything humans post to Instagram, Flickr, or other sites. Josh works on projects that bridge what machines are doing with what we notice. Do we want to be as connected to our foreign policy as we are to our phones? Check out his work on the Dronestream App or Officer Involved Shootings as ways to explore how the things we don't represent are still powerful enough to evoke emotion.

There were other presentations, keynotes, and sessions that I attended. There were also some I didn't get to attend due to having to get to the airport...including one I was looking forward to the most by Lynn Cherny. However, I enjoyed exploring a bit of Minneapolis, getting to meet several data viz heroes in person, and being able to think about some very different concepts for a while. This spring has been a real drag in terms of work demands. I am looking forward to working on some new projects that are being spurred by this recent boost to my sense of creativity.

What have you seen recently that inspires you?

Wednesday, March 9, 2016

Go Tell It on the Mountain

I am stretching this week beyond the comfort and confines of my typical environment in P-12 public education. I'm at a convening of data storytellers from a large variety of industries. It's the first time I've been to a conference that is not specific to education.

About 100 of us are safely tucked away at the Stanley Hotel (you know, the one that inspired The Shining?) in Estes Park, Colorado, for the Tapestry Conference. As far as I know, I'm the only public ed person skulking around---although there are several higher education representatives. It is odd, for me, to meet and greet with people from Zillow, Comcast, ProPublica, or NBC News. Every face is new to me, although I finally met Robert Kosara and Naomi Robbins...both of whom I've been wanting to meet for a long time. I love that everyone is passionate about the same goal of effective storytelling with data.

I have that interest and commitment to quality communications using data. But why else am I here? After all, I am definitely in the "one of these things is not like the others" category. I am here because public education needs to connect with everyone. It's public, for crying out loud. Everyone's tax dollars are funding it. Regardless of the industry you represent, there is a connection with public education. I hear all the time from educators who are tired of how others message our work. We can change that, but not by sticking to our own circles of influence and expecting the rest of the mountain to come to us. Sometimes, we have to go to the mountain.

Beyond this idea, however, is a more personal one for me: I need to learn and grow in my professional work. I get an opportunity to do that through education conferences---they help me learn about my job. But there is something beyond that...something that speaks to the purpose of what I do and feeds my spirit for it. That is what I am hoping Tapestry will be for me. This is not about the nuts and bolts of my day-to-day job. This is about helping me inspire and grow others when I return to the office.

Learning is my life's work. It is easy to lose focus on that with a sea of emails, ever-present to do lists, and a calendar full of meetings. It is critical for me to set all of that aside for a couple of days and just immerse myself in learning. Networking with others is great, too, and a change of scenery doesn't hurt. But most of all, this is an opportunity to just be in that moment of growing my knowledge base.

We are not so different, educators. We may have small-batch, artisanal data sets and handcraft our visualizations in Excel, but we face the same challenges as Big Data when it comes to data quality, effective communication, messaging, and design. We have the same issues around helping people ask good questions of their data and identifying the most critical aspects for action and attention. Perhaps we have a better chance of finding solutions together, rather than isolating ourselves as educators. Together, we can move mountains.

Sunday, March 6, 2016

Pretty Is As Pretty Does

I am often told that my work is pretty. I always find this to be a strange comment. I have to admit I've felt a little insulted by it at times. My goal is to communicate clearly using data...not make a pretty picture. No one talks about a sentence being pretty just because there's a capital at the beginning and a punctuation mark at the end. Why should it be any different for a visual that follows some basic rules of the road?

I've been thinking about this push and pull between what the story is in our data and how the story is presented because I will be heading out to the Tapestry Conference this week. The purpose of the conference is to "advance interactive online data storytelling [by bringing] different viewpoints together with the goal of generating a rich conversation about data storytelling."

What is the role of pretty data in such a conversation?

Is it an unnecessary add-on? Could we communicate with data just as effectively without paying attention to the finer points of layout, colour, and line? After all, the data visualization is not the end goal---it's what we do with what we see in it. Meanwhile, it is possible to have aesthetic and no meaning at all. Chad Hagen illustrates this with his nonsensical infographics.

We could also point to examples that look great, have real data that tell a story, but still don't mean anything. For example, Tyler Vigen's Spurious Correlations.

At the other end of the spectrum are arguments that the art of data must be present in order to create meaning. Both Giorgia Lupi in Beautiful Reasons and Moritz Stefaner in Little Boxes make the case that form and function, as well as art and design, are integral to deep understanding of our data.

I am learning to smile and say "thank you" when people tell me the data I show them are pretty. I am learning that the meaning behind the comment is one that can refer to clarity or deep understanding. I am hoping that the audience makes enough sense of things to see that pretty is as pretty does.

Thursday, February 25, 2016

Making Sense

Earlier this week, I had an opportunity to share three sets of data with building administrators. You've had a preview of some of the visuals: a new way to represent the achievement gap, cluster charts, and small multiples. The administrators had not, although I have shared lots of data in lots of other ways with them. This was our opportunity for some in-depth work.

With each round of data presented, I did a little bit of instruction so that they understood how to read the charts. Each table had some paper copies and a few focusing questions to get the conversation rolling. And then the real fun began.

I have never had a chance to watch people learn with new-to-them data representations. Bar charts, line charts, and scatter plots are commonplace. When you share these types of visuals, everyone already knows the drill. But hand people sheets with small multiples and a few of them will overlay the pages and hold them up to the light. Give them a set of line plots showing gaps among student groups, and they will spread them across the table to organize the pieces in different patterns. Hand over some cluster charts and watch people fold the paper along various lines to build new learning.

It was absolutely fascinating.

As a way to gauge their engagement with the new charts, I tried the Talking Mats which were recently shared by Andy Kirk.

I provided three different colors of sticky dots, one with each round of data, and asked the administrators to place their dot when we transitioned to the next part of the workshop. In addition to watching them interact with the data---which was very powerful learning for me---the Talking Mats provided valuable feedback at the end. These are visuals for me, created by the audience, which have told me a lot about which charts gave the biggest bang for the buck. I'll know where to focus my time and work in the future.

A few people struggled with looking at the forest of data represented by small multiples, instead of the trees. Some made connections across the data sets. Still others wanted to argue with data because they were unnerved by what they were seeing for the first time. By the end, though, there was a very rich conversation about the meaning we'd been able to make from all of the data.

For the first time since I started this job, I had many public compliments about the work I shared. My favourite has been "Thanks for presenting data in a way that helps me learn." Here's to more opportunities to use Excel to support us in making sense of things.

Friday, February 12, 2016

Living Large with Small Multiples

In the poem Song of Myself, Walt Whitman writes "Do I contradict myself? Very well then, I contradict myself. I am large, I contain multiples." And while Excel might not be considered to be as sexy as Whitman's prose, they do have some things in common. They both can contain multiples.

Small multiples use a series of similarly scaled charts. The purpose is to allow for easy comparison across time or groups. When you use these charts, you are looking at the forest and not necessarily the trees. You don't want to focus on details as much as you search for larger patterns to investigate.

I built the example below this week. It's a series of scatter plots. Each tiny blue dot is a student, and their positions on the charts represent the point where their percent attendance and scale score on the state assessment intersect. An orange line shows the linear regression for the data set in the chart. The line tells us a couple of things. It provides a quick visual on the range, as well as the basic trend.
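If you'd rather script the trend lines than add them chart by chart, here is a minimal sketch in Python (not what I used; the real thing was built in Excel). It groups students into one panel per grade and score level and fits an ordinary least-squares line to each panel. The rows, the `fit_line` and `fit_panels` names, and the numbers are all made up for illustration.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # Every student has the same attendance, so no line can be fitted.
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_panels(rows):
    """Group rows into (grade, level) panels and fit a line to each one."""
    panels = defaultdict(list)
    for grade, level, attendance, score in rows:
        panels[(grade, level)].append((attendance, score))
    return {key: fit_line(pts) for key, pts in panels.items()}


# Hypothetical rows: (grade, score level, percent attendance, scale score)
students = [
    (3, 1, 88.0, 2300), (3, 1, 92.0, 2320), (3, 1, 96.0, 2340),
    (3, 3, 90.0, 2450), (3, 3, 94.0, 2440), (3, 3, 98.0, 2430),
]

trends = fit_panels(students)
for (grade, level), (slope, intercept) in sorted(trends.items()):
    print(f"Grade {grade}, Level {level}: slope {slope:+.2f} points per attendance point")
```

A negative slope in a panel is the cue for the "more attendance, lower score" conversation; the sign and steepness are what matter at this scale, not the exact values.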

What kinds of things do we notice? Maybe it's how students who score in Level 1, regardless of grade level, don't have much of a discernible pattern. Level 3 students tend to clump---their rates of attendance and scores are very similar to one another. Maybe we have a conversation about those areas where the line slopes downward. How do we explain a trend where the more you come to school, the worse you do on the assessment? Or maybe even the overall picture isn't what we might predict. Even those trend lines that have an upward slope aren't very steep. Wouldn't we think that better attendance leads to better scores? And maybe we need to talk about what's happening when kids get to sixth grade and attendance starts to get a lot worse for students at all score levels.

Because you likely can't read the itty-bitty labels, I will confess that I have broken a cardinal rule when building this: the y-axes are scaled identically for each grade level, but not among all the grade levels. Percent attendance is plotted along the x-axis and is the same for all of the charts. But the range for scores changes. The higher the grade level, the higher the possible score. I've tried to mitigate this by keeping the y-axis range for each grade at about 400 points. If I'd made the entire score range identical for all grade levels, the information would have been too squeezed to make sense of.

There are thousands of students represented on this single graphic. While focusing on an individual is critical to the daily work of the classroom, small multiples serve a different purpose. This time, it's about the herd.

What will school principals see when I show this to them in a couple of weeks? I'm not sure. I'll have to provide a little support in learning to read it, but I think they'll catch on quickly enough. The chart will be part of a larger conversation around student performance...one piece of a puzzle where they will apply context. As for me, I've enjoyed looking at this because I see something different every time.

Are you using small multiples in your work? How have they been useful?

Bonus Round
To build this, I organized the necessary data and then used a pivot table and slicers to pull attendance and scale scores by grade and score level. Dynamic ranges were used for the charts, allowing for expansion/contraction of the number of data points.

Each chart was pasted into PowerPoint. This allowed me to size and position all of the charts and labels, as well as easily share the document.

Sunday, January 24, 2016

Ethical Communications Using Data

Much of the data we collect as educators is subject to various federal, state, or local regulations about who can see the data and for what purposes. These ethical considerations most often apply to data points that connect with an individual or very small groups. Once aggregated, we tend to slide into our own version of ethics. We make decisions about how the data are presented and annotated. We choose the stories, the focus points, and even the audiences we share with. What are the questions we should be asking ourselves as we make these choices?

A recent forum on Responsible Data Use generated some categories and avenues of inquiry around this topic. I've read through the summary several times now, and with each glance through the list, I find new things that I'd like to discuss. Here are a few that catch my eye:

Communicating uncertainty

How do we communicate uncertainty in data?
In metadata?
How do we represent gaps in the data?
What if our knowledge of the uncertainty in the data is anecdotal?
How can visuals show “no answer”?
How can data visualization promote ambiguity?

Literacy

How do we improve everyone’s data visualization literacy, as creators and as viewers?
How do we educate people about the data they create?
Which people most need data literacy?
Can we provide interactive tools that let viewers adjust data visualizations in real time as a means of improving literacy?
How can we support grassroots groups to create better data visualization?
Is there a need for basic design principles and data viz 101 resources?
How do we navigate a fear of numbers?

BAD data viz

Is meaningless data visualization worth anything?
What about when people make decisions based on bad data viz?
If raw data is unrepresentative, will visualizations on it be bad?
We should collect examples of unethical data visualization.

Audience

How do we involve the audience?
Who is the audience, and why?
How do we create community ownership of a data viz?
How do we allow a data viz to speak to multiple disparate audiences?

Some of these questions are easier to answer than others---we can think of a few ways to represent a lack of data. Others, like those in the "BAD data viz" group, are not so simple, but would be fun to kick around and see where we get. What would be your priorities in your workplace?

The summary with all of the categories and questions also has links to a variety of resources and notes connected with the forum. They are well worth exploring, if you have a few moments.

Tuesday, December 29, 2015

Show the Data: Distributions

Last spring, we looked at a couple of ways to show the data: Cluster diagrams and bump(s) charts. The idea here is that when we summarize data and represent it in bar or line charts, we miss nuance. Instead, when possible, we should look for ways to show all of the data.

I had this in mind when I recently tried to tell a story about student performance in my school district. The district does a pretty good job with about 80% of its students, and because that's far better than the state average, no one asks the hard questions about the remaining 20%. Shouldn't they learn to read and graduate from high school, too? I certainly think so...but I often run into roadblocks when I try and raise this conversation. Maybe I need a different visual to use as persuasion.

I remembered some information for another post about recent research in using distributions to replace bar charts and thought I might give it a try.

Here is Exhibit One:

This is typically what we provide to schools (and the public) regarding student performance. This chart represents one grade level at one of our elementary schools. Levels 1 and 2 (L1, L2) represent the percent of students who did not meet the standards ("pass"), while Levels 3 and 4 show the percent of students who met (L3) or exceeded (L4) the standards for English Language Arts. Generally speaking, this doesn't look so bad. Lots of kiddos did well enough, and L1 is the smallest category. Yea, Team!

But who are these kids in each category? Do we have the same distribution in performance if we look at student demographics? Let's find out.

Here is Exhibit Two:

A little orientation to this beast. We still have levels of performance (1, 2, 3, 4) on the x-axis, but the y-axis shows the actual scale scores. In other words, for this grade level and subject, Level One is actually made up of scores in the range of 2114 - 2366, Level Two ranges between 2367 and 2431, Level Three is represented by 2432 - 2489, and Level Four includes performance between 2490 and 2623 (source). Every child's score (n = 69) is represented by a circle on the chart.

It might be interesting in itself to just look at the distributions. But I've added some information based on student ethnicity. The grey circles that you see represent students who are white (n = 54). The pink circles represent students of color (n = 15). Overall, only one-third of scores from students of color are in Levels 3 or 4, while about two-thirds of scores from white students are in those levels. And, one-third of all students of color are in the lowest category (Level One).
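For anyone who wants to check this sort of breakdown outside of Excel, here is a rough sketch in Python. It sorts scale scores into the four levels using the ranges quoted above and reports the share of a group in Levels 3 or 4. It assumes Level Three begins at 2432, one point above the top of Level Two. The two score lists and the function names are invented for illustration; they are not the district's data.

```python
from bisect import bisect_right
from collections import Counter

# Lowest score in Levels 2, 3, and 4, from the ranges quoted in this post.
# Assumption: Level Three starts at 2432, one point above Level Two's top.
CUT_SCORES = [2367, 2432, 2490]


def level_for(score):
    """Return the performance level (1-4) for a scale score."""
    return bisect_right(CUT_SCORES, score) + 1


def share_meeting_standard(scores):
    """Fraction of scores that land in Level 3 or Level 4."""
    levels = Counter(level_for(s) for s in scores)
    return (levels[3] + levels[4]) / len(scores)


# Hypothetical scores for two groups of students.
group_a = [2200, 2380, 2440, 2500, 2510, 2470]
group_b = [2150, 2300, 2400, 2450, 2360, 2495]

for name, scores in (("Group A", group_a), ("Group B", group_b)):
    print(f"{name}: {share_meeting_standard(scores):.0%} in Levels 3 or 4")
```

Keeping the raw scores around (rather than only the level counts) is the whole point: the same list can feed both the summary percentage and the every-child-is-a-circle view.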

If you're wondering about why I am not representing different groups (American Indian, Asian, Black, Hispanic, Pacific Islander, Two or More Races, White) with different colors...well, I can tell you that I wrestled with that decision quite a bit. Our district has very small numbers of students of different races. For example, for the school and grade represented above, there are no black students. There is one American Indian student shown on the chart (I can't tell you where, due to FERPA restrictions). This student as an individual is important and worthy of all of our best efforts. When represented by a score, conversations become problematic because there is no way to compare it with others in the same group. Disaggregation of the data at the grade and school levels does not cause the sorts of inquiry that it should because "it's just one score." Trust me---I've heard that refrain quite a bit. But when I add the "just one score" with others in a building who represent non-white students, there's a bigger argument to be made. Your mileage may vary, based on the populations you are working with. All that being said, I am very open to feedback on this. What are some other options I should consider that will balance tiny n-size against the overall story to be told? Stacked bars, perhaps?

I realize that the two charts I've shown in the post represent different things. One is just the overall percentage by category...while the other is distribution by category. So, one isn't necessarily a replacement for the other. Even if I altered things a bit by showing numbers of students in the first one, it would result in the same chart. But I think there is some real power in looking at the second chart---even if it was not coded---and understanding that every child is there. It's not a blob of summary performance...and goes beyond a simple count of who is in each box.

So, here's looking at you, kid. (Especially if you aren't white.)

Bonus Round
The distribution of performance chart shown above was built in Excel (of course). It is a basic scatter plot chart, with specific scores selected and colored either grey or pink. If you visit the research site I mentioned earlier (Beyond Bar and Line Charts), they have some workbooks you can download and easily modify.

Saturday, December 12, 2015

WERA 2015: Data Viz Workshop

I've done several presentations over the years about data visualization within public education. I've talked about graphic representations as a form of feedback, types of tools, guidelines for improving communication using visuals, and more. All have been brief 60 - 75 minute affairs with some very simple sorts of activities and conversations along the way.

This week, however, I had an opportunity to guide my first workshop. I had three and a half hours available to support educators in really digging into telling the best stories we can with our data. Having this sort of space and time available enabled me to think about the content differently. I've posted links to previous presentation materials here, and this post is no exception. For those of you who might be interested in scoping out the slides, materials, or links, head on over to ye olde dataviz wiki to take a look through the stash.

What I wanted most for the audience this time through was an opportunity for self-reflection and metacognition. Educators have relentless jobs. There is often no chance to think about what did or didn't work with a group of students today because they will be here again in the morning...and the tyranny of the urgent is to plan for that. I felt like it was important for our afternoon to be a time where people could become more aware of their own design process---no matter how simple or sophisticated it might be---and, more importantly, be inspired. I tried to bring in many different examples of current projects from a variety of fields. The best way to get out of your bar or line chart rut is to see where others have made departures.

I don't consider myself an expert in any of this. I do consider myself curious about it. I do see a significant need in our field to elevate our visual communications, as well as prepare our students to do the same. I want to continue these conversations and add what I can. I will refine my workshop materials and perhaps have another opportunity to engage in this work at another time. I enjoyed it and hope that it's the start of something bigger and better for our field.

Monday, November 9, 2015

The Agony and the Ecstasy

Am I the only one who agonizes over the best way to represent a data set? Is there a 12-step program for those of us who are occasionally paralyzed by all of the visualization options? If this sounds familiar to you, read on for my latest struggle in bringing a data story to life.

Several weeks ago, I was asked by my superintendent for information on the achievement gap. For those of you who might not know this particular piece of eduspeak, it refers to the difference in achievement levels between populations of students. For example, white students often perform better on standardized tests than black students. This difference is referred to as the achievement gap.

This request should have been a piece of cake. I have the data (scores and student demographics). I'd done a similar project last year. But I felt like the bar charts I'd used were lacking. They show differences in performance plainly enough, and yet it's difficult to capture information for various populations in a single, easy to read report.

I started as I usually do, hand drawing some possible layouts and then building a few models using the data.

These examples all show longitudinal information for males and females at a particular grade level and in a certain subject area. The particulars are not too important here. What I discovered in doing these, even before cleaning up the charts, is that none of them were satisfactory. They all showed the data accurately, but none of them captured "the gap" in a way that caused any sort of interest or reaction.

Back to the drawing board.

I realized that I needed Excel to show me the percentages on one axis, like a number line, so the space...the gap...became visible. Here is what I ended up with:

This is a bit out of context, so let me tell you a bit more about what you're looking at. This chart only shows 2015 data for one grade level. The horizontal line is the number'ish line: 0 - 100%. The vertical line shows the overall percentage of students who met the standard on a particular assessment. The placement of populations (shown by orange or blue triangles) provides a general relationship to that overall level of achievement, as well as shows the gap between the populations. I do include the actual n sizes and percentages in a table below the charts.
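The arithmetic behind each marker is simple enough to sketch. The Python below is only an illustration with invented counts and names (`percent_met`, `gap_points`): it works out the overall percent meeting standard, each group's percent, and how far each group sits from the overall line. Those percentages are the x-positions you would plot on the number line.

```python
def percent_met(met, tested):
    """Percent of tested students who met the standard."""
    return 100.0 * met / tested


def gap_points(overall, groups):
    """Return the overall percent and, per group, (percent, distance from overall)."""
    overall_pct = percent_met(*overall)
    points = {}
    for name, (met, tested) in groups.items():
        pct = percent_met(met, tested)
        points[name] = (pct, pct - overall_pct)
    return overall_pct, points


# Hypothetical counts: (students who met the standard, students tested)
overall = (130, 200)
groups = {"Female": (72, 100), "Male": (58, 100)}

overall_pct, points = gap_points(overall, groups)
print(f"Overall: {overall_pct:.0f}%")
for name, (pct, distance) in points.items():
    print(f"{name}: {pct:.0f}% ({distance:+.0f} points from overall)")
```

The gap itself is just the difference between the two group percentages; plotting both on one shared 0 - 100 axis is what makes that space visible.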

Here's a broader view:

I am not going to show you the data tables, due to FERPA issues---some of the subgroups shown above have fewer than 10 students. I need to stay within the bounds of federal privacy laws in this public space, but just know that they exist to provide detail to data users in our district.

I'm really happy with this layout, however. It gives, at a glance, an easily readable view of the achievement gap at a grade level. When looking at these over several grades, patterns begin to emerge. This is especially important for those groups where the n size is very small for a grade level. For example, having only one black student in a grade might not tell you much if they didn't meet the standard, but when you see that our small handful of black students at every grade level all fall well below their peers, it's alarming. It's also easy to cover up either the orange or the blue markers and get a quick picture of who is or is not successful.

While I still have the longitudinal view to consider, it's simple enough to build similar charts for a few years of data and then align them to provide a similar glance at trends.

I apologized to my superintendent for my tardiness in delivering the product, but I think the agonizing has given way to some ecstasy over seeing things in a way that's clear and best represents the question to be answered.

I don't know that anyone, other than those of us struggling to represent data, understands why it takes so much time to build a report. Others don't see how many different charts we modeled...all of the colors we tried...or the variety of label placements (and label content) we viewed. They don't hear the conversations we have with people around the office to learn more about what is or isn't working for them in our draft visuals or how they want to interact with and use the information presented. But for those of you who are knee-deep in this process, I'm cheering you on from here.

Bonus Round
Like these charts? They're just scatter plots in Excel, with the vertical axis removed. Easy-peasy to make if you're on the hunt for something similar using your own data.

Sunday, August 16, 2015

Anatomy of a Design Build

Like many school districts, we have new data this fall. The state has changed to Smarter Balanced assessments to measure student knowledge and skills with Common Core State Standards. One of the challenges associated with presenting these data is the inevitable effort to compare them with scores from previous years. But they do not represent the same things. And so began the challenge to develop this report:

Subject areas are organized across the top. Grade levels are on the horizontal. The blue squares represent grades and subjects assessed in 2015. The large numbers in the center show the percent of students meeting the standard, with the distribution of scores shown by the column charts at the bottom of the square. Historical data are shown by the line graphs in the grey squares.

It is built in Excel (of course). On another sheet is a table listing the various schools, grade levels assessed, subjects, school years, and scores. A pivot table with this information feeds the report you see above.

I'm mostly happy with this layout and format. I would still like to tweak some of the colours, but overall, I think I've solved most of the issues with representing the data. But it took a while to get this far.

In the beginning, there was a suggestion from one of my supervisors to offer something like what you see on the right. This is what another district in the area provides to schools and the public. I showed it to one of our principals and he said it made his eyes bleed. I agreed with that sentiment. We can do better, I said.

It's not that the information provided here is bad, it's that a pdf of a spreadsheet does not take advantage of what data visualization can offer to make meaning. It has much of the same information I developed for my version. We just used two different approaches.

Originally, I started with something similar. I pulled demographic and program data and added some sparklines and arrows to show change.

But I decided that this was a stupid idea before I got too far down the road with it. It was way too hard to scale the graphs along the same axes. This is an important thing to do to enable comparisons. But there is just too large of a range to represent in a small space. Not to mention this layout is boring. Seriously. Yes, I know it's half finished, but no bells and whistles involving fonts and headers are going to make this version anything other than a snoozefest.

So, I tried something else.

This is where I started to play with the idea of tiles. I wanted a cool color (as opposed to a warm one) and decided to look at purple as an option. This version is slightly better, but not by a lot. Looking at it, I realized something very important: Almost none of this information is useful. Does anyone, especially a school leader, really care about the year-by-year percentages of Asian students (for example)? It's not actionable data. You're not going to go out and recruit more Asian children if you notice your percentage slipping. There's no action plan a school would make to decrease the enrollment of girls. This is not to say that viewing overall change wouldn't yield some insights, but the year-by-year review isn't very helpful. So I started over. Again.

Third time's the charm, right?

I won't claim this is perfect, but we've come a long way, baby. All that demographic and program data? Now in one tidy bar chart on the left (under "% of overall enrollment"). The five-year change is shown on the far right. The stuff in the middle? All new. Kids are more than the sum of their test scores. So, I've included some additional demographic information about absences and discipline. Now we have some conversation starters.

I did the achievement data next. This is the page at the beginning of this post. I played around a bit with colors and format, but the tiles have been a constant.

Feedback has been mostly positive, but I'm still tweaking things. What would you want to see? How should we show it? Anything we should remove or represent differently?

Thursday, May 7, 2015

Show the Data: Cluster Charts

In the last post, we explored the idea of adding bump(s) charts to our rotation of how we communicate our data. It's one way to show all of the data in a particular set. Another one I've been using quite a bit is a cluster chart. Full disclosure here, these are my own take on displaying data---a bastardized heat map, and certainly not based on heavy-duty math like real hierarchical cluster charts. So, really, I'm not sure what to call these...but in my current role, we're finding them to be very useful and I'm just rolling with cluster charts as my category.

This spreadsheet will eat your soul.

I get a lot of spreadsheets sent to me that look like the one on the right. I hate these with a fiery passion for a variety of reasons:

Too much "ink" in the data-to-ink ratio. With all of those little boxes, I don't know where to look.
And the colors. I feel like a circus came to town. But beyond that, the red and green are not particularly friendly to those with color vision issues...and I do work with some who are color blind. Are we really asking them to try and make decisions on student learning based on this?
Not to mention that all of the data is colored in. What's the point?
And we have both numbers and colors. I'm not saying that you can't have both...or that they don't serve different purposes...but it's distracting. I'm constantly trying to make sense of the number patterns for each color.

I also think these data aren't useful because of the way they are organized. Alphabetical order is great for gradebooks, but not so much for trying to make sense of the data. Plus, we don't have any context---what if we're missing some signal in the noise? Suppose all of our low-performing students are boys...or in a minority group?

But let's say you are interested in showing both the progress students have made over time, as well as the characteristics of the students involved. We can reorganize the data by ranking the percentages on the second assessment (this is the "cluster" part). Then we can color code some additional information, such as gender or participation in a particular program. I also change the properties of the conditional formatting so that the fill and text are the same color, making the values seem invisible. Finally, I add thick white borders around all of the cells and resize the rows and columns. Here is a small part of the final product:

These are all the students who scored in the top level of our fictional grade 5 winter math assessment. Three of them improved a little, from light blue to a darker blue...others improved from lower on the scale (orange). But when we look at gender and program, another story emerges. Most of the students in the top category are female, not on free or reduced lunch, not in special education, and do not receive additional interventions through a Title I program.

See the difference when we look at students who have scored at the bottom for both fall and winter? Our population is mostly male and nearly everyone participates in one or more federal programs.

Maybe this representation doesn't necessarily hold any surprises, especially as we factor in free/reduced lunch. Children living in poverty typically do not perform as well as their peers. But one of the things I take away from this way to visualize that story is that we may need different interventions to support these students. Consider Student 39 on the right. He is receiving free or reduced lunch, special education and Title I services...and he's still ranked fourth from the bottom out of nearly 70 students. It doesn't mean that the school (and the student) aren't working as hard as they can. I do think it might mean that there are additional factors at work here that aren't (and can't be) addressed through the school. Perhaps the family is homeless or transient. Maybe the parents are going through a divorce...or the student has some medical issues. These are community-based issues and require different interventions to help close the gap for the student. I won't get up on my left coast soapbox about this right now. I'll just say that we have to work together on behalf of the whole child.

One of the pieces of information that is not represented in the visuals above is the number one item on teacher wishlists when it comes to reporting scores: progress. Sure, we have a bunch of students performing at the lowest level in the picture above, but that doesn't mean that they didn't make some growth.

This time around, I left the gender and program pieces coded the same, but I calculated the percent change between fall and winter and represented those in the leftmost column.

Look at Student 39 now. He's 12th from the top. Woo-hoo!

When we consider progress, we start to get a more equitable pattern---everyone is growing, and more often than not, it's our lowest performers who are making the biggest gains, even if they're still in the lowest part of the score breakdown.

By clustering similarly performing students together, either by scores or by progress, we get a much more useful pattern than we do with a spreadsheet that looks like a clown exploded on it. And, more importantly, we can show the data. In a very compact space, I can display everyone's scores and whatever demographic or program information is most relevant. And, I can fit the whole grade level on a single page.
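The two orderings described in this post boil down to two sorts. Here is a minimal sketch in Python with made-up student records and field names: one ordering ranks students by the second assessment, the other by percent change. It assumes "percent change" means change relative to the fall score; if you prefer a simple difference in percentage points, swap the function.

```python
def percent_change(before, after):
    """Change from the first score to the second, relative to the first."""
    return 100.0 * (after - before) / before


# Hypothetical records: percent scores on two assessments plus the
# characteristics that would be color coded beside them.
students = [
    {"id": "Student 1", "fall": 40, "winter": 50, "female": False, "title_i": True},
    {"id": "Student 2", "fall": 80, "winter": 88, "female": True, "title_i": False},
    {"id": "Student 3", "fall": 60, "winter": 63, "female": True, "title_i": False},
]

# Cluster by performance: highest winter score first.
by_score = sorted(students, key=lambda s: s["winter"], reverse=True)

# Cluster by progress: largest percent change first.
by_progress = sorted(
    students,
    key=lambda s: percent_change(s["fall"], s["winter"]),
    reverse=True,
)

print([s["id"] for s in by_score])
print([s["id"] for s in by_progress])
```

Because each record carries its gender and program flags along with the scores, the color coding stays attached to the right student no matter which ordering you choose.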

I have no doubt that as we move forward, smarter people than me (I?) will continue to find new charts that help share everything we know about a group. Summary stats and charts will never go away---and they have their own purpose to serve. But sometimes we want the full version, not the Cliff's Notes. When we do, bump and cluster charts will be there.

Bonus Round
Want to see just how challenging colored squares can be? Play this online game. The rules are easy: just click on the one square that is different.

Monday, May 4, 2015

Show the Data: Bump Charts

This is a bump(s) chart, a/k/a slope graph. If you poke around online, you'll find a variety of examples and names. Some have multiple data points for each line...and some are simpler, like the one on the right.

Typically, the lines are labeled on each end, often with the name of the data series and sometimes with the data value. I do have a version of this one with the lines labeled, but since these represent real data points, it's best to keep things anonymous for this example.

But let me give you a little context here for what I'm showing. Each line represents a teacher---the entire chart shows the entire staff for a school. On the left is each teacher's percent of Ds and Fs assigned to students for the first semester of the 2013 - 14 school year. On the right is the value for the 2014 - 15 school year.

We want to use this chart to look at two things. First of all, what is the general trend within the school? In this case, most of the lines are sloping downward. This may connect to initiatives, such as changes in grading practices, tutorial options, or improvements in instruction. Whatever the story is behind this chart, it's looking positive.

Next, we want to consider the steepness of the slopes we observe. Sure, we could add a trendline, but if we're just using the chart for exploratory purposes, we can eyeball things. In this case, we might note that most of the downward sloping lines, especially for the upper percentages on the 2014 side of the house, have had significant decreases.

Typically, when I present these charts, I include a summary of the data. For example, between the 2014 and 2015 school years, 30 teachers assigned fewer Ds and Fs to students, 7 teachers had very little change in the percentage of Ds and Fs assigned to students, and 3 teachers showed an increase in the percentage of Ds and Fs assigned to students. Because these charts are new to many of the people in my audience, this brief summary is enough to get them oriented to the chart. They can then begin to focus on the details. This might start with the slope of the lines, but then I see them begin to dig into the labels: Are some teachers in new-to-them assignments this year? Are the lines showing little change all in one content area, such as math? What might we see next year---is there a goal around our percentages?
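The counts in a summary like that can be tallied in a few lines. This is only a rough Python sketch: the values are invented, and the 2-point `tolerance` for "very little change" is my assumption for the example, not the cutoff used for the real charts.

```python
# Hypothetical (first-year percent, second-year percent) pairs, one per teacher.
pairs = {
    "Teacher A": (50.0, 25.0),
    "Teacher B": (12.0, 13.0),
    "Teacher C": (8.0, 15.0),
    "Teacher D": (30.0, 29.5),
}


def summarize_changes(pairs, tolerance=2.0):
    """Count teachers whose percent of Ds and Fs fell, held steady, or rose.

    A change within `tolerance` percentage points (an assumed cutoff)
    counts as "little change".
    """
    counts = {"fewer": 0, "little change": 0, "more": 0}
    for before, after in pairs.values():
        change = after - before
        if abs(change) <= tolerance:
            counts["little change"] += 1
        elif change < 0:
            counts["fewer"] += 1
        else:
            counts["more"] += 1
    return counts


summary = summarize_changes(pairs)
print(summary)
```

Whatever cutoff you pick, say so when you present the summary, since moving it shifts teachers between the "little change" group and the other two.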

And now, a musical interlude...

Let's take a look at another of these beasts. This is a different school, but in the same district.

We see a lot of increases compared to the other school; but, if you look at the scale on the left-hand side, you'll see that none have a higher percentage than the other school. Generally speaking, teachers in this school assign a lower percentage of Ds and Fs vs. the other school.

The overall changes at this location aren't as dramatic, either. The slopes are more gentle.

What might account for the differences? Again, you'd have to poke further using knowledge specific to the school: Is this a more veteran staff that has more expertise or is more resistant to change? Are the increases due to an unexpected change in student population---were the enrollment boundaries changed?

You could, with additional information, make some other comparisons between the two schools. What if you built graphs just showing one department, such as math? It would make the charts less busy and comparisons between buildings a lot easier.

The big idea with these charts, of course, is to show the data. Sure, we could just write the summary and do a simple bar chart or line chart to compare totals...but we're missing a lot of the story in doing so. When we go bumpin', we get a much richer picture of what is happening.

Next time, we'll take a look at another way to show the data using cluster charts.

Bonus Round
These charts are super-simple to make in Excel. Jon Peltier has an excellent tutorial on his web site. You can also download a template from the article I profiled in the last post.
