Friday, April 20, 2018

Seven: Care to Comment?

What do we say about our students? Do our values align with the words we use? Do they reflect what parents think is important about what happens in the classroom?

In this data story, we take a closer look at 3,694 comments written about 2,862 K - 5 students on their winter report cards.

The Data
Similar to last time, there wasn't anything especially fancy in terms of getting the data. We have reports in our student information system that will gather the information and spit it out into a spreadsheet. After that, I added student demographics and program information from another student file using trusty old INDEX/MATCH.
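
For anyone who would rather script the lookup than use INDEX/MATCH, here is a minimal Python sketch of the same idea. It is not what I actually ran. The column names ("student_id", "grade") and the function name are made up for illustration, and it assumes each file has already been read into a list of dictionaries, one per row.

```python
def add_demographics(comment_rows, student_rows, key="student_id"):
    """Attach demographic columns to each comment row, matched on a shared ID.

    Both arguments are lists of dicts. The key name is a hypothetical
    column; use whatever ID column the two files really share.
    """
    # Build the lookup once, like the MATCH half of INDEX/MATCH.
    lookup = {student[key]: student for student in student_rows}

    merged = []
    for row in comment_rows:
        # A student missing from the demographics file just gets no extra columns.
        extra = lookup.get(row[key], {})
        # Values already on the comment row win if a column name appears in both.
        merged.append({**extra, **row})
    return merged
```

Calling add_demographics(comments, students) returns a new list and leaves both inputs untouched.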

The big challenge was getting the data clean...or, at least, cleanish. You see, I didn't want student names. Why not? In part because I wanted to make some of the data available to others. This means I needed to strip out identifying information from the text. Also, the names interfered with some of the frequency counting and comparisons I wanted to make. (Aside: Do you realize how many kids go by the name "Maddie"? I didn't.)

I did a first pass using the SUBSTITUTE function in Excel. I had Excel replace any occurrences of a student's first name with "" to blank it out. However, this only worked when a teacher used the actual name of the student. Many kids go by nicknames, shortened versions of their names, first and middle names, etc. I'm sure there must be better ways than looking through things row by row, but that's what I ended up doing.
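
One of those better ways might be a small script. Below is a rough Python sketch, not what I actually did, that takes a list of names for a student (given name plus any nicknames you know about) and blanks them out of a comment. The function name and the idea of keeping a nickname list per student are assumptions for the example. Like the SUBSTITUTE pass, it will still miss nicknames nobody thought to list.

```python
import re


def scrub_names(comment, names, placeholder=""):
    """Remove each listed name (and its possessive form) from a comment.

    Matching ignores case and only hits whole words, so a listed name
    of "Mad" will not chew a hole in "Madison".
    """
    names = [n for n in names if n]
    if not names:
        return comment
    # \b marks a word boundary; the optional group catches "Maddie's".
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")(?:['’]s)?\b"
    cleaned = re.sub(pattern, placeholder, comment, flags=re.IGNORECASE)
    # Tidy up the runs of whitespace a removed name leaves behind.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Passing a placeholder such as "[student]" instead of the empty string keeps the sentences readable, though the placeholder would then need to be ignored in any word counts.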

The Analysis
After the spreadsheets were all cleaned up and ready for church, I looked at some different options for doing the text analysis. I don't have any real experience with this, and while I looked at some fancy options like Overview, KH Coder, and Emosaic, I just didn't have the time to devote to digging into them right now. Instead, I used the WordCounter and SameDiff options over at

The WordCounter provided the basis for the word cloud you see in the picture at the top of this page. I used SameDiff to compare lists of comments for male and female students, for example.

There are also comparisons for students who receive special services (vs. those who don't), students eligible for free/reduced lunch (vs. those who aren't), and students of colour (vs. white).
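
If you want to try this kind of counting and comparing yourself, here is a bare-bones Python sketch of the two ideas: tallying the most common words, and listing the words that show up for one group but not the other. It is only an approximation of the approach. I don't know how WordCounter or SameDiff work under the hood, the function names are invented for the example, and this version makes no attempt to drop common words like "the" and "and".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(comments):
    """Tally how often each word appears across a list of comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


def only_in(group_a, group_b):
    """Return (words found only in group_a, words found only in group_b)."""
    words_a = set(word_counts(group_a))
    words_b = set(word_counts(group_b))
    return sorted(words_a - words_b), sorted(words_b - words_a)
```

For a word cloud like the one on the board, word_counts(all_comments).most_common(50) would give the 50 most frequent words along with their counts.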

I also used a couple of pivot tables in Excel to summarize and sort through the data—for example, the total number of comments per grade level or per student population.
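
The pivot table counts are easy to reproduce in code, too. This is a small Python sketch under the same assumption as before: each comment is a dictionary, and the column names ("grade" here) are placeholders rather than the real field names from our system.

```python
from collections import Counter


def comments_per_group(rows, column):
    """Count comments for each value of a column, like a one-field pivot table."""
    return Counter(row[column] for row in rows)


# Hypothetical rows, just to show the shape of the data.
rows = [
    {"grade": "K", "comment": "A joy to have in class."},
    {"grade": "K", "comment": "Working hard on letter sounds."},
    {"grade": "3", "comment": "Strong progress in reading."},
]
per_grade = comments_per_group(rows, "grade")
```

Swapping "grade" for a program or demographic column gives the per-population totals in the same way.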

The Build
Compared to the last few data stories we've built out in the hallway, this one is less complicated. There's a lot of paper and stickers, with some foam to help provide dimension to the word cloud.

I knew I wanted the background to be yellow...something bright for spring, but neutral enough that the black lettering could pop. We put the word cloud in the center of the board. It has the 50 most commonly used words. On the outside, we have the four pairs of lists with words that are only found in comments for students in a particular group. The list for our students who receive special services is particularly depressing.

But wait, there's more...

This is our first data story that uses two boards. On the second board, we have information for our students in secondary grades (6 - 12). There are two middle schools and two high schools. Teachers at each school have a list of "canned" comments they can assign (two per class per grading period), as opposed to the freeform comments elementary teachers create. For these students, we did some simple counts of how many comments each student received. Underneath those charts are lists of the most common and least common comments selected. On the right of the board, we have an area for people to leave comments for us.

This second board isn't as sexy as the one for elementary, but I'm still excited that we have represented something for every school and every K - 12 student (even if they received no comments).

Lessons Learned
This is one project where I would have loved to have kept the null hypothesis: the idea that there isn't any difference between student groups. But even with this very basic analysis, I couldn't. Even though most of the text is pretty much the same across student groups at the elementary level, the bottom line is there are some differences in how we talk about boys and girls...and about students of colour...and students from low-income backgrounds...and those who receive special services.

We may never eliminate bias, but if we don't bring it to light, we can't start to address it. It's great that our district is taking on several initiatives around inclusion and cultural competency, but these are useless if we only use them to pat ourselves on the back for starting them. If we can't change the system in meaningful ways for students, we are just as complicit as those who built the structures in the first place. This display is one way to raise some awareness of what we're up against.

To see more pictures of this project, or view frequency tables of the comments, please visit the page for this data story. As always, comments welcome!