Wednesday, May 30, 2012

Housekeeping

In a fit of spring cleaning, I've reorganized some content for this blog. If you're viewing this post via RSS, you won't notice a thing. For visitors to the site, you will notice some updates.

Pages

I have created three new pages, all with their own permalinks. Depending upon the content of future posts (stats? add-ins?), I may add more pages. I do plan to take advantage of the more dynamic nature of the pages to update content.

All of the posts about building and using a gradebook in Excel are now in one place. Sure, you can use the gradebook tag on any of those posts to see all of them, but the new page houses things chronologically and with some additional text to help guide users. I've also moved the various resources off the sidebar and built a page just for Books and Links.  Finally, I've deleted the blogger profile info from the sidebar and built an About page with a statement about the blog, my contact info, and some backstory.

Blogroll

I've refreshed the blogroll with some great new reads. I hope that you'll check them out:
• chartsnthings is the (personal) blog of data sketches from the New York Times graphics department. I love the metacognitive aspect of this blog---a peek into the thinking of designers as they build visualizations.
• "Data Remixed is a blog dedicated to exploring data and sharing insights in an engaging way."
• Visit Tableau's Viz of the Day for a variety of visuals built by users like you.

Subscriptions

If you have additional suggestions to make the site easier to navigate or links/tools/resources that should be included, let me know.

Sunday, May 27, 2012

Statistically Speaking with Excel: Basic Descriptive Stats

If you want to do some heavy statistical analysis, there are better programs than Excel for doing so. But, let's say that you aren't into #bigdata and just want to do some small scale divination. Excel can totally be your bff for that.

Educators are most familiar with descriptive statistics: the ways we describe a set of data. What is the shape of the distribution of values? Where is the "middle"? What can we say about the population of values and their relationship to one another?

I have pulled a new data set to play with...one with test scores for about 500 students. While you can do statistical analysis with your gradebook---and let's face it, most teachers do in terms of how they assign final grades for students---I am a little squeamish about doing so. There may be no magic number that applies to every sample size, but I want us to play with something we can have a little more confidence in than the basic gradebook I've posted here. You can download the test scores workbook here.

This is the basic set up:

I have divided the data set into two schools: A and B. Each student is identified by their first name, and in some cases, a last initial. There is a raw score for reading, writing, and math, as well as a level of performance (1, 2, 3, 4). The range of score points for each level plays out like this:

You may look at this and wonder about what appears to be missing points (Why can't anyone get a 399 on the reading or math test?!) and why writing has a different scale. There are reasons...good ones...but I'm not going to get into them at the moment. These data do come from (old) state test results: first names of students are unchanged, but I did fill in some missing scores. We're going to play along with the rules that were originally applied to determine the scores. So, do your best to overlook the oddities of these scales for now. For these tests, a score in Level 3 would mean a student can meet the standard (Level 4 = exceeds, Levels 1 and 2 = below standard).

Let's start with measures of central tendency (mean, median, and mode) for our reading data (C2: C517). For the mean, Excel uses the AVERAGE function...for median, oddly enough, we can use MEDIAN...and for the mode, it depends on which version of Excel you have. In olden versions, the MODE function worked just fine, but starting with 2010, you have choices. To just get "the" mode, use the MODE.SINGL function. To get multiple modes from an array of data, you can use the MODE.MULT function---a very handy improvement. Not every data set is bell-shaped. So, here is what we have for the reading scores. I placed the formulas in the table on the right so you can see the syntax.

What does this mean? Well, first of all, our mean, median, and mode are all about the same. It's not necessary for your measures of central tendency to agree. After all, each one is a different way to identify the "middle." It's up to you to determine which is most appropriate. However, in this example, no matter which one you choose, you'll be fine.

If you want to graph this data set, it's not so friendly in its current form. We'd be better off building a frequency table first. This will allow us to find out how many students are in each category, then create a graph to visualize the distribution. To keep things simple, let's find out how many students scored in each level for the reading test. Build a simple table first, then select the empty cells:

Now you're ready to add your formula---a single formula ("array formula") which will fill all the cells at once. Then belly on up to the formula bar (too bad you can't pull a beer from here) and start entering the formula:

The "data_array" will be the column of data (not including the header) with the data about reading levels. The "bins_array" refers to the cells that have the labels for the reading levels---in this example, the 1, 2, 3, 4 you see in to the left of the cells selected in the table above. The final formula looks like this (plus a close parenthesis at the end):

When you've finished entering this information, you need to use a command to fill in all of the cells in the table. Plain old ENTER will not work. You have to use CONTROL + SHIFT + ENTER. And poof! We now have a frequency table:

If this scares you, you can use a simple COUNT function in each cell, but hey, you're ready to use big kid functions. Give FREQUENCY a try. Use the filters in Excel and compare School A with School B.

This is a good spot to stop for today, but we'll come back to this data set another day to see other descriptive statistics in Excel...and then move on to inferential analysis.

Bonus Round
Did you know that Excel has a whole category of statistics functions? Get in there and play!

Monday, May 21, 2012

Out of the box, Excel is awesome all by itself. And yet there are those who endeavour to kick things up a notch by inventing add-ins...a kind of extreme macro. This specialized code enables Excel to do all sorts of new tricks. In previous posts, we've looked at Sparklines and ASAP Utilities.

I was recently pointed toward MapCite and made some time to give it a try. With MapCite, you can visualize your data using a Bing map embedded in Excel. The add-in looks something like this*:

 *See the Bonus Round at the bottom of this post for more information.

For my data set, I pulled from one we've seen here before, the 5th grade state science test results from 2011. I added district address information.

Now, we're ready to "Geocode Data." Select the data---not the columns, or MapCite will tell you it can't handle that much work---for the addresses. Then click the "Geocode Data" button.

In the pop-up window, fill in the information. Then, click "Geocode."

Holy cow! We've just generated a whole new set of data:

 There's more where that came from...
Now, you can add the mapping features. Select the data in the Latitude and Longitude columns, then click "Add Data."

In the pop-up window, indicate the required information and click "Finish."

Let's have a look at what we have wrought. Click on the "Show/Hide Map Pane" button. Here is our first view:

Not too exotic, but that's because MapCite automatically clusters the pins so things don't look messy.  Here's an unclustered look. Note that when you click on one of the pins, the row in the worksheet with the matching data is automatically highlighted. Keep in mind that you can also change the base map that is used.

You can also use the HeatMap feature to take a look:

You may be wondering why I bothered using science data when we just used the addresses of all the districts. It is a bit of a head-scratcher. But, with filtering tools in Excel, you can choose which groups to look at: small schools, those which scored above the state average, etc. You can also add GPX data (data that shows a route that was followed).

What I like about the add-in is that it's easy to use and that I can see the map in my spreadsheet. I don't have to upload my data elsewhere and pull it into another application. However, at this point, MapCite is fairly limited in features. You can make different pins for different pieces of a data set, but you can't show more than one set at a time. It's great to have things on a map, but I need to derive more meaning than just concentration. As such, classroom applications (other than what students might look at it) are limited. I think new features will come in time. I've been promised an upgrade to a Pro account when it's ready for release, and will let you know what else you can do.

Have a look around the MapCite website or YouTube channel for additional information. Better yet, give it a try for yourself.

Bonus Round
I say "like this," because I couldn't get the add-in to install properly. Although MapCite was very responsive to my inquiry for tech support, we couldn't figure out why Excel was making the add-in invisible. I ended up kludging things by creating a new tab for the ribbon called "MapCite," then dragging the groups from the tab-that-refused-to-show into my kludged one. Tech support said that they haven't had any issues similar to this one, so don't let my experience put you off. Also, if you have any clue why Excel shows the MapCite add-in as being active, but won't put it on the ribbon, I'd love to hear how to fix this.

Saturday, May 19, 2012

Give Me That Old Time Religion

We are not new acolytes in the service of charts and graphs. Some of us may worship at the altar of Excel, but that does not mean others have not had their own methodology and purpose.

It is surprising how many schoolmen avoid charts and graphs as though their use required calculus and the practice of the black art. Successful practice of school administration is no different from the practice of business; both require the constant habit of visualization. Many problems of school administration involve the clear and concise presentation of facts and relationships. The stump speaker to the contrary notwithstanding, facts do not speak for themselves. Relationships have to be made vivid to be understood. Unless facts and relationships are presented in a manner at once clear and interesting, they are as useless as locks without keys. Charts and graphs present facts and relationships more forcefully, in less time, and require no more space than the same facts presented in words.
---Bowman, E.L. 1921. Graphic aids in school administration, article I: visualizing organization facts. American School Board Journal, 63(12), 29 - 31.

Last month, I introduced you to E.L. Bowman---enthusiastic proponent of data visualization as watchtower for schools in the 1920's. I have two more of his articles to share. I like these little historical snapshots. Technology may have been a limiting factor in getting these ideas to take root, but there was still a lot of deep thinking happening about the potential of using data for educational purposes.

In the first article, Visualizing Organization Facts, Bowman extols the virtues of organizational charts like this one:

He also looks at flow charts for coursework and job responsibilities. Bowman includes rules for drawing and duplicating charts, while reminding us that "If [a superintendent] causes these plans to appear in the form of graphs, he makes easier the comprehension of his ideas by others." Mind you, he also likes the idea of using these charts to remind others who's the boss. (Not so different from today.) While I have rarely seen org charts used by schools and districts, I do think that visually presenting processes could be a very powerful tool for schools. What are the pathways for graduation? What interventions are available for students who need additional support for their learning? What are the next steps with behavioral issues? How could graphics help communicate with non-English speaking families about navigating school with their children?

The last article that I have (so far) is about Establishing Routine through Graphs. Here's my favourite quote from this one:
There is scarcely a state school system in the country ·that has not in the past called for unnecessary statistics in the compilation of its annual reports from subordinate districts. Much of the data thus demanded on pain of forfeiture of state subsidy was never used, but was embalmed somewhere in a bulky printed volume.

Remember, this was 80 years before NCLB. I'm telling you, the dude was ahead of his time. Anyway, in this article, Bowman focuses on the procedures schools use to make purchases, track time, and develop lessons.

In most cases, I feel like these visualizations are cumbersome and unnecessary. I'd hope that most of these "routines" are streamlined today, either due to better prepared workers or technological advances. I was amused at the end of the article to read Bowman describe the awesomeness that is having a calendar on your desk. Word.

If my forays into Google Books reveal any more of the long lost gospels of data viz in schools, I'll be sure to post them here. For now, go in peace.

Tuesday, May 15, 2012

Animated Graphs in PowerPoint

Charts and graphs are used to tell a story to an audience. And as with any good tale, you may want to parcel out one chapter at a time. There are lots of tools to help you get the job done, but today we're just going to look at moving your graphs into PowerPoint.

Last year, one of the presentations I was giving was about using cell phones in the classroom. At the beginning of the presentation, I was sharing some data about how students access the Internet. I had pulled data from two different Pew Internet studies. One had data comparing access to computers and cell phones, based on family income. The other had data reporting how teens used cell phones to access the Internet, also based on family income. There is a bit of a surprise in the data, so although I wanted to show the big picture, I wanted to portion it out so people would get the "A-ha!".

The easiest way to do this is to build three different slides: one for the first data set, one for the second, and one for both. But why do it the easy way? Instead, you can use some simple animation to reveal each piece of the puzzle.

Start by building the graph. I prefer to do this in Excel and then copy it into PowerPoint. Here is the full meal deal slide:

Now, I'm ready to add animation. At the beginning, I don't want the blue bars to show. The orange and green show what you might expect. As household income increases, so does access to a computer. Access to a cell phone is similar, but not as dramatic. To make the orange and green bars show first, select them and go to the animation tab. Then, select "By Series" under "Effect Options." I don't recommend fancy animations. Just because you can add spins and flashes doesn't mean you should. A simple fade is plenty.

This will make the first view people have of the slide look like this:

Then, "Add Animation" for the next series (the blue one) to appear...and for the orange series to disappear. Ah, now we can see something interesting: Students from lower income families use their phones the most for going online. (PS---don't you love how the green bars don't look like they're the same shade as the ones above? They're the exact same, but it's a good example of how colour is influenced by context.)

Finally, bring the orange series back for the final comparison.

If you want to see the finished product, you can use the embedded view below. The vagaries of SkyDrive mean that you will need to choose the full screen option to make the animations work. Once it opens, just use your right arrow or page down key to view the results.

Tuesday, May 8, 2012

It's What You Do With It

If I asked you how big your data set was, would you hold it against me?

Most days, I feel like what's happening in the world of data is a lot like biological evolution. Outside pressures can drive internal changes. I see a Cambrian-levels of explosive growth in data use and modeling today as we experiment with what outside tools can do to internal data sets. And "Big Data" is rapidly becoming the king beast on the block. Even education cannot escape this growth, with federally funded longitudinal data systems and the other ways districts and states are putting together their data.

But this blog is about small data. I want individual teachers and principals to be able to answer their own questions---without fancy tools, mind-bending formulas/programming, or worrisome charts. Small data has its challenges, however. Statistically, we are on shaky ground. We haven't kicked it around much here, but I do wonder about the validity of the instructional decisions we make based on an assessment of a class of 20 - 30 students. I realize that classroom-level data serves a very different purpose than #BigData. As teachers, we try to support each student. But what represents a reasonable "sample size" for making these determinations? What do validity and reliability look like within the microcosm of small data?

In spite of these (and other) questions, I believe that small data has a significant role within the data ecosystem. We just need to help more people see what that is. I worry that #BigData is on a path to be too large and in charge of what happens in a classroom between an individual teacher and a student. We need to remember that it isn't the size of the data set that matters, it's what you do with it.

Bonus Round
If you live in the Seattle area and want to talk about all sizes and shapes of data, I hope you will join the Seattle Data Visualization Group. The next meetup will be Tuesday, May 22, venue TBD (but likely close to the Tableau folks). Don't worry about the size of your data set---you'll find lots of friendly people who have interesting projects and ideas to share. There were only two of us ed folks there last time, and being able to share our challenges and applications made for great conversation. I'd love to see you there!

Sunday, May 6, 2012

Ain't Misbehavin'

In the last post, we looked at one way to display information about non-academic behaviors in the classroom---things like work ethic, participation, and attendance. These are student attributes that we value and want to report on, but need to be separated from grades and other measures of learning.

Even though I am sure that I will revisit these ideas and fine tune the data displays over time, I thought I would lift the curtain to reveal the workbook innards. No doubt some of you would like to play, too.

Before we jump into the guts, I want to say that all of the attributes and scales are just examples. We could have a lively discussion about which attributes to measure, how you (or your students) could collect these data, and what the cut scores should be in order to be in the range of performance we'd like to see. These are critical pieces to consider...ones that I hope you ponder and kick around with your colleagues and students. I am going to set them aside in this post and strictly focus on the nuts and bolts.

On the Report Sheet, I set up a data validation for the students last names. I started by selecting the range of last names on the Attendance/Grade worksheet. I named the range "LastName." Then, on the Report worksheet in the cell where I wanted the dropdown to appear, I chose Data Validation on the Data tab. In the dialogue box, I selected List from the "Allow" dropdown and then in the "Source" box, I added a formula using my named range. For the "First Name" area on the Report worksheet and the selections at the bottom of each of the other worksheets, I used our old friend, the INDEX/MATCH formula.

Let's move on to a few new tricks. Most teachers keep attendance records by marking who is absent---not who is present. But on my graph, I wanted to chart attendance as a positive measure. There are a few ways to do this. The simplest would be to change the way we keep attendance. Mark students when they're present and just count up the number of days at the end of the grading period. I didn't take the easy way out. Instead, I left each "A" for Absent just where it fell. To get the total number of days present, I used this formula:  =30-(COUNTIF(C6:AF6,"A")). In other words, count the A's in a particular row (in this example, row 6 for Cathy Andrews) and subtract that number from the number of available days in the grading period (30). However, this swap won't work for creating the graphs. I need numbers to work with---not text. But a simple IF statement makes short work of that: =IF(C18=0,1,0), where "C18" is the cell from the INDEX/MATCH result for a particular student.

I also used a nested IF to have Excel convert the number of days attended (or points earned) into a level of performance. For example, here is the one that goes with the worksheets for Work Ethic, Participation, and Behavior:  =IF(AG6<=59,1,IF(AG6<=79,2,IF(AG6<=99,3,IF(AG6>=100,4))))
It looks worse than it really is. (Don't most things?) Notice the repeating "AG6," which is the cell with the total number of points I want Excel to use as a comparison. In the first part of the statement, we tell Excel that if that number is less than or equal to 59, to put a 1 in the cell. If not, Excel moves on to the next part of the statement. Even though we know that "30" is less than or equal to all of the cutoffs in the equation, Excel considers only one at a time.

Shall we talk about the graphs? I used the stacked bar from the charts menu. No doubt some of you out there will want to use a regular bar chart. That is appropriate, too. I picked the stacked bar for two reasons. One was that I wanted to do more than show a comparison. I wanted to illustrate a proportion. A viewer can certainly make that inference with a regular bar chart, but it's a little less work with the stacked bar. My second reason was purely about the aesthetics. I knew I was going to want a line graph in each of the four areas (attendance, participation, work ethic, behavior). The vertical real estate claimed by a bar chart threw off the look of things.

Anyhoo, after you select your data for the graph, don't fall apart of it looks like this:

Simply hit the "Switch Row and Column" option and you will see the stacked bar. After that, you can clean up the graph.

I won't go into detail with the line graph. What I will share is that it does take time to create the labels, standardize the sizes, and get everything in place. Take your time. Don't assume that Excel knows best. It's important to make things look good---otherwise, you are missing your chance to truly communicate with the data.

The radar/star/spider chart is found in the "Other Charts" menu. Again, this chart is not everyone's cup of tea. One of the ed research articles I read mentioned using the chart with college students as a form of feedback---and they hated it. This seemed to be due to the lack of familiarity with reading these types of graphs. We see line graphs and bar charts all the time. Radar charts? Not so much. There are other reasons not to choose this type of chart. For example, it works better with cyclical data. And, the area of the graph becomes distorted with the scale. For every point increase of the scale, the size of the area becomes squared.

I would suggest that if you select this type of chart to use with students (or parents) that you spend some time talking about how to read them. Show them how to look at the connections between the spokes and the area covered by the graph. Why did I choose it, given the difficulty in interpretation? Because the attributes of student performance are not simply about comparing them. There needs to be a broader view of what is happening in a classroom---how is the student performing in light of the overall goals...and more importantly, how might non-academic factors be affecting one another (as well as student learning)? I think that you lose some of these capabilities with a clustered bar graph.

What would you do differently with this workbook and these graphs? Would you choose to put the line graphs all on one chart for easier comparison? Do you like the idea of bar charts better than a stacked bar...and clustered columns instead of the radar graph? How would you organize the workbook?

Thursday, May 3, 2012

Oh, Behave!

There are two broad categories of grading practices in the great wide world these days. One is termed "hodgepodge," referring to the mixing of student learning and behaviors (e.g., work ethic, attendance...) into a single score. The standards-based version of grading does not crunch scores for learning with student behaviors. There are variations within each category, of course. Even though this blog has modeled various ways to look at student data within a standards-based context, we've never considered options for visualizing student behavior. Time to change that.

I've built a model workbook that tracks four types of non-academic factors: attendance, work ethic, participation, and behavior. Each of the worksheets looks a lot like the gradebook. Names are on the left, but there is a series of dates across the top. There are six weeks (30 days) in the model.

 Yeah, Baby

And, there's a reporting page which pulls information from each of the four worksheets. A data-validation dropdown menu for the "Last Name" automagically updates the sheet for each student. Here is what I'm thinking about for the display:

The top graph shows proportions. Attendance displays the number of days, the others the ratio of points earned. For the purposes of this tool, I've assumed that a four-point scale is available for each factor (work ethic, participation, behavior) each day. No points were assigned on days when the students were absent, as it is not possible to observe student behaviors when you aren't in the same room. The line graph below each stacked bar provides a sense of the 30-day trend. Gaps in the line represent days when the student was absent. Whatever was happening with Lucy here, looks like she had a real low point about mid-way through the grading period, but was starting to get back on track with her work ethic and participation by the end.

But I thought it might be interesting to consider the interaction between these factors...and a student's grade. So, the bottom of the report has a radar/spider/star chart. Most people are not fans of this type of graph. I'm a bit ambivalent about them, too. However, student achievement is not just about comparing categories, it's about looking for the confluence of all parts of learning to make decisions about instruction. To make the chart, I converted the total number of points earned in each category into a 4-point scale. I made up a summary grade.

So, here is Lucy's chart:

 Lucy Van Pelt

Hmm. Lucy seems to have pretty good attendance and work ethic...but her grade isn't reflecting those strengths. Is her behavior in class keeping her from learning?

What about Dick Tracy, who appears to be living up to his name? Ahem.

 Your name wouldn't happen to be "Dick," would it?

This looks like a student who could make a teacher pull his/her hair out. They show up nearly every day and act like a terror when they're there. What do we do about a problem like Dick Tracy? How do we channel his energy into something more productive?

For a final example, here's Flash Gordon.

 Flash, ah-ah!

Good attendance, behavior, and work ethic...participation isn't great, but is happening. What's the deal with his grade?

In the next post, I'll lift the curtain and talk about some of formulas behind this beast, but if you're ready to play, you can download the workbook here. Let me know what you think about the graphs. Too much or too little information? Other ideas for the display?

Bonus Round
I've tried to give each student a story with the data. If you're looking for something to start some PLC conversations, you can also use the workbook that way.