More on IRR

I apologize for the long gaps between posts these past few weeks.  My new semester starts today and I had to finish up an online course, so my time to spend on my research/this blog was cut short.  I am ready to really dive back in now though 🙂

First off, I would like to address some of the wonderful comments on my last post:

1)   Mike – Thank you so much for the keyboard shortcut tips! I’ve been using a Mac for the past 8 years, and I still don’t have them all down.  😉

2)   Ryan – Thank you for the solution for Row A or Row B or Row C – This will work perfectly.  As you all have shown me, there are multiple ways to accomplish this.  I’m sure there will be times where one method is easier/better than another.

3)   Owen – You asked if I needed to see the row for Independent Working, and I’m not sure yet.  If I wind up using any text labels that attach to the Independent Working code, then yes, I will need it for scripting.  As it is used now, I shouldn’t need it.  I’ve been slowly getting the hang of the scripting and the ability to use mathematics to calculate values in the scripts.

4)   Phillip – You raise an excellent point about some other ways to calculate IRR with more subjective rating systems.  In terms of measuring independence, I don’t need to calculate a magnitude of independence, but there are definitely some behavioral observations for which magnitude would be a very appropriate measurement.

Here is an example where frequency, magnitude, and duration would all need to be coded and evaluated for IRR:

If I were coding a student for aggressive behavior, I would want my raters to look at the frequency, magnitude, and duration of the behavior.  I would want to see if they marked the same incidents as “aggression” (the frequency IRR), I would want to compare the lengths of the incidents (the duration IRR), and I would want to have operational definitions for different magnitudes of aggression (common magnitude measurements for aggression are mild, moderate, and severe). I think I would use text labels for the magnitude categories so each incident would be labeled with its level.

For the duration, I would calculate IRR in the same way I calculated it in my last post: (A & B)/(A or B) x 100.  For frequency and magnitude it gets a little trickier.  I can easily look at the timelines and compare incident by incident to determine if the same incidents were coded and if the magnitude levels match, but this doesn’t save me much time because I still have to analyze each list.  If I used that method, StudioCode wouldn’t actually do anything different for me than a simple matrix or a checklist would.  Does anyone have any ideas for ways to use scripting to report IRR for those measurements?  I’m going to think about this one and see what I come up with.  Right now I can easily figure out a way to see if their frequency counts are equal in number, but that wouldn’t necessarily tell me if my raters were counting the same incidents.  One method in traditional IRR calculation is to divide the period into intervals and simply mark whether the behavior did or didn’t occur in each interval, but StudioCode allows us to be so much more accurate than using general intervals.  Also, I would be able to code the events leading up to the aggression to look for patterns in the function of the behavior (why the student was acting aggressively).  Any thoughts on some easy ways to report some of this data with scripting?
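
For anyone who wants to check the duration numbers outside StudioCode, here is a minimal Python sketch of the (A & B)/(A or B) x 100 calculation. It is not StudioCode scripting. It assumes each rater's instances have been exported as (start, end) times in seconds, and the function names and sample times are made up for illustration.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals):
    """Total time covered by a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def overlap_length(a, b):
    """Time during which both raters have the code active (A & B)."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def duration_irr(a, b):
    """(A & B) / (A or B) x 100 for two raters' coded intervals."""
    both = overlap_length(a, b)
    either = total_length(merge(a)) + total_length(merge(b)) - both
    if either == 0:
        # Assumption: if neither rater coded anything, treat it as full agreement.
        return 100.0
    return 100.0 * both / either


# Hypothetical mark-in/mark-out times in seconds, not real data.
rater_a = [(0, 30), (60, 90)]
rater_b = [(10, 30), (60, 100)]
print(round(duration_irr(rater_a, rater_b), 1))
```

With these sample times the raters overlap for 50 seconds out of 70 seconds coded by either one, so the result is about 71.4.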


Posted on August 26, 2013, in Uncategorized. 1 Comment.

  1. Tara,

    Excellent thoughts here on IRR. The idea of frequency within a period would clearly be easy to accomplish in STC, but a more accurate measurement is desirable. One way to look at duration IRR is to define an acceptable difference in mark-in and mark-out points. For instance, let’s say the duration of the event is 20 min, the number of instances is 10, and the average instance duration is 30 sec. In this case we might allow a difference in mark-in/out of 3 sec (10% of the average instance duration). This helps establish which instances are in agreement.
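
The tolerance idea above could be sketched roughly as follows in Python. This is only an illustration, not STC scripting: it assumes instances are exported as (mark_in, mark_out) times in seconds, pairs them greedily in time order, and scores frequency agreement as agreements divided by agreements plus disagreements. The function names, the scoring rule, and the sample times are all assumptions.

```python
def match_instances(a, b, tolerance=3.0):
    """Pair up instances whose mark-in and mark-out both differ by <= tolerance.

    Returns (pairs, only_a, only_b), where the last two hold unmatched instances.
    """
    unmatched_b = sorted(b)
    pairs = []
    only_a = []
    for inst in sorted(a):
        hit = next(
            (other for other in unmatched_b
             if abs(inst[0] - other[0]) <= tolerance
             and abs(inst[1] - other[1]) <= tolerance),
            None,
        )
        if hit is None:
            only_a.append(inst)
        else:
            pairs.append((inst, hit))
            unmatched_b.remove(hit)  # each instance can be matched only once
    return pairs, only_a, unmatched_b


def frequency_irr(a, b, tolerance=3.0):
    """Agreements / (agreements + disagreements) x 100."""
    pairs, only_a, only_b = match_instances(a, b, tolerance)
    total = len(pairs) + len(only_a) + len(only_b)
    if total == 0:
        # Assumption: nothing coded by either rater counts as full agreement.
        return 100.0
    return 100.0 * len(pairs) / total


# Hypothetical instances in seconds, not real data.
rater_a = [(10, 40), (100, 130), (200, 230)]
rater_b = [(12, 41), (105, 131), (300, 330)]
print(frequency_irr(rater_a, rater_b, tolerance=3.0))
```

Here only the first instance falls within the 3 sec tolerance on both ends, so one agreement out of five distinct instances gives 20.0. Unlike a simple count comparison, this shows whether the raters were marking the same incidents.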

    Another IRR measurement I’ve been associated with involved the agreement of text labeling. The raters were each provided with a coded STC package and a code window that contained only text labels. Then, using label mode, they applied labels to the already coded instances. Obviously, the label structure can be as complicated as you need, but the IRR measurement is only applied to label agreement.
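
Label agreement on already coded instances is straightforward to score once the labels are exported. The Python sketch below assumes both raters labeled the same instances in the same order, and it uses the mild/moderate/severe levels from the post as sample labels. Percent agreement follows the description above; Cohen's kappa is an extra, standard chance-corrected statistic that the comment does not mention. The function names and sample labels are hypothetical.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of instances (as a percentage) where both raters chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of instances")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for what chance alone would produce."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b) / 100.0
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's own label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical magnitude labels for four already coded instances.
rater_a = ["mild", "mild", "moderate", "severe"]
rater_b = ["mild", "moderate", "moderate", "severe"]
print(percent_agreement(rater_a, rater_b))
print(round(cohens_kappa(rater_a, rater_b), 3))
```

With these sample labels the raters agree on three of four instances (75.0 percent), and kappa comes out to about 0.636 because some of that agreement would be expected by chance.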
