Inter-Rater Reliability Series – Point-by-Point Agreement

I promise I will go back to my series on open-coding, but after the webinar I gave last week I realized there is a lot of interest in using Studiocode for inter-rater reliability (IRR). There are several methods for calculating IRR, and the right one depends on your data type. In this first video I give detailed instructions on how to calculate IRR for frequency-count data using point-by-point agreement. The video I am coding is another example of a teacher using discrete trial training with a student. I coded the video for “correct responses” by the student. If this were a research study and I were drawing conclusions about how this strategy improves a student’s accuracy, I would need to verify that I was coding the student responses accurately by having a second coder act as a reliability checker. In this video I explain how to do this with Studiocode.
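For readers who want to check the arithmetic outside Studiocode, here is a minimal sketch of point-by-point agreement in Python. It is not the Studiocode workflow from the video. It assumes each coder's instances are exported as (start, end) times in seconds, that two instances count as an agreement when they overlap in time, and that the usual formula applies: agreements divided by agreements plus disagreements, times 100. The function names and the sample times are made up for illustration.

```python
def overlaps(a, b):
    """Return True if two (start, end) instances share any time."""
    return a[0] < b[1] and b[0] < a[1]


def point_by_point_agreement(rater1, rater2):
    """Percent agreement between two coders' instance lists.

    Assumes instances within one coder's row do not overlap each other.
    Each instance is matched at most once.
    """
    r1, r2 = sorted(rater1), sorted(rater2)
    i = j = agreements = 0
    while i < len(r1) and j < len(r2):
        if overlaps(r1[i], r2[j]):
            agreements += 1          # both coders marked this response
            i += 1
            j += 1
        elif r1[i][1] <= r2[j][0]:
            i += 1                   # coder 1's instance has no match
        else:
            j += 1                   # coder 2's instance has no match
    # Unmatched instances from either coder count as disagreements.
    disagreements = (len(r1) - agreements) + (len(r2) - agreements)
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no instances were coded by either rater")
    return 100.0 * agreements / total


# Hypothetical "correct response" instances for two coders (seconds).
coder_a = [(2, 5), (10, 13), (20, 23), (30, 33)]
coder_b = [(3, 6), (11, 14), (31, 34), (40, 43)]
irr = point_by_point_agreement(coder_a, coder_b)
print(f"Point-by-point agreement: {irr:.1f}%")
```

In this made-up example the coders agree on three responses and each has one response the other did not mark, so the result is 3 / (3 + 2) × 100 = 60%.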


Posted on March 24, 2014, in IRR. 3 Comments.

  1. Tara…excellent. I think the IRR is an interesting area. Look forward to seeing the next video on duration IRR.

  2. Great post, Tara. This is an excellent example of what many are trying to accomplish with IRR in Studiocode. In your example, many of our clients would take a slightly different approach. Since the behavior in question is “response,” they would use a single code button to mark each response. Then each rater would independently rate the row by assigning labels to identify correct or incorrect responses. The IRR would then simply look at label agreement on the instances.

    Got any clever ideas for scripting in your scenario when there are more than two rows in the overlap?

    • Thanks for a great example, Will. You are correct…most people would probably use a label to give more information about the response type (kind of like I do with my OTRs hierarchy), but I was trying to start simple 🙂 I was going to save text labels for the Kappa calculation of IRR. Once you have agreed-upon responses, you can calculate Kappa from how each rater labeled the response (correct, incorrect, partially correct, etc.), which also takes into account the likelihood of labeling the same way by chance. I haven’t tried calculating Cohen’s Kappa with Studiocode yet, but it is one of my next steps.

      The only thing I have tried for three rows (for example, if I had 3 coders) is to combine the rows for raters 1 and 2 with the “and” command, so I only see their overlapping instances. Then I could use the same process for IRR that I described in this video with that new row and rater 3 🙂 If you think this is something people would be interested in seeing, I can certainly make a video for this!
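
The Kappa calculation mentioned in the reply above can be sketched in Python as follows. This is only an illustration, not something done in Studiocode. It assumes the two raters have already agreed on which instances are responses and that each rater's labels are exported as a list in the same instance order. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels1, labels2):
    """Cohen's Kappa for two raters labeling the same instances."""
    if len(labels1) != len(labels2) or not labels1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels1)
    # Observed agreement: share of instances given the same label.
    p_observed = sum(a == b for a, b in zip(labels1, labels2)) / n
    # Chance agreement: for each label, the product of the two raters'
    # rates of using it, summed over labels.
    counts1, counts2 = Counter(labels1), Counter(labels2)
    p_chance = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    if p_chance == 1:
        raise ValueError("Kappa is undefined when both raters use one label")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical labels for ten agreed-upon responses.
rater_1 = ["correct"] * 6 + ["incorrect"] * 4
rater_2 = ["correct"] * 5 + ["incorrect"] * 4 + ["correct"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's Kappa: {kappa:.2f}")
```

Here the raters match on 8 of 10 labels (observed agreement 0.80), chance agreement is 0.52, and Kappa comes out to roughly 0.58, which is noticeably lower than the raw 80% because it discounts agreement expected by chance.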
