Statistical Analysis



1. Statistical Methods for Evaluation Items and Examinations

Many computer grading systems report, or can be modified to report, useful statistics on student performance on the individual items of an exam. The simplest is item difficulty: the fraction of the students answering a given question who give a wrong answer.
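As a minimal sketch, item difficulty can be computed as follows. It assumes the responses to one item are stored as a list of the options chosen, with `None` marking a student who left the item blank. The function name `item_difficulty` and the data are hypothetical. Treating a blank as "not answering" rather than as a wrong answer is also an assumption, and a given grading system may handle blanks differently.

```python
def item_difficulty(responses, key):
    """Fraction of the students answering an item who give a wrong answer.

    responses: option chosen by each student (None = item left blank).
    key:       the correct option.
    """
    # Assumption: a blank counts as not answering, so it is excluded.
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no student answered this item")
    wrong = sum(1 for r in answered if r != key)
    return wrong / len(answered)


# Hypothetical responses from ten students; the key is "B".
responses = ["B", "A", "B", "C", "B", "B", None, "D", "B", "A"]
print(item_difficulty(responses, "B"))  # 4 wrong out of 9 answering: about 0.44
```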

Almost all computer-based grading systems report how many students got each item wrong; other useful statistics are harder to obtain. A brief overview of them follows.

A response list, which gives the number of students choosing each option on an item, shows which distractors (wrong answer choices) proved most misleading.
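A response list can be sketched in the same hypothetical format. The function `response_list` below is an illustrative name, not any grading system's API. It counts how many students chose each option, including options nobody chose, and orders the distractors from most to least popular.

```python
from collections import Counter


def response_list(responses, options, key):
    """Count how many students chose each option on one item.

    responses: option chosen by each student (None = item left blank).
    options:   every option offered on the item, e.g. ["A", "B", "C", "D"].
    key:       the correct option.

    Returns (table, distractors): table maps each option to its count;
    distractors lists the wrong options, most frequently chosen first.
    """
    counts = Counter(r for r in responses if r is not None)
    table = {opt: counts.get(opt, 0) for opt in options}
    # sorted() is stable, so options with equal counts keep their listed order.
    distractors = sorted(
        (opt for opt in options if opt != key),
        key=lambda opt: table[opt],
        reverse=True,
    )
    return table, distractors


# Hypothetical responses from ten students; the key is "B".
responses = ["B", "A", "B", "C", "B", "B", None, "D", "B", "A"]
table, distractors = response_list(responses, ["A", "B", "C", "D", "E"], "B")
print(table)        # {'A': 2, 'B': 5, 'C': 1, 'D': 1, 'E': 0}
print(distractors)  # ['A', 'C', 'D', 'E']
```

Here "A" is the most misleading distractor, and "E", chosen by no one, is doing no work on this item.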

The discrimination of an item is calculated using the equation:

$$\text{Discrimination} = \frac{H - L}{0.275\,N}$$

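where H and L are the numbers of correct responses to the item from the highest-scoring and lowest-scoring 27.5% of the students (ranked by total exam score), respectively, and N is the total number of students. Since 0.275 N is the number of students in each group, the discrimination ranges from -1 to +1. This measure indicates how closely student performance on a specific item tracks student performance on the exam as a whole. Correlation coefficients can also be used for this purpose, but they require more extensive calculations (3-4).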
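The discrimination formula translates directly into code. This sketch is hedged in several ways. The function names and data are hypothetical. Each group must contain a whole number of students, so the sketch rounds 0.275 N to the nearest integer and divides by that group size, which keeps the result between -1 and +1. Students tied at a group boundary are assigned by sort order. The text does not say which correlation coefficient it has in mind, so `point_biserial` shows one common choice: the point-biserial coefficient between the 0/1 item score and the total exam score.

```python
import math
from statistics import mean, pstdev


def discrimination(item_correct, total_scores, fraction=0.275):
    """Upper-lower discrimination index (H - L) / (fraction * N).

    item_correct: 1 if the student answered this item correctly, else 0.
    total_scores: each student's total exam score, in the same order.
    """
    n = len(total_scores)
    group = round(fraction * n)  # students in each of the high and low groups
    if group == 0:
        raise ValueError("too few students to form high and low groups")
    # Rank students by total score; ties at the boundary fall by sort order.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:group], order[-group:]
    h = sum(item_correct[i] for i in high)  # H: correct answers, top group
    l = sum(item_correct[i] for i in low)   # L: correct answers, bottom group
    return (h - l) / group


def point_biserial(item_correct, total_scores):
    """Correlation between a 0/1 item score and the total exam score."""
    p = mean(item_correct)  # fraction of students answering correctly
    sd = pstdev(total_scores)
    if p in (0, 1) or sd == 0:
        raise ValueError("correlation undefined: no variation in the data")
    m1 = mean(s for s, c in zip(total_scores, item_correct) if c)
    m0 = mean(s for s, c in zip(total_scores, item_correct) if not c)
    return (m1 - m0) / sd * math.sqrt(p * (1 - p))


# Hypothetical class of ten students.
total_scores = [12, 18, 25, 9, 30, 22, 15, 27, 11, 20]
item_correct = [0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
print(discrimination(item_correct, total_scores))  # 1.0 (groups of 3: H = 3, L = 0)
print(round(point_biserial(item_correct, total_scores), 2))  # 0.81
```

In this example the total score includes the item itself, which slightly inflates both statistics. Excluding the item from the total before comparing is a common refinement.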


2. Evaluating the Statistics

Interpreting the apparently straightforward difficulty factor can be complex. Very high difficulty factors are common, both in my own files and among questions submitted for standard examinations. Questions with low difficulty factors (easy questions) can be difficult to construct.

This is a fruitful area for applying the findings of Johnstone and co-workers (3,4), who found that item difficulty increases sharply when answering requires holding more than about five pieces of information in working memory. Students make progress on more difficult problems when they come to perceive several operations as a single one. A faculty member, then, may see a question as requiring four operations (an easy question), while a student may see the same problem as requiring ten (virtually impossible). Many item writers judge their students by their own capabilities, but the writers have had years of practice to build those skills. The important lesson is: Don't try to be too clever when writing questions.

The discrimination factor is also complex to interpret (5). For a constructor of standard examinations, the higher the discrimination factor, the better the question. For an individual instructor, low discrimination can signal things other than poor item quality, although the interpretation "it's a bad question" is frequently valid.

 

John P. Sevenair, Xavier University