Originally posted March 8, 2019 by Kristie Dukewich


I recently began my workshop, Assessments 101: More Than Multiple Choice, with the caveat that I don’t hate multiple choice questions (MCQ). For exams, I still regularly use MCQ alongside a variety of other formats.

I think MCQs get a bad rap. They’re a tool, and like all tools there is a skill to using them effectively. There are strategies for writing good MCQ items, and there are ways to evaluate whether your items are effective. Using these strategies can make you vastly more comfortable with using – and defending the use of – MCQ on your exams.

Writing Good Stems 

1. To get at application and analysis, use vignette-style questions.

  • Describe an event, happening, circumstance, or other scenario
  • Incorporate the material being tested in a new way

2. Have students interpret novel material that is referenced over several questions.

  • Present a map, reading, or data set that students haven’t seen before and that requires interpretation

3. Consider principles of Universal Design for Learning (UDL) to write questions that are accessible to all learners.

  • Avoid idioms and low-frequency words (e.g., I’ve been asked to define ‘ambiguous’, ‘hooves’, and ‘kettle’ on exams – words unrelated to course content)
  • Use plain language and simple, straightforward grammar
  • Tell your students to ask if they’re confused about question wording or don’t know the definition of something (but if it’s course-related you may decline to answer)

Writing Good Distractors

1. Write statements that are true but don’t answer the question.

  • “Gravity is the force of attraction between two masses” is true, but it doesn’t answer the question, “Which Law of Thermodynamics explains [the scenario]?”

2. Write statements that might seem right to the student but are incorrect.

  • Each incorrect distractor should be plausible

3. If you use a pair of concepts in the options, make it two pairs.

  • When you pair answers like covert and overt, or meiosis and mitosis, the correct answer is usually one of the pair, so try to ‘pair pairs’
  • e.g. (a) Overt (b) Covert (c) Endogenous (d) Exogenous

4. Try to make all options the same length.

  • The correct answer is often longer than the distractors because it requires more explanation, which can cue test-wise students

5. Avoid using “All of the above” and “None of the above”.

  • Students are more likely to get these questions correct because these options are disproportionately the right answer
  • If you insist on using these options, make sure they are the correct answer only about a quarter of the times you use them

6. Avoid using “A, B and D” style options

  • These options often target a different construct, such as working memory or executive functioning
  • These options require students to compare 3–5 statements simultaneously, but people tend to max out at keeping track of about 4 pieces of information at once

Overall Design of MCQ Section

1. Align exam questions with course learning outcomes

  • Be cognizant of how the questions align with the course learning outcomes to ensure that you’re adequately assessing different outcomes in your exam
  • Some learning outcomes will not be assessable with MCQ!
  • And some learning outcomes are not appropriate for exams at all – e.g., “Find and summarize research from the primary literature” is probably better assessed with a research assignment

2. Make the distribution of answers even among As, Bs, Cs, and Ds

  • Instructors tend to want to “hide” their answers in Bs and Cs
  • If the distribution is uneven, students can score above chance even when guessing

3. Use 3 – 4 options total

  • More options do not improve a question’s ability to discriminate between strong and weak students
  • If you have 5 options, there is almost certainly a distractor that few students select – just get rid of it; it will not change your typical exam average

Rodriguez (2005) recommends 3 options. When I went from 5 options to 4 options, my exam averages did not change.

Use Item Analysis to Improve Exam Items

Item analysis is a statistical procedure for evaluating the validity and reliability of MCQ exams. These statistics are often available through the exam-scanning technology available to instructors, and they are also available via Moodle for instructors who use the quiz function.

Question Difficulty is the proportion of students who answered the question correctly – the harder the question, the lower this proportion. The optimal difficulty is p(correct) between 0.5 and 0.75. I start to look hard at questions with p(correct) < 0.5, and I am highly likely to eliminate questions with p(correct) < 0.3. Keep in mind that with four options, if students are guessing, p(correct) should be about 0.25.
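Under this definition, difficulty is just a column average over a 0/1 scoring matrix. A minimal sketch in Python, with made-up scores (the variable and function names are mine, not from any particular scanning package):

```python
# Rows = students, columns = exam items; 1 = correct, 0 = incorrect.
# Made-up scores for illustration.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

def difficulty(scores, item):
    """p(correct): the proportion of students who answered this item correctly."""
    column = [row[item] for row in scores]
    return sum(column) / len(column)

for i in range(len(responses[0])):
    p = difficulty(responses, i)
    flag = "review" if p < 0.5 else "ok"
    print(f"item {i}: p(correct) = {p:.2f} ({flag})")
```

Here item 2 comes out at p(correct) = 0.25 – about chance level for a four-option question, so it would be flagged for review.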

The Discrimination Index (DI) is the correlation for an exam item with overall exam performance. The notion is that good questions should be more likely to be answered correctly by strong students than by weak students, so a good question should positively correlate with overall exam performance. A DI of 0.1 is OK, but above 0.3 is pretty good. A negative DI tends to indicate that there is something misleading about a question, such that strong students are over-interpreting the question while weak students are taking it at face value.

TABLE: Guide for Interpreting Question Difficulty & Discrimination Index Together

LOW DI (0 – 0.15)

  • Low difficulty, p(correct) > 0.8: Consider eliminating the question on the next iteration – it’s probably inflating the grades
  • Medium difficulty, p(correct) 0.5 – 0.8: Consider the question carefully – is it too focussed on recall?
  • High difficulty, p(correct) < 0.4: Consider eliminating the question; this combination indicates a poorly-written question or unclear teaching of the concept

HIGH DI (0.15 – 0.5+)

  • Low difficulty, p(correct) > 0.8: Keep, but consider ways of revising to increase cognitive complexity
  • Medium difficulty, p(correct) 0.5 – 0.8: Keep! The ideal question will have p(correct) = 0.7 & DI > 0.25
  • High difficulty, p(correct) < 0.4: Consider revising; if the DI is high enough, it’s probably worth keeping, but try to lower the difficulty by revising wording or distractors

Item analysis reports often also include a Reliability Coefficient. The reliability coefficient is basically a correlation coefficient that indicates how well all of the items correlate with each other. The higher the reliability coefficient, the more likely it is that your exam is targeting a single construct (i.e. knowledge of a particular discipline). The reliability coefficient should be at least 0.6; above 0.7 is good for a classroom test. Standardized tests like the SATs have a reliability coefficient of around 0.9.
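Reports usually compute this statistic as Cronbach’s alpha (equivalently, KR-20 for right/wrong items): alpha = k/(k−1) × (1 − Σ item variances / variance of total scores), where k is the number of items. A sketch with made-up scores:

```python
from statistics import pvariance

# Rows = students, columns = items; 1 = correct, 0 = incorrect (made-up data).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
    For 0/1 items this is the same value as KR-20."""
    k = len(scores[0])
    item_vars = sum(pvariance([row[i] for row in scores]) for i in range(k))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

print(f"reliability = {cronbach_alpha(responses):.2f}")  # 0.75 for this toy data
```

Intuitively, when items all track the same construct, the total-score variance grows faster than the sum of the item variances, and alpha rises toward 1.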

A Final Note on MCQ Exam Sections

UDL encourages multiple means of expression. There is no single magic exam question format or assignment design that will allow all students to perfectly demonstrate their level of achievement with course learning outcomes. Variability and options allow students to find a method of expression that effectively demonstrates their level of competence. I would encourage you to give your students who struggle with MCQ another avenue to demonstrate competence on exams. Usually my MCQ section strongly correlates with the other sections of my exam, but occasionally I have a student who suffers on one question format or another. I think it’s important that those students have the opportunity to earn a grade that better reflects their competence with the course material, rather than their competence with MCQ.