# Unintended Consequences of a 0 – 100 Grading System

If a student makes four errors in the course of answering ten questions, what is an appropriate grade? Presumably, it would depend on the severity of the errors and the nature of the questions. Consider how your approach to grading might vary if students had been asked to:

– match ten vocabulary words to a word bank, or
– define each of ten words, then use each appropriately in a sentence

– complete ten 2-digit multiplication problems, or
– solve ten multi-step algebra problems, each requiring a unique sequence of steps

– answer ten questions similar to what they have seen for homework or in class, or

Would you label each answer as right or wrong, then use percentage right as the grade?
Would you assign a number of points to each answer (if so, out of how many points per question)?
Would you assign a letter grade to each answer (whole letters only, or with +/-)?
What would you consider a “D” set of answers?
What would you consider an “A” set of answers?

Would your answers vary depending on whether you had created the assessment yourself, or were using someone else’s questions?

Many math/science teachers seem to use a percentage approach (based on total points earned or number correct) more often than any other, particularly when their school defines its letter grades using a 0 – 100 scale. Teachers of other subjects also use this scale often, but less so for “free-response” questions. While a percentage approach can work well for some assessments, it can have unintended consequences for others.

### Similar Right/Wrong Questions

When asking a series of similar questions, such as recall of recently discussed vocabulary words, or completion of questions like those on homework, a percentage correct grading scheme can be reasonable. When all questions are similar in nature, equally difficult, and each answer will be either right or wrong, if a student completes 60% (or 70%, depending on the school) of the questions correctly, they receive a “passing” letter grade; if they do better than that, their grade rises accordingly.

The premise behind this approach seems to be that correct answers to “a majority of” the material justify a passing grade, and the more correct answers, the higher the grade. I perceive this premise to apply best when partial credit cannot be awarded (students are not required to show work or justify their answer), and all questions address core material that students have worked with extensively. Under such circumstances, a student’s percentage of correct answers may be a reasonably fair way to grade their work.

This approach can also be appealing to teachers because it can be graded quickly and provides a summative overview of a student’s mastery of the material. So far, so good. But what happens when assessment questions do not meet the criteria above?

### Including a Challenging Question

Assessments often include one or more questions that are more challenging than the others. Reasons for doing so include:
– as a “bonus” question
– to identify which students work the fastest or most efficiently
– to see who understands the conceptual as well as the rote/procedural

In such cases, it is tempting to assign more points to the more challenging question(s), because they either require more time or are more difficult. Suppose an assessment contains nine questions worth one point, and one question worth three points. Consider what happens when a student answers all questions except the three-point one correctly: they receive a score of 9/12 = 75%, which could equate to a C or a D depending on the grading scheme.

If the challenging question was not similar to what had been thoroughly practiced in class and homework,  and students were not expecting a “new” type of question, such a grade will probably be perceived as unfair. On the other hand, if the type of question was thoroughly familiar to students, it might be less of an issue… but does it seem fair to assign a grade of C for missing one question out of ten?

If the challenging question had only been worth one point, like the other questions, then missing it would have dropped a student’s grade to a 90% (an A- or B+), which might be perceived as less unfair, even if the question type was new to students.

If the first nine questions had been worth two points each, and the challenging question had been worth only one point, missing the challenging question would have resulted in a score of 18/19 or 95%. Of all the options described above, this would probably be perceived as most fair by students, particularly if they are not used to responding to challenging questions on assessments. This would be a great approach if the purpose of the question was to distinguish between A and A+ students.
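The three weighting options above are easy to compare side by side. The following is a minimal Python sketch; the function name `percent_score` and the labels are hypothetical, chosen only for this illustration.

```python
def percent_score(weights, correct):
    """Percentage of points earned, given per-question weights and right/wrong flags."""
    earned = sum(w for w, ok in zip(weights, correct) if ok)
    return 100 * earned / sum(weights)

# Nine routine questions answered correctly; the challenging (last) question missed.
correct = [True] * 9 + [False]

schemes = {
    "challenge worth 3, others worth 1": [1] * 9 + [3],
    "every question worth 1": [1] * 10,
    "challenge worth 1, others worth 2": [2] * 9 + [1],
}

for label, weights in schemes.items():
    print(f"{label}: {percent_score(weights, correct):.0f}%")
# Prints 75%, 90%, and 95% respectively.
```

The same single missed question moves the grade by anywhere from 5 to 25 points, depending only on how the points were assigned.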

The intent of a question, combined with the experience base of the students in the class, should determine how much it affects the overall grade. If fairness is a potential issue for unfamiliar, harder, or more time-consuming questions, consider under-weighting them relative to other questions.

### Multi-Topic Assessments

If an assessment includes questions on two topics, how should it be graded if a student answers most questions for one topic correctly, but has a number of incorrect answers for the other?

Should students receive two grades on the assessment, one for each topic? If the assessment was intended to be formative in nature, probably. If it was intended to be summative, perhaps not.

If each topic were graded separately, and the student received grades of 80 (B-) and 40 (F) on the two topics, what should the overall grade be? Would averaging the two grades be appropriate ((80+40)/2 = 60 = D-)?

What if the student’s multiple incorrect answers on one topic were all due to their faulty recollection of one formula? Is that one formula so significant that it is worth all those points? Has it been used in class and on homework extensively before the assessment? If not, students are heavily penalized for faulty recall of one item, which is very demoralizing to any student, and is likely to be perceived as unfair. While there are approaches to grading which can mitigate this problem, they require more time when grading. I advocate not including “double jeopardy” situations on summative assessments at all, and instead trying to have each question focus on a different skill, concept, or process.

### Complex Questions

I define complex questions as those which can be graded as partially correct. Points may be deducted for each missing element or error, or points may be added for each element that is present and/or correct, or a rubric may be used to determine the grade.

Returning to the initial ten question example above, if every question were eligible for partial credit, should each question be worth two points? That would allow one point to be deducted for a mistake, while still giving partial credit. Or perhaps each question should be worth four points… that would allow for more deductions before a student would receive a zero on the question.

Both options suffer from the same flaw, which also arose when averaging the two grades of 80 and 40 above. If a student’s answer loses two points out of four, they receive 50% of the possible points on that problem. 50% is not merely a failing grade on that question; it is about 10 percentage points (20 in some schools) below a failing grade. If a student can receive a score that equates to a grade which is less than an “F” on a question, responses receiving such scores will not just lower their overall grade… they will impose a penalty on the overall grade that is more harsh than receiving an “F” would have been.
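One way to see the size of this extra penalty is to compare a plain average with an average in which no score is allowed to fall below the value of an “F”. The sketch below is only an illustration, not a method this post prescribes; it assumes an “F” is worth 59, and the function names are hypothetical.

```python
F_VALUE = 59  # assumed numeric value of an "F"; some schools use 69

def average(scores):
    return sum(scores) / len(scores)

def average_with_floor(scores, floor=F_VALUE):
    """Average after raising every score below the floor up to the floor."""
    return average([max(s, floor) for s in scores])

topic_scores = [80, 40]  # the two topic grades from the multi-topic example

print(average(topic_scores))             # 60.0
print(average_with_floor(topic_scores))  # 69.5
```

The 9.5-point gap between the two results is the portion of the penalty that comes from the 40 being “worse than an F”.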

### What Is Fair?

With a large enough (ten or more) list of similar right/wrong questions on well rehearsed topics, a percentage correct approach can work.

With complex or multi-topic assessments, the fairest approach seems to be: assign a letter grade (or its numerical equivalent) to each answer, then calculate an average (weighted, if so desired) of the letter grades to arrive at an overall grade for the assessment.  Ideally each question is graded using a rubric that students are familiar with from frequent use. Such an approach should be perceived as fair since the lowest grade possible on a question is never below an “F”.
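As a rough sketch of this approach in Python: each answer receives a letter grade, each letter is converted to a number, and the numbers are averaged (optionally with weights). The grade-point values below follow a common 4.0 convention and are an assumption; schools differ, particularly on the value of an A+. The function name `assessment_grade` is hypothetical.

```python
# Assumed letter-to-number mapping; substitute your school's own values.
GRADE_POINTS = {
    "A+": 4.0, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D": 1.0, "F": 0.0,
}

def assessment_grade(letter_grades, weights=None):
    """Weighted average of per-question letter grades, reported on the 4.0 scale."""
    if weights is None:
        weights = [1] * len(letter_grades)  # unweighted: every question counts equally
    total = sum(GRADE_POINTS[g] * w for g, w in zip(letter_grades, weights))
    return total / sum(weights)

grades = ["A", "B+", "F", "A-", "C"]
print(round(assessment_grade(grades), 2))  # 2.6
```

Because the lowest value any answer can contribute is the value of an “F”, a single poor answer cannot drag the average down further than an “F” would.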

One of the many possible grading rubrics for answers in a traditional Algebra class could look something like the following:

| Answer Attributes | Grade |
|---|---|
| **Valid approach; all work shown; correct formulas used** | |
| All work is correct, efficient solution | A+ |
| All work is correct, inefficient solution | A |
| One algebra, arithmetic, or sign error | B+ |
| Two algebra, arithmetic, or sign errors | B- |
| More errors than above | C |
| **Valid approach; most work shown** | |
| All work is correct | A- |
| One algebra, arithmetic, or sign error | B |
| Two algebra, arithmetic, or sign errors, or formula recall error | C+ |
| More errors than above | C- |
| **Seemingly valid approach; some work shown** | |
| Correct answer | C+ |
| Formula recall error | C- |
| Other errors | D |
| **Invalid approach, or little or no work shown** | F |

If using a rubric on each question is not appealing, consider using total scores but “grading on a curve” using one of the approaches described in “How to curve an exam and assign grades”. If you like to have a linear relationship between scores and grades, the following approach can work:

1) Assign points to all questions as usual

2) Decide what the minimum number of points needed to receive an A+ should be

3) Decide what the maximum number of points that should receive an “F” should be

4) Use the following formula to map this plan to your school’s grading scheme, where $MinAPlusScore$ is the value chosen in step 2 and $FailingScore$ is the value chosen in step 3: $Grade=59+\dfrac{ActualScore-FailingScore}{MinAPlusScore-FailingScore}\cdot(100-59)$

If your school considers an “F” to be a 69, use that value instead of the two values of 59 in the formula above.

5) All formula results below your value for an “F” should receive an “F”, and all formula results over 100 should receive an “A+”.

If you wish, you can create a spreadsheet using the above formula for each cell in the “Grade” column shown below. The spreadsheet version of the above formula, as it appears in cell B6 to the right of the first score below, would be:

=ROUND( B$3+(100-B$3)*(A6-B$2)/(B$1-B$2) , 0)

This formula can be copied into the cells below “Grade” in a spreadsheet to calculate grades for as many scores as necessary:

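| Row | A | B |
|---|---|---|
| 1 | A+ Min. Score | 20 |
| 2 | Failing Score | 15 |
| 3 | “F” = | 59 |
| 4 | | |
| 5 | Actual Score | Grade |
| 6 | 15 | 59 |
| 7 | 16 | 67 |
| 8 | 17 | 75 |
| 9 | 18 | 84 |
| 10 | 19 | 92 |
| 11 | 20 | 100 |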

Note that scores of 17 and 18 have a 9-point grade spread, while all other one-point score differences result in an 8-point grade spread. That is due to the rounding of grade values, so you can either include a decimal place in your grade values, or manually adjust the above grade values wherever rounding might raise questions.
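For anyone who prefers a script to a spreadsheet, the same linear mapping can be sketched in Python. The function and parameter names here are hypothetical. Results below the “F” value or above 100 are clamped, which is one way of treating them as an “F” or an “A+”.

```python
def scaled_grade(score, a_plus_min, failing_score, f_value=59):
    """Map a raw score onto the f_value..100 range along a straight line.

    a_plus_min    -- minimum score that should earn an A+ (maps to 100)
    failing_score -- maximum score that should earn an "F" (maps to f_value)
    """
    raw = f_value + (100 - f_value) * (score - failing_score) / (a_plus_min - failing_score)
    clamped = min(max(raw, f_value), 100)
    # Note: Python rounds exact .5 ties to the nearest even number, unlike the
    # spreadsheet ROUND function; none of the values below are ties.
    return round(clamped)

for score in range(15, 21):
    print(score, scaled_grade(score, a_plus_min=20, failing_score=15))
# Prints grades 59, 67, 75, 84, 92, 100 for scores 15 through 20.
```

Passing `f_value=69` covers schools that consider an “F” to be a 69.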

I find it very helpful to look at the Grades in such a table and ask myself questions like:
– Should losing one point on any question result in a grade of 92?
– Should losing four points result in a grade of 67?
If either of those grades seems “off” for any reason, I re-visit my scoring for the assessment, my scaling of scores, and/or perhaps even the assessment as a whole.

### Virtues of the 4.0 System

It is interesting to compare descriptions of 0 – 100 grading scales to typical descriptions of the 4.0 scale:

• Most descriptions of 0 – 100 scales provide numerical ranges (e.g. A+ = 96 – 100, A = 93 – 95, etc.) for every letter grade to make things unambiguous when converting a percentage score into a letter grade.
• Most (but not all) descriptions of 4.0 scales that I have found do not provide numerical ranges for each letter grade; they only provide a single numerical value for each grade (e.g. A = 4.0, A- = 3.7, etc.), which makes it challenging to convert a numerical grade into a letter grade in a way that is consistent with other teachers’ practice.

By only specifying a single number that each letter grade should be converted to when averaging grades, and not making it easy to convert averages back into letter grades without ambiguity, the typical description of the 4.0 scale discourages using a percentage grading approach.

Under the 4.0 system, overall grades/scores are reported on the 4.0 scale – not as letter grades. Furthermore, the 4.0 scale avoids the distortions introduced by scores below an “F”: an “F” is 0, a “D” is 1, a “C” is 2, etc. Scores below an “F” are not possible unless you are willing and wish to assign negative scores.

### Summary

Based on the reasoning above, I advocate:

• With a large enough (ten or more) list of similar right/wrong questions on well rehearsed topics, a percentage correct approach seems fair.
• For most other assessments, if individual topics are not being graded separately (as might be done using Standards Based Grading), consider using either letter grades or their 4.0 grading scale equivalents for each question, then calculating the assessment grade using either an average or a weighted average of the question grades.
• Review every assessment for “double jeopardy” situations before using it.

You may also be interested in a related post: Short Assessment Grading: Add or Average? 