The following is a guest post from the mastermind of Comparative Judgement, Dr Chris Wheadon.
The marking of English Language is likely to be extremely challenging this year. English Language has long-form answer questions, typically with 8, 16 and 24 mark responses. Ofqual’s research suggests the following range of precision is normal across GCSE and A level:
8 mark items: +/- 3 marks
16 mark items: +/- 4 marks
24 mark items: +/- 6 marks
So, when an 8 mark item is marked, for the same response, it is normal for one marker to give 4 marks, while another will give 7 marks. So, referring to the AQA mark scheme for English Language, one marker will mark a response as a ‘Perceptive Summary’ (Level 4) while another will mark it as ‘Some attempts at summary’ (Level 2). Ofqual’s research shows that this difference in opinion is a normal occurrence in the marking of 8 mark items.
When a 16 mark item is marked, it is normal for one marker to give 12 marks, while another will give 16 marks, the difference between ‘Clear, relevant’ and ‘Some attempts’.
When a 24 mark item is marked, it is normal for one marker to give 9 marks, and another to give 15 marks, which is the difference between ‘Simple, limited’ and ‘Some success’.
To be clear, Ofqual’s research is based on the differences that are normal for established specifications, after marker standardisation, and with the use by exam boards of sophisticated statistical rules which stop poor marking as soon as it is detected.
Working in isolation, without access to the range of work being produced nationally, the marking in your school is likely to be considerably worse than these levels of precision. Further, when scripts are marked by exam boards the responses from each candidate are distributed to different markers. The distribution process ensures that severe marking by one examiner of one question is usually cancelled out by generous marking by another.
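The cancelling-out effect of distribution can be sketched with a toy simulation. All the numbers below (marker severities, cohort size) are my own illustrative assumptions, not figures from Ofqual's research; the point is only the mechanism: a consistently severe marker who marks a whole script compounds their bias across every question, whereas distributing questions across markers averages the biases out.

```python
import random
import statistics

random.seed(0)

N_CANDIDATES = 200
N_MARKERS = 10
N_QUESTIONS = 3  # e.g. the 8-, 16- and 24-mark items

# Assumed severities: each marker is consistently harsh or generous
# by up to 2 marks per question (illustrative numbers only).
severity = [random.uniform(-2, 2) for _ in range(N_MARKERS)]

def total_errors(distribute: bool) -> list[float]:
    """Return each candidate's total marking error across all questions."""
    errors = []
    for _ in range(N_CANDIDATES):
        if distribute:
            # Exam-board style: each question goes to a different marker.
            markers = random.sample(range(N_MARKERS), N_QUESTIONS)
        else:
            # School style: one marker marks the whole script.
            markers = [random.randrange(N_MARKERS)] * N_QUESTIONS
        errors.append(sum(severity[m] for m in markers))
    return errors

spread_single = statistics.stdev(total_errors(distribute=False))
spread_dist = statistics.stdev(total_errors(distribute=True))
print(f"error spread, one marker per script: {spread_single:.2f}")
print(f"error spread, distributed marking:  {spread_dist:.2f}")
```

The distributed spread comes out noticeably smaller, which is the exam boards' rationale for splitting each candidate's responses among different markers.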
The conclusion is that unless you have sophisticated systems for distributing responses you won’t even be rank ordering your scripts correctly, let alone aligning your marking with the published mark schemes. And even the awarding bodies don’t suggest that you can go from marking to giving grades at this early stage.
So, what should you do? Not get students to sit mocks at all?
Mocks are obviously useful as examination practice, but it is likely that most of the benefit comes from sitting them rather than from the marking and grading that follows. However, without the incentive of marks, some of this purpose may be lost.
There is an alternative…
The critical part of an exam is that candidates should be measured fairly in comparison with each other. Comparative Judgement allows you to distribute each candidate’s response amongst all your teachers, so any individual bias is cancelled out. At the end of a judging session your pupils will be measured fairly against each other. You can give them a mark from 0 to 40 if you like against this measure, or a level or grade if you want to take a punt!
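To make the "measured fairly against each other" claim concrete: comparative judgement is commonly modelled with the Bradley-Terry model, where each script gets a strength and the probability that script A beats script B in a pairwise judgement depends only on the difference in strengths. The sketch below is purely illustrative (the cohort size, "true" qualities, number of judgements and the MM fitting loop are all my assumptions, not nomoremarking.com's actual implementation): it simulates pairwise judgements, recovers a strength for each script, and rescales the result to a 0–40 mark.

```python
import itertools
import math
import random

random.seed(1)

# Assumed setup: 8 scripts with a hidden "true" quality; every pair
# is judged several times by different (simulated) teachers.
true_quality = [0.5 * i for i in range(8)]
N_ROUNDS = 20  # judgements per pair

wins = [[0] * 8 for _ in range(8)]   # wins[i][j] = times i beat j
for i, j in itertools.combinations(range(8), 2):
    for _ in range(N_ROUNDS):
        # Bradley-Terry win probability from the true qualities.
        p_i_wins = 1 / (1 + math.exp(true_quality[j] - true_quality[i]))
        if random.random() < p_i_wins:
            wins[i][j] += 1
        else:
            wins[j][i] += 1

# Fit Bradley-Terry strengths with the classic MM iteration.
p = [1.0] * 8
for _ in range(200):
    new_p = []
    for i in range(8):
        w_i = sum(wins[i])  # total wins for script i
        denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                    for j in range(8) if j != i)
        new_p.append(w_i / denom)
    total = sum(new_p)
    p = [x * len(new_p) / total for x in new_p]  # renormalise

# Rank scripts by fitted strength, then rescale to a 0-40 "mark".
order = sorted(range(8), key=lambda i: p[i])
lo, hi = min(p), max(p)
marks = [round(40 * (x - lo) / (hi - lo)) for x in p]
print("recovered order (weakest first):", order)
print("scaled 0-40 marks:", marks)
```

Because only relative comparisons feed the model, any one judge's severity or generosity affects all scripts they see equally and washes out of the final rank order.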
And the feedback?
Presumably this is something you give students regularly already? Rather than wasting time on individual feedback on exam performance, use exemplars from the exams, annotate them and work on them in class.
But there’s a cost…
Yes, 25p per script if you use nomoremarking.com and take advantage of the bar coded answer sheets. For a cohort of 100 students this would come to a grand total of £25. There is probably no better way to spend your departmental budget.
There’s also a saving…
Teachers estimate that the judging process represents a 75 per cent time saving compared to marking, and that includes admin. So, let’s say you normally take 5 hours across 10 teachers to mark your mocks. That is 50 hours. With Comparative Judgement, you are likely to take around 12.5 hours. And once you get the hang of the process you’ll be doing it much more quickly than that! Even the worst-case scenario will result in a week’s worth of time back for an English department.
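For anyone who wants to check the arithmetic behind the "week back" claim (the figures are the post's own worked example; equating 37.5 hours with a working week is my reading of it):

```python
# Back-of-envelope check of the time-saving claim in the post.
teachers = 10
hours_each = 5
marking_hours = teachers * hours_each       # 50 hours of marking in total
judging_hours = marking_hours * (1 - 0.75)  # a 75% saving leaves 12.5 hours
saved = marking_hours - judging_hours       # 37.5 hours back,
                                            # roughly one working week
print(marking_hours, judging_hours, saved)
```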
If you’d like to get started judging your English mocks, there is a step-by-step guide over on nomoremarking.com.
Give it a try. Get a week back and do something more useful with it than marking.
While I’m definitely intrigued by No More Marking (and considering it for writing for lower down the school), I’m not sold on it in this instance. The new GCSE criteria are frustratingly nit-picky in their specificity (e.g. mention technical terms, comment on both language AND structure – despite uncertainty over what counts as structure in their eyes – DO mention context in this question, but DON’T mention context in this one, etc.).
As such, I can’t help but feel I’d be doing my students a disservice if the grades they received weren’t attached to those specific AOs, and so transparent to the students (both in the sense of ‘this is why you got the mark you got’ and ‘this is what was preventing you getting more’ – i.e. next steps).
I can well believe that the judgments from NMM would be more accurate, I just suspect that for a mock examination they would also be less useful.
How would they be less useful if they were more accurate? Surely you’d judge the answers based on your knowledge of the AOs addressed by each question?
What I mean is that the judgment will be more accurate, but that judgment has less bearing on how the piece would score using the AOs. i.e. piece X is better than piece Y in analysis, but X forgot to use technical terminology, so would actually score lower using the AO.
I’m just not seeing how this would help the students, the teachers, or me (as HoD) get an accurate snapshot of where the students are in relation to the AO levels, why, and what they need to do to improve.
The only thing I could see in the step-by-step guide was applying the grade-distribution curve to the ranked papers, but that seems even less accurate than moderated marking and, crucially, not tied to the AOs. So student X may present as a 6, say, because they are a better candidate, but their marks are perhaps only level 3 for the AO.
Don’t get me wrong, I’d love to go this way, and I’m wracking my brains to think of ways to make it useful; I just can’t see it in this instance.
I know what you’re saying Andy, I’m just disagreeing with you. If you judge pieces against the AO criteria, all you’ll get is a mark arrived at more slowly and less reliably, but using the same criteria as the examiners.
This is really interesting. As a parent of a year 11 who has just done a mock in English Language I would be interested to clarify whether the +/- 3 marks means that the range of marks from the lowest to highest marker would be expected to be 3, or that if one marker marks as 4/8 another might mark at 7/8 and another at 1/8?
So there’s a possible 23 mark range of error which is less than the difference in grades between A* and D on some papers? Makes me think school results should come with margins of error attached.
Scary, isn’t it?